Introduction to Linux Kernel Parameters

Last updated: 2020-12-30 17:15:07

    Tencent Cloud ships its Linux public images with default kernel parameter settings, but we recommend that you tune sysctl separately to suit your workload. This document describes the default and recommended settings of Tencent Cloud Linux public images and helps you tune them manually as needed.

    Note:

    • Parameters shown as “-” under Initial Configuration keep the default value of the official image.
    • The sysctl -w command changes a setting only temporarily (until the next reboot), while parameters written to /etc/sysctl.conf take effect permanently. Run sysctl -p to load /etc/sysctl.conf without rebooting.
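
    The difference between a temporary and a permanent setting can be sketched in a few lines of Python. This is a minimal sketch, not Tencent Cloud tooling: it assumes the usual mapping of a sysctl key to a file under /proc/sys (dots become slashes), and the helper names and the example value 1024 are hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def proc_sys_path(key: str) -> str:
    """Map a sysctl key to its runtime file under /proc/sys.

    Writing to this file has the same temporary effect as `sysctl -w`.
    Keys whose components themselves contain dots (for example, VLAN
    interface names) are not handled by this simple sketch.
    """
    return str(PurePosixPath("/proc/sys").joinpath(*key.split(".")))


def sysctl_w_command(key: str, value: object) -> str:
    """Render the temporary form: a shell-quoted `sysctl -w` command."""
    return "sysctl -w " + shlex.quote(f"{key}={value}")


def sysctl_conf_line(key: str, value: object) -> str:
    """Render the permanent form: a line for /etc/sysctl.conf."""
    return f"{key} = {value}"


if __name__ == "__main__":
    key, value = "net.core.somaxconn", 1024  # hypothetical example value
    print(proc_sys_path(key))
    print(sysctl_w_command(key, value))
    print(sysctl_conf_line(key, value))
```

    The sketch only renders strings and changes nothing on the system. Multi-value settings such as kernel.printk are shell-quoted so that the whole assignment reaches sysctl as one argument.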

    Parameter | Description | Initial Configuration
    net.ipv4.tcp_tw_recycle
    Quickly recycles connections in the TIME_WAIT state. When enabled, the kernel checks packet timestamps. We do not recommend enabling this parameter, because packets may be dropped when timestamps are not monotonically increasing. This parameter was removed in later kernel versions (Linux 4.12 and later).
    Initial configuration: 0
    net.core.somaxconn
    Caps the length of the accept queue, which holds connections that have completed the three-way handshake (ESTABLISHED state) but have not yet been accepted by the application. A long accept queue indicates that the application is accepting connections slowly, or that a burst of new connections arrived in a short time. Setting net.core.somaxconn too low may cause packet loss, because new SYN packets are discarded when the accept queue is full. A high value is needed only for high-concurrency services, and it may increase latency.
    Initial configuration: 128
    net.ipv4.tcp_max_syn_backlog
    Specifies the maximum number of connections in the SYN_RECV queue, which was once used to defend against common SYN flood attacks. However, when tcp_syncookies=1, the number of connections in the SYN_RECV state can exceed this limit.
    Initial configuration: -
    net.ipv4.tcp_syncookies
    Enables SYN cookies, which protect against some SYN attacks. When enabled, connections can still be established after the SYN queue overflows. However, SHA1 is used to verify the cookies, which in theory increases CPU utilization.
    Initial configuration: 1
    net.core.rmem_default
    net.core.rmem_max
    net.ipv4.tcp_mem
    net.ipv4.tcp_rmem
    These parameters configure the size of the receive buffers. Setting them too high may waste memory, while setting them too low may cause packet loss. Tune them according to the concurrency and throughput of your business.
    • rmem_default: the theoretically optimal value is the bandwidth-delay product (bandwidth multiplied by RTT). Note that for TCP sockets the default field of tcp_rmem takes precedence over rmem_default, so check both.
    • rmem_max: approximately five times rmem_default.
    • tcp_mem: the total memory that TCP may consume, whose three thresholds are automatically set to 3/32, 1/8, and 3/16 of the CVM’s available memory. Together, tcp_mem and rmem_default also determine the maximum number of concurrent connections.
    Initial configuration: rmem_default=655360, rmem_max=3276800
    net.core.wmem_default
    net.core.wmem_max
    net.ipv4.tcp_wmem
    These parameters configure the size of the send buffers. Sending data on Tencent Cloud is usually not a bottleneck, so tuning these parameters is optional.
    Initial configuration: -
    net.ipv4.tcp_keepalive_intvl
    net.ipv4.tcp_keepalive_probes
    net.ipv4.tcp_keepalive_time
    These parameters control TCP keepalive and default to 75, 9, and 7200, respectively. With the defaults, the kernel starts probing after a TCP connection has been idle for 7,200 seconds and sends an RST after 9 failed probes sent 75 seconds apart. These values are too high for a server. You can adjust them to 30, 3, and 1800 as needed.
    Initial configuration: -
    net.ipv4.ip_local_port_range
    Configures the range of local ports that the kernel may assign automatically. Adjust it as needed.
    Initial configuration: -
    net.ipv4.tcp_tw_reuse
    Allows a socket in the TIME_WAIT state to be reused for new TCP connections. This helps you quickly restart connections that use fixed ports, but may be risky in NAT-based networks. Later kernel versions support the values 0, 1, and 2, and set it to 2 by default.
    Initial configuration: -
    net.ipv4.ip_forward
    net.ipv6.conf.all.forwarding
    These parameters enable or disable IP forwarding. You can set them to 1 for Docker route forwarding.
    Initial configuration: 0
    net.ipv4.conf.default.rp_filter
    Specifies the reverse path validation rule that an ENI applies to received packets. Valid values are 0 (no validation), 1 (strict mode, recommended by RFC 3704), and 2 (loose mode). The recommended strict mode helps prevent DDoS attacks and IP spoofing.
    Initial configuration: -
    net.ipv4.conf.default.accept_source_route
    Specifies whether to accept IP packets that contain source routing options. They are rejected by default, as recommended on the CentOS website.
    Initial configuration: 0
    net.ipv4.conf.all.promote_secondaries
    net.ipv4.conf.default.promote_secondaries
    These parameters specify whether a secondary IP address is promoted to primary after the original primary IP address is deleted.
    Initial configuration: 1
    net.ipv6.neigh.default.gc_thresh3
    net.ipv4.neigh.default.gc_thresh3
    These parameters define the maximum number of entries stored in the ARP cache (the neighbor table for IPv6). The garbage collector starts immediately once the number of entries exceeds this value.
    Initial configuration: 4096
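
    The receive buffer sizing rule and the keepalive timing described in the table above can be checked with a little arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the link speed and RTT in the usage note are illustrative inputs rather than measured values, and rmem_max is taken as exactly five times rmem_default.

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes: bandwidth multiplied by RTT."""
    return int(bandwidth_bits_per_sec / 8 * rtt_ms / 1000)


def receive_buffer_sizes(bandwidth_bits_per_sec: float, rtt_ms: float) -> dict:
    """Suggest rmem_default (the BDP) and rmem_max (five times the default)."""
    default = bdp_bytes(bandwidth_bits_per_sec, rtt_ms)
    return {
        "net.core.rmem_default": default,
        "net.core.rmem_max": 5 * default,
    }


def keepalive_detection_seconds(time_s: int, probes: int, intvl_s: int) -> int:
    """Time until an idle, unresponsive peer is dropped.

    The kernel waits `time_s` seconds of idleness, then sends `probes`
    keepalive probes spaced `intvl_s` seconds apart.
    """
    return time_s + probes * intvl_s


if __name__ == "__main__":
    # Illustrative link: 100 Mbit/s with a 50 ms round-trip time.
    print(receive_buffer_sizes(100_000_000, 50))
    print(keepalive_detection_seconds(7200, 9, 75))   # kernel defaults
    print(keepalive_detection_seconds(1800, 3, 30))   # suggested server values
```

    With the defaults, a dead peer is detected only after 7200 + 9 × 75 = 7875 seconds, while the 30/3/1800 setting brings this down to 1890 seconds.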

    Parameter | Description | Initial Configuration
    vm.vfs_cache_pressure
    Controls the tendency of the kernel to reclaim the memory used for caching directory and inode objects. At the default value of 100, the kernel attempts to reclaim dentries and inodes at a fair rate relative to page cache reclaim. curl-based services usually accumulate dentries, which may use up all free memory and cause OOM or kernel bugs. We set it to 250 to balance reclaim frequency against performance. You can tune this value.
    Initial configuration: 250
    vm.min_free_kbytes
    Forces the Linux VM to keep a minimum number of kilobytes of memory free for use by kernel threads. At startup, the value is calculated automatically from the free physical memory (MEM) as 4*sqrt(MEM). If the value is too low, microbursts of incoming packets can leave the server subtly broken and trigger OOM. On a high-configuration server, we recommend setting vm.min_free_kbytes to about 1% of total memory.
    Initial configuration: -
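    kernel.printk
    Specifies the log levels for the kernel’s printk function. The four values are the console log level, the default message log level, the minimum console log level, and the default console log level. The default configuration is 5 or above.
    Initial configuration: 5 4 1 7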
    kernel.numa_balancing
    Allows the kernel to automatically move processes to the appropriate NUMA node. In practice it has little effect and hurts performance, so it is disabled. You can try enabling it for Redis use cases.
    Initial configuration: 0
    kernel.shmall
    kernel.shmmax
    • shmmax: defines the maximum size (in bytes) of a single shared memory segment that a Linux process can allocate.
    • shmall: defines the system-wide total amount of shared memory, in pages.
    Initial configuration: kernel.shmmax=68719476736, kernel.shmall=4294967296
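
    The two sizing rules for vm.min_free_kbytes and the page-based unit of kernel.shmall can be worked through as follows. This is a rough sketch with hypothetical function names. It assumes that MEM in the 4*sqrt(MEM) formula is expressed in kilobytes and that the page size is 4 KiB, and it does not model any lower or upper bound that the kernel may apply to the computed value.

```python
import math

PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86_64


def default_min_free_kbytes(mem_kbytes: int) -> int:
    """Boot-time default from the table: 4 * sqrt(MEM), MEM in kilobytes."""
    return int(4 * math.sqrt(mem_kbytes))


def recommended_min_free_kbytes(mem_kbytes: int, percent: int = 1) -> int:
    """About 1% of total memory, as suggested for high-configuration servers."""
    return mem_kbytes * percent // 100


def shmall_bytes(shmall_pages: int, page_size: int = PAGE_SIZE) -> int:
    """Convert kernel.shmall (counted in pages) to bytes."""
    return shmall_pages * page_size


if __name__ == "__main__":
    mem_kb = 16 * 1024 * 1024  # a hypothetical 16 GiB server
    print(default_min_free_kbytes(mem_kb))      # automatic default, in KB
    print(recommended_min_free_kbytes(mem_kb))  # about 1% of memory, in KB
    print(shmall_bytes(4294967296))             # initial shmall, in bytes
```

    On the hypothetical 16 GiB server, the automatic default is about 16 MB while the 1% recommendation is roughly ten times larger, which shows why large servers benefit from manual tuning.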

    Parameter | Description | Initial Configuration
    fs.file-max
    fs.nr_open
    These parameters specify the maximum number of file handles that the Linux kernel and a single process can allocate, respectively.
    • file-max: automatically set at boot to approximately 100,000 per GB of memory.
    • nr_open: fixed at 1048576, which limits the maximum number of open file handles in user mode. In general, keep this value unchanged. To change the maximum number of open file handles, configure the ulimit -n limit in the /etc/security/limits.conf configuration file.
    Initial configuration: ulimit -n=100001, fs.nr_open=1048576
    kernel.pid_max
    Specifies the upper bound for process IDs, which caps the number of processes in the system. The official image uses the default value of 32768, which you can adjust as needed.
    Initial configuration: -
    kernel.core_uses_pid
    Determines whether the filename of a generated core dump is suffixed with .PID.
    Initial configuration: 1
    kernel.sysrq
    Enables the SysRq functions so that you can later operate on /proc/sysrq-trigger.
    Initial configuration: 1
    kernel.msgmnb
    kernel.msgmax
    These parameters define the maximum size in bytes of a single message queue and the maximum size in bytes of a single message in a message queue, respectively.
    Initial configuration: 65536
    kernel.softlockup_panic
    Controls whether the kernel panics when a soft lockup is detected. If enabled, a vmcore is generated according to the kdump configuration, which you can use to analyze the cause of the soft lockup.
    Initial configuration: -
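
    The file handle limits in the table above relate to each other as sketched below. This is an illustrative sketch with hypothetical helper names: it estimates fs.file-max from the rule of thumb of about 100,000 handles per GB, and it renders the nofile entries that correspond to ulimit -n in /etc/security/limits.conf. The wildcard domain "*" is an assumption; a real configuration may target specific users or groups.

```python
from typing import List

NR_OPEN_DEFAULT = 1_048_576  # fs.nr_open value from the table


def estimated_file_max(mem_gb: float, handles_per_gb: int = 100_000) -> int:
    """Rough boot-time fs.file-max: about 100,000 handles per GB of memory."""
    return int(mem_gb * handles_per_gb)


def limits_conf_lines(nofile: int,
                      nr_open: int = NR_OPEN_DEFAULT,
                      domain: str = "*") -> List[str]:
    """Render soft and hard nofile entries for /etc/security/limits.conf.

    A per-process limit above fs.nr_open is rejected by the kernel, so the
    sketch refuses to generate such a configuration.
    """
    if nofile > nr_open:
        raise ValueError(f"nofile {nofile} exceeds fs.nr_open {nr_open}")
    return [
        f"{domain} soft nofile {nofile}",
        f"{domain} hard nofile {nofile}",
    ]


if __name__ == "__main__":
    print(estimated_file_max(8))  # a hypothetical 8 GB server
    for line in limits_conf_lines(100001):
        print(line)
```

    Checking against fs.nr_open before writing the file avoids a configuration in which the requested limit cannot be applied at login.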

    Parameter | Description | Initial Configuration
    vm.dirty_background_bytes
    vm.dirty_background_ratio
    vm.dirty_bytes
    vm.dirty_expire_centisecs
    vm.dirty_ratio
    vm.dirty_writeback_centisecs
    These parameters configure the policy for writing dirty pages back to disk.
    • dirty_background_bytes/dirty_bytes and dirty_background_ratio/dirty_ratio express the dirty page thresholds as an absolute amount of memory and as a percentage of system memory, respectively. In general, the ratio form is used.
    • dirty_background_ratio: the percentage of system memory filled with dirty pages (10% by default) at which the background kernel flush processes start writing back to disk.
    • dirty_ratio: the maximum percentage of system memory that can be filled with dirty pages before everything must be committed to disk. At this point, all new I/O blocks until the dirty pages have been written to disk, causing long I/O pauses. The system first reaches the vm.dirty_background_ratio threshold, at which the flush processes start asynchronous writeback while applications continue writing. When the system reaches vm.dirty_ratio, the OS writes dirty pages back synchronously and blocks applications.
    • dirty_expire_centisecs: how long a dirty page can stay in cache before it must be written, expressed in hundredths of a second. Data that has been dirty in memory for longer than this interval is written out the next time a flush process wakes up.
    • dirty_writeback_centisecs: how often the kernel flush processes wake up, expressed in hundredths of a second.
    Initial configuration: -
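
    The two-stage writeback behavior described in the table above can be modeled in a few lines. This is a simplified sketch, not the kernel's algorithm: it applies the ratios to total system memory, and the function names are hypothetical. The 10% background ratio comes from the table, while the 20% dirty_ratio and the centisecond values in the usage note are assumed example values.

```python
def writeback_state(dirty_bytes: int,
                    mem_bytes: int,
                    background_ratio: int = 10,  # default stated in the table
                    dirty_ratio: int = 20) -> str:  # assumed example value
    """Classify the writeback stage for a given amount of dirty memory.

    Returns "idle" below the background threshold, "background" when the
    flush processes write back asynchronously, and "blocking" when writers
    are throttled until dirty pages reach the disk.
    """
    background_limit = mem_bytes * background_ratio // 100
    hard_limit = mem_bytes * dirty_ratio // 100
    if dirty_bytes >= hard_limit:
        return "blocking"
    if dirty_bytes >= background_limit:
        return "background"
    return "idle"


def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a *_centisecs setting (hundredths of a second) to seconds."""
    return centisecs / 100


if __name__ == "__main__":
    gib = 2 ** 30
    mem = 16 * gib  # a hypothetical 16 GiB server
    for dirty in (1 * gib, 2 * gib, 4 * gib):
        print(dirty // gib, "GiB dirty:", writeback_state(dirty, mem))
    print(centisecs_to_seconds(3000))  # example expire interval, in seconds
    print(centisecs_to_seconds(500))   # example wakeup interval, in seconds
```

    Keeping a wide gap between the two ratios gives the background flush processes time to drain dirty pages before applications hit the blocking threshold.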