[Nottingham] Network TCP Tuning for Fast Links

Martin martin at ml1.co.uk
Sun May 27 23:48:37 BST 2007


Folks,

From a little reading around on why you can't get 1 Gbit/s down a Gbit
network...


Using "fast" links, or even for internal networks, there
can be some performance gains from tuning system TCP settings.

A few links I've just scanned through are:
http://dsd.lbl.gov/TCP-tuning/background.html
http://dsd.lbl.gov/TCP-tuning/linux.html
http://proj.sunet.se/E2E/tcptune.html
http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html?page=1
http://www.psc.edu/networking/projects/tcptune/
http://www.acc.umu.se/~maswan/linux-netperf.txt
http://datatag.web.cern.ch/datatag/howto/tcp.html
http://www.aarnet.edu.au/engineering/networkdesign/mtu/local.html
http://www.hep.ucl.ac.uk/~ytl/tcpip/linux/txqueuelen/

And for bonding multiple links/interfaces:
http://www.linux-corner.info/bonding.html
http://www.devco.net/archives/2004/11/26/linux_ethernet_bonding.php
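
As a rough sketch of the sort of thing those bonding pages describe
(interface names and addresses made up here, and the bonding mode needs
to suit your switch; see the links for the details):

  /sbin/modprobe bonding mode=balance-rr miimon=100
  /sbin/ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
  /sbin/ifenslave bond0 eth0 eth1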


The short story is that you can greatly improve your data transfer rate
over long internet links by increasing the system TCP tx and rx
window/buffer limits. Note also the comments in
http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html?page=1
about buffer sizes set by the application itself (which bypass the
kernel's autotuning).
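
The current limits are easy to check before changing anything, for
example (these are the 2.6 kernel names used below):

  /sbin/sysctl net.core.rmem_max net.core.wmem_max
  /sbin/sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem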

For internal networks fully under your control, you can greatly increase
transfer rates and reduce CPU overheads by using "Jumbo packets" with an
MTU of 9000. (Most new switches should support that. Check further
before trying to go any larger.)
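
Setting the larger MTU on an interface is just, for example (eth0 is
only an example name, and the setting is lost on reboot unless it also
goes in your distro's network config):

  /sbin/ifconfig eth0 mtu 9000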


Other comments are in this example /etc/sysctl.conf (for Linux 2.6.xx):

# Run "sysctl -p" to effect any changes made here
#
# TCP tuning
# See:
# http://www.onlamp.com/pub/a/onlamp/2005/11/17/tcp_tuning.html?page=1
#
# The optimal TCP buffer size for a given network link is double the
# bandwidth-delay product:
#   buffer size = 2 * bandwidth * delay
# or, using the round-trip time (RTT = 2 * delay):
#   buffer size = bandwidth * RTT
# For example, for a 100 Mbit/s link between California and the United
# Kingdom with an RTT of 150 ms, the optimal TCP buffer size is about
# 1.9 MBytes.
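# Worked through as a sanity check (my arithmetic, not from the article):
#   100 Mbit/s = 12.5 MBytes/s;  12.5 MBytes/s * 0.150 s ~= 1.9 MBytes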
#
# increase TCP maximum buffer size
# Example for 16 MBytes
#net.core.rmem_max = 16777216
#net.core.wmem_max = 16777216

# For a 10 Mbit/s link with a worst case of Australia at 350 ms RTT,
# 1 MByte is more than enough.
# Linux 2.6.17 (and later?) defaults to a 4194304 maximum, so match that instead...
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304

# increase Linux autotuning TCP buffer limits
# min, default, and maximum number of bytes to use
# Example for 16 MBytes
#net.ipv4.tcp_rmem = 4096 87380 16777216
#net.ipv4.tcp_wmem = 4096 65536 16777216

# Scaled for 4MByte:
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 49152 4194304

# Notes:
#
# Defaults:
# net.ipv4.tcp_rmem = 4096        87380   174760
# net.ipv4.tcp_wmem = 4096        16384   131072
# net.ipv4.tcp_mem = 49152        65536   98304
#
# Do not adjust tcp_mem unless you know exactly what you are doing.
# This array (in units of pages) determines how the system balances the
# total network buffer space against all other LOWMEM memory usage. The
# three elements are initialized at boot time to appropriate fractions
# of the available system memory and do not need to be changed.
#
# You do not need to adjust rmem_default or wmem_default (at least not
# for TCP tuning). These are the default buffer sizes for non-TCP sockets
# (e.g. unix domain and UDP sockets).
#
#
# Also use, for example:
#   /sbin/ifconfig eth2 txqueuelen 2000
#
# The default of 1000 is inadequate for long distance, high throughput
# pipes. For example, for an RTT of 120 ms at Gigabit rates, a
# txqueuelen of at least 10000 is recommended.
#
# txqueuelen should not be set too large on slow links, to avoid
# excessive latency.
#
# If you are seeing "TCP: drop open request" for real load (not a DDoS),
# you need to increase tcp_max_syn_backlog (8192 worked much better than
# 1024 on heavy webserver load).
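# For example (left commented out; only worth setting on a busy server):
#net.ipv4.tcp_max_syn_backlog = 8192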
#
# If you see messages like "swapper: page allocation failure. order:0,
# mode:0x20", you definitely need to increase min_free_kbytes for the
# virtual memory.
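# For example (the value is only illustrative; scale it to the RAM in the box):
#vm.min_free_kbytes = 65536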
#
#
# All tcp settings listed by
# sysctl -a | fgrep tcp
#
# Run "sysctl -p" to effect any changes made here


Has anyone experimented with those values?


One lingering question is:

If I set my internal network to use MTU=9000, will those jumbo packets
get sensibly fragmented down to MTU=1500 for internet connections? (Even
for Windows machines?)
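
One way to check what actually happens is to force the DF bit with ping
from a jumbo-framed machine, for example "ping -M do -s 8972 somehost"
(8972 being 9000 less the 28 bytes of IP and ICMP headers, and somehost
being whatever is on the far side of the 1500 MTU hop). If the path MTU
is only 1500 you should see "message too long" type errors rather than
replies. That is the Linux ping; I think the Windows equivalent is
"ping -f -l 8972".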


Hope this is of interest,

Cheers,
Martin

-- 
----------------
Martin Lomas
martin at ml1.co.uk
----------------


