[Gllug] socket buffer overrun

Tue Oct 18 23:05:16 UTC 2005

>>> On Tue, 18 Oct 2005 13:51:59 +0100, Ben Fitzgerald
>>> <ben_m_f at yahoo.co.uk> said:

ben_m_f> Hi, I'm looking into a problem where data transfer
ben_m_f> between two servers is slow.

And how long is that piece of string I have over there? :-)

But... How high is the CPU load? Have you got a PCI-64 card and
slot? What makes you think that your servers can process several
dozen MiB/s of TCP traffic? Are you aware of the existence of
TCP/IP accelerators? What protocols are you running over the
link, and doing what? What's the latency between the two servers?
What kind of other traffic do they carry? You later say you have
kernel 2.4.21, but from which distribution? Why not 2.6.x?

However, thanks for your inner confidence that the people willing
to help you are psychic. :-)

ben_m_f> Does the following mean that the receive buffer is too
ben_m_f> small? During data transfer the following increments:
ben_m_f> [root at myhost root]# netstat -ants | grep "buffer over"
ben_m_f>     43539 packets pruned from receive queue because of socket buffer overrun

43539 out of how many? Ideally there should be none, but more
than a small percentage might indicate problems of different
sorts.
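
To put that counter in proportion, one could compare it with the
total number of TCP segments received, which 'netstat -s' also
reports. A minimal sketch in Python, assuming the net-tools wording
"segments received" for the total (check it against your own
output); the sample text is made up apart from the 43539 figure,
and the ratio is only a rough indicator since the two counters do
not count exactly the same thing:

```python
import re

def pruned_fraction(netstat_text):
    """Return pruned / received from `netstat -s` style text, or None."""
    def counter(pattern):
        # Counters are printed as "<number> <description>" on their own line.
        m = re.search(r"^\s*(\d+)\s+" + pattern, netstat_text, re.MULTILINE)
        return int(m.group(1)) if m else None

    pruned = counter(r"packets pruned from receive queue "
                     r"because of socket buffer overrun")
    received = counter(r"segments received")
    if pruned is None or not received:
        return None
    return pruned / received

# Hypothetical sample: only the 43539 figure comes from the original post.
sample = """\
Tcp:
    12000000 segments received
TcpExt:
    43539 packets pruned from receive queue because of socket buffer overrun
"""
```

With the invented total above, 'pruned_fraction(sample)' comes out
at roughly 0.36%; the real total would have to come from the same
'netstat -s' run as the pruned count.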

ben_m_f> Should rmem_max be larger? This is on a 1000fdx autoneg
ben_m_f> interface with txqueuelen:1000.

Not an optimal level of information, but one of the links below
says that 'txqueuelen:1000' may indeed be a good idea, one among
many.

ben_m_f> [root at myhost root]# cat /proc/sys/net/ipv4/tcp_rmem 
ben_m_f> 4096    87380   174760

Raising 'tcp_rmem' should help, as 1Gb/s can theoretically do
more than 100MiB/s (but see the figures in the links below), and
0.17MiB of buffering at that rate holds only about 1.7ms worth of
data, which is not a lot.
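
The arithmetic can be sketched in a few lines of Python. This is
only an illustration: the 100MiB/s rate and the 174760 byte maximum
are the figures from this thread, while the round-trip times in the
loop are assumed examples, not measurements from the servers in
question.

```python
MIB = 1024 * 1024

def buffer_time_ms(buffer_bytes, rate_bytes_per_s):
    """How many milliseconds of traffic a buffer holds at a given rate."""
    return buffer_bytes / rate_bytes_per_s * 1000.0

def bandwidth_delay_product(rate_bytes_per_s, rtt_s):
    """Bytes in flight needed to keep a path of this rate and RTT full."""
    return rate_bytes_per_s * rtt_s

rate = 100 * MIB        # the 100MiB/s figure used above
tcp_rmem_max = 174760   # third field of tcp_rmem in the quoted output

print(f"{buffer_time_ms(tcp_rmem_max, rate):.2f} ms of data fits in the buffer")
for rtt_ms in (0.2, 1.0, 5.0):  # assumed example RTTs, not measured
    need = bandwidth_delay_product(rate, rtt_ms / 1000.0)
    print(f"RTT {rtt_ms} ms -> about {need / 1024:.0f} KiB to keep the pipe full")
```

The second function is the usual bandwidth-delay product: if the
round-trip time between the two servers is longer than the time
the buffer covers, the sender will stall waiting for window
updates, which is why the latency question above matters.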

But I would surmise, guessing rather wildly (unfortunately,
unlike "full of crap" Nix, I am not clairvoyant and I feel
awkward making statements of fact about things that I don't know
:->), that this is just one part of the problem, unless you are
very lucky.

Because there are a lot of tweakables, not just the receive
buffer size.

Fortunately there are quite a few tutorials on TCP (and IP)
performance issues on 1Gb/s ''Ethernet''... For example they
mention the effect of jumbo frames, interrupt coalescing, PCI
bus speed, and so on, all quite relevant to performance at such
high data rates, and who knows whether your servers support them
and are configured accordingly.
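
To give an idea of why jumbo frames and interrupt coalescing
matter, here is a rough Python sketch of how many frames per second
a saturated gigabit link carries at two MTU sizes. It assumes
full-size frames back to back and counts the standard per-frame
Ethernet overhead; nothing here is measured on your hardware.

```python
# Per-frame overhead on the wire besides the payload (MTU), in bytes:
# 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_OVERHEAD = 14 + 4 + 8 + 12

def frames_per_second(mtu, link_bits_per_s=1_000_000_000):
    """Upper bound on full-size frames per second at line rate."""
    return link_bits_per_s / ((mtu + ETH_OVERHEAD) * 8)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: about {frames_per_second(mtu):.0f} frames/s")
```

That works out to about 81,000 frames/s at MTU 1500 against about
14,000 at MTU 9000. With one interrupt per frame the CPU has to
field that many interrupts per second, which is what coalescing
(or larger frames) is meant to reduce.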

Astonishingly :-) using the obvious keywords for a web search
returns quite a number of seemingly relevant results (some of
them recent, some older), for example:

  http://www.justfuckinggoogleit.com/
  http://www-didc.lbl.gov/TCP-tuning/linux.html
  http://datatag.web.cern.ch/datatag/howto/tcp.html
  http://www.dssnetworks.com/v3/FAQs.asp
  http://www.internet2.edu/~shalunov/gigatcp/

  http://www.hep.man.ac.uk/u/rich/net/nic/GE_FGCS_v18.doc
  http://www.scl.ameslab.gov/Publications/Gigabit/tr5126.html
  http://www.syskonnect.com/syskonnect/performance/gig-over-copper.htm
  http://www.uninett.no/tcp-revisited/rapport/

[ ... ]
