[Gllug] OT: Tomcat & Apache mapping problem

Russell Howe rhowe at wiss.co.uk
Mon Mar 28 22:36:19 UTC 2005


On Thu, Mar 24, 2005 at 03:56:34PM +0000, Joel Bernstein wrote:
> On Thu, Mar 24, 2005 at 03:46:12PM +0000, Rich Walker wrote:
> > The -exec option to find is for use in emergencies only. Normally, you
> > want find ... | xargs ... : which will only start 3 processes, rather
> > than 1 per file...
> 
> I still don't understand why this happens. Surely they should be
> executed sequentially?

xargs doesn't necessarily fork one process per input item. It can
aggregate its input and pass it as a set of arguments to the process,
thereby spawning only one process per N files (where N is bounded by
the system's limit on the size of a process's argument list, ARG_MAX).
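As an aside, on most systems that ceiling is a byte limit on the
combined argument and environment strings rather than a fixed argument
count; assuming a POSIX system, you can query it:

```shell
# Query the kernel's limit on argument-list size, in bytes (POSIX getconf).
getconf ARG_MAX
```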

See the -n option to xargs.
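As a minimal sketch of -n in isolation (assuming a standard xargs):
cap each invocation at two arguments and watch the input get batched:

```shell
# Five words on stdin, at most two arguments per echo invocation:
# xargs runs echo three times, printing 'a b', 'c d' and 'e'.
printf '%s\n' a b c d e | xargs -n 2 echo
```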

e.g. with the following:

$ mktemp -d /tmp/test.XXXXXX
/tmp/test.FljLKZ
$ cd /tmp/test.FljLKZ
$ for num in `seq 1 5`; do touch "File $num"; done
$ ls -l
total 0
-rw-r--r--  1 rhowe rhowe 0 2005-03-28 23:26 File 1
-rw-r--r--  1 rhowe rhowe 0 2005-03-28 23:26 File 2
-rw-r--r--  1 rhowe rhowe 0 2005-03-28 23:26 File 3
-rw-r--r--  1 rhowe rhowe 0 2005-03-28 23:26 File 4
-rw-r--r--  1 rhowe rhowe 0 2005-03-28 23:26 File 5

OK, so we have 5 files.

$ du -hsc File\ 1
0       File 1
0       total
$ du -hsc File\ 2
0       File 2
0       total
$ du -hsc File\ *
0       File 1
0       File 2
0       File 3
0       File 4
0       File 5
0       total

What's so special about that? Well, we can tell from the output how many
arguments were passed to du - each argument is listed, followed by a
'total' line.

$ find . -type f -exec du -hsc '{}' \;
0       ./File 1
0       total
0       ./File 2
0       total
0       ./File 3
0       total
0       ./File 4
0       total
0       ./File 5
0       total

Here, find has spawned a separate 'du' process for every file. That's 6
processes in total: find, plus 5 du's.

$ find . -type f -print0 |xargs -0r du -hsc
0       ./File 1
0       ./File 2
0       ./File 3
0       ./File 4
0       ./File 5
0       total

$ rm -rf . # Keep it tidy, but irritate bash.

Here, all the files were passed as arguments to a single 'du' process,
meaning there were only 3 processes executed: find, xargs and du. If we
consider that for this example, the most costly operation was in fact
process startup, not the work the process actually did (as is not
uncommon in shell scripts), then we can see that we've halved the number
of processes we start. More importantly, the number remains constant for
larger input.

The former method launches an extra du process for every file present,
so the number of processes is 1 + n (as above: find, plus one du per
file). With the latter method, the number of processes launched is 2 or
3 for small inputs; once the input exceeds the number of arguments a
single process can take, it is 2 + ceiling(n/MAX_ARGS). If you graph
those two functions, even for a tiny MAX_ARGS (say, 2!), you will see
that the latter is clearly preferable for large values of n.
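A quick back-of-the-envelope check, counting find plus one du per file
for -exec, versus find, xargs and one du per batch for the pipeline
(hypothetical figures: a million files, a MAX_ARGS of 1000):

```shell
# Hypothetical input size and argument limit, for illustration only.
n=1000000
MAX_ARGS=1000

# find -exec: find itself, plus one du per file.
echo "find -exec: $((1 + n)) processes"

# find | xargs: find, xargs, plus one du per batch of MAX_ARGS files.
echo "find|xargs: $((2 + (n + MAX_ARGS - 1) / MAX_ARGS)) processes"
```

That prints 1000001 processes for the former against 1002 for the
latter.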

-- 
Russell Howe       | Why be just another cog in the machine,
rhowe at siksai.co.uk | when you can be the spanner in the works?
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



