[Sussex] SkyPickle stuff...

Geoffrey Teale tealeg at member.fsf.org
Wed Dec 29 22:22:13 UTC 2004


Alan F wrote:
> Is there any particular reason you chose to write it in C? Or just
> preference?

I can see why you ask the question.  To do a job like this normally I 
would use Python, Ruby or even Bash.  Those tools make jobs like this 
very easy to write and maintain.

In the end there are several reasons why I wrote it in C.

The task in hand is one I commonly do at the command line and at work we 
used Epylog (a more complex log summarising tool written in Python) for 
a while.  When I do something manually more than a couple of times I end 
up writing a program to do it for me.

A Bash script ends up just being very ugly and needing a lot of 
parameters to cover even the basic set of options that Grumbleweed 
supports.

Epylog, on the other hand, is a highly configurable tool, but my 
experience of it is that it uses a massive amount of system resource in 
a non-trivial setup and it tries to be too clever in terms of 
remembering what it has and has not parsed.  We eventually abandoned 
Epylog because it was unstable, slow and prone to mail-bombing us with 
20MB emails in the setup we used it in.  That's not a criticism of 
Epylog - I think it's an excellent tool and very useful.  It's certainly 
far more ambitious in scope than Grumbleweed.

With these points in mind I decided I needed to write a very simple tool 
that would do this job in a manner efficient enough to be used in a 
system with a lot of data and little spare resource.

OK, at that point I reasoned thus:

- C and C++ are at once my most treasured (and hard earned) skills and 
the ones I have least opportunity to practice in the day to day.

- I find C a pleasant language to write in (something aesthetic that I 
can't explain).

- The GNU "Languages of Choice" are C, C++, LISP (GUILE).

- Not all systems have Bash, Ruby, Python or Perl installed.  C code 
compiled on the correct platform and using GLibC runs anywhere that 
GLibC is installed (that's every Linux, Hurd, and Darwin/Mac OS X box 
out there and many, many other platforms).

So I came quite naturally to write it in C.

> http://loonix.net/files/mailstats.pl is my first attempt at writing a
> log reporting tool. As simple as it is, it produces some pretty useful
> statistics. My thinking was that the best way to get flexible reports,
> low processing overhead and smallest amount of code was to run the
> script as a cronjob to parse the daily log which had just been
> rotated, then put it in a postgresql database and let the database do
> any aggregates when needed. It's far more specialised than your
> creation, only of use to people using amavisd-new and spamassassin.
<snip>

That approach is very flexible, but I don't agree about the low overhead 
aspect of it.  A database like PostgreSQL is a hell of a thing to 
be running on a system, and while they are very useful and flexible, 
relational databases are nowhere near the most efficient way of writing 
and retrieving data.  With more configurable data gathering your system 
would be fine for 99% of systems I can think of and much more useful as 
a general purpose reporting system than Grumbleweed.

Going back to the point about relational database systems: as a rule 
(one that has notable exceptions) the more general purpose you make 
your code the less efficient it is.  Your average SQL Query is very 
inefficient - that's because SQL was designed to be used at a command 
line by a relatively unskilled user.  SQL is really powerful and 
flexible, but it requires a lot of processing to do its work.  Stored 
procedures and indexes make a lot of difference of course, but even so 
for specialised (fixed) data retrieval and processing any halfway decent 
programmer could do the same job much more quickly working directly with 
files and structures in memory.
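
As a rough illustration of the "working directly with files" point, here 
is a minimal sketch in C of a fixed, specialised query: counting the log 
lines that contain a given string.  It is not Grumbleweed's code; the 
function name count_matching_lines and the 4096-byte buffer size are 
assumptions made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from Grumbleweed): count the lines read
 * from `in` that contain the substring `needle`.
 *
 * Lines longer than the buffer are read in several chunks; such a line is
 * still counted at most once, but a match that straddles a chunk boundary
 * is missed.  That is acceptable for a sketch, not for production use.
 */
long count_matching_lines(FILE *in, const char *needle)
{
    char buf[4096];     /* assumed chunk size; ordinary log lines fit */
    long hits = 0;
    int counted = 0;    /* has the current line already been counted? */

    assert(in != NULL);
    assert(needle != NULL);

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (!counted && strstr(buf, needle) != NULL) {
            hits++;
            counted = 1;
        }
        /* A newline in the chunk means the line is complete. */
        if (strchr(buf, '\n') != NULL)
            counted = 0;
    }
    return hits;
}
```

Called on an open log file, for example 
count_matching_lines(fp, "Failed password"), this makes a single pass 
over the data with one small buffer and no parsing, planning or 
indexing.  That is the kind of fixed retrieval meant above; the price is 
that every new question needs new code.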


> I've been meaning to write a fancy CGI script to generate graphs and
> stuff, but I've been too lazy.

:-)

The great thing about Open Source (and making your code available) is 
that if someone really needs it they'll write it for you.

> The documentation looks perfectly comprehensive, my idea of
> "documentation" never amounts to more than a few comments at the top
> of a script. This is mainly because anything I write is so simple and
> short, and usually of little use to anyone else anyway. :-)

It's a lot of work to really do a "good" job.  I think that publicly 
releasing your work on freshmeat is a good way of making sure you do it. 
Never think that a project is too small to be useful - if it's useful 
to you it's almost definitely useful to someone else.  Putting in the 
extra few hours of work is the foundation of this entire movement.

--
Geoff Teale
Free Software Foundation <tealeg at member.fsf.org>



