[Sussex] SkyPickle stuff...
Geoffrey Teale
tealeg at member.fsf.org
Wed Dec 29 22:22:13 UTC 2004
Alan F wrote:
> Is there any particular reason you chose to write it in C? Or just
> preference?
I can see why you ask the question. To do a job like this normally I
would use Python, Ruby or even Bash. Those tools make jobs like this
very easy to write and maintain.
In the end there are several reasons why I wrote it in C.
The task in hand is one I commonly do at the command line and at work we
used Epylog (a more complex log summarising tool written in Python) for
a while. When I do something manually more than a couple of times I end
up writing a program to do it for me.
A Bash script ends up just being very ugly and needing a lot of
parameters to cover even the basic set of options that Grumbleweed
supports.
Epylog on the other hand is a highly configurable tool, but my
experience of it is that it uses a massive amount of system resource in
a non-trivial setup and it tries to be too clever in terms of
remembering what it has and has not parsed. We eventually abandoned
Epylog because it was unstable, slow and prone to mail-bombing us with
20MB emails in the setup we used it in. That's not a criticism of
Epylog - I think it's an excellent tool and very useful. It's certainly
far more ambitious in scope than Grumbleweed.
With these points in mind I decided I needed to write a very simple tool
that would do this job in a manner efficient enough to be used in a
system with a lot of data and little spare resource.
OK, at that point I reasoned thus:
- C and C++ are at once my most treasured (and hard earned) skills and
the ones I have least opportunity to practice in the day to day.
- I find C a pleasant language to write in (something aesthetic that I
can't explain).
- The GNU "Languages of Choice" are C, C++, LISP (GUILE).
- Not all systems have Bash, Ruby, Python or Perl installed. C code
compiled on the correct platform and using GLibC runs anywhere that
GLibC is installed (that's every Linux, Hurd, and Darwin/Mac OS X box
out there and many, many other platforms).
So I came quite naturally to write it in C.
> http://loonix.net/files/mailstats.pl is my first attempt at writing a
> log reporting tool. As simple as it is, it produces some pretty useful
> statistics. My thinking was that the best way to get flexible reports,
> low processing overhead and smallest amount of code was to run the
> script as a cronjob to parse the daily log which had just been
> rotated, then put it in a postgresql database and let the database do
> any aggregates when needed. It's far more specialised than your
> creation, only of use to people using amavisd-new and spamassassin.
<snip>
That approach is very flexible, but I don't agree about the low overhead
aspect of it. A database like PostgreSQL is a hell of a thing to
be running on a system, and while they are very useful and flexible,
relational databases are nowhere near the most efficient way of writing
and retrieving data. With more configurable data gathering your system
would be fine for 99% of systems I can think of and much more useful as
a general purpose reporting system than Grumbleweed.
Going back to the point about relational database systems: as a rule
(one that has notable exceptions) the more general purpose you make
your code the less efficient it is. Your average SQL Query is very
inefficient - that's because SQL was designed to be used at a command
line by a relatively unskilled user. SQL is really powerful and
flexible, but it requires a lot of processing to do its work. Stored
procedures and indexes make a lot of difference of course, but even so
for specialised (fixed) data retrieval and processing any halfway decent
programmer could do the same job much more quickly working directly with
files and structures in memory.
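
To make "working directly with files and structures in memory" a bit more
concrete, here is a minimal sketch in C. It is not Grumbleweed's code; the
names (struct tally, tally_add, tally_get, tally_stream), the table size and
the choice of "count one whitespace-separated field per log line" are all
assumptions made up for illustration. It does roughly what a
SELECT ... COUNT(*) ... GROUP BY would do, but for one fixed question only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical fixed capacity */
#define KEY_MAX 64      /* longer keys are truncated */

struct entry {
    char key[KEY_MAX];
    unsigned long count;
    int used;
};

/* Must be zero-initialised before use (static storage or calloc). */
struct tally {
    struct entry slots[TABLE_SIZE];
};

/* Simple string hash (multiply by 33 and add each byte). */
static unsigned long hash_key(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Find the slot for key using linear probing; NULL if the table is full. */
static struct entry *find_slot(struct tally *t, const char *key)
{
    unsigned long start = hash_key(key) % TABLE_SIZE;
    for (unsigned long n = 0; n < TABLE_SIZE; n++) {
        struct entry *e = &t->slots[(start + n) % TABLE_SIZE];
        if (!e->used || strncmp(e->key, key, KEY_MAX - 1) == 0)
            return e;
    }
    return NULL;
}

/* Add one to the count for key. Returns 0 on success, -1 if full. */
int tally_add(struct tally *t, const char *key)
{
    assert(t != NULL && key != NULL);
    struct entry *e = find_slot(t, key);
    if (e == NULL)
        return -1;
    if (!e->used) {
        strncpy(e->key, key, KEY_MAX - 1);
        e->key[KEY_MAX - 1] = '\0';
        e->used = 1;
    }
    e->count++;
    return 0;
}

/* Return the count recorded for key, or 0 if it was never seen. */
unsigned long tally_get(struct tally *t, const char *key)
{
    struct entry *e = find_slot(t, key);
    return (e != NULL && e->used) ? e->count : 0;
}

/* Read lines from fp and tally the given 0-based whitespace-separated
 * field. Lines with too few fields are skipped. Returns the number of
 * lines counted. Lines longer than the buffer are not handled specially. */
long tally_stream(struct tally *t, FILE *fp, int field)
{
    char line[1024];
    long counted = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        char *tok = strtok(line, " \t\n");
        for (int i = 0; tok != NULL && i < field; i++)
            tok = strtok(NULL, " \t\n");
        if (tok != NULL && tally_add(t, tok) == 0)
            counted++;
    }
    return counted;
}
```

Given a zero-initialised struct tally and an open log file, a call such as
tally_stream(&t, fp, 4) would count the fifth field of each line (the program
name, in a typical syslog layout), and tally_get(&t, "sshd:") would then read
a total back. There is no parser, planner or server process involved, which
is the trade-off described above: much less flexibility in exchange for much
less work per line.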
> I've been meaning to write a fancy CGI script to generate graphs and
> stuff, but I've been too lazy.
:-)
The great thing about Open Source (and making your code available) is
that if someone really needs it they'll write it for you.
> The documentation looks perfectly comprehensive, my idea of
> "documentation" never amounts to more than a few comments at the top
> of a script. This is mainly because anything I write is so simple and
> short, and usually of little use to anyone else anyway. :-)
It's a lot of work to really do a "good" job. I think that publicly
releasing your work on freshmeat is a good way of making sure you do it.
Never think that a project is too small to be useful - if it's useful
to you it's almost definitely useful to someone else. Putting in the
extra few hours of work is the foundation of this entire movement.
--
Geoff Teale
Free Software Foundation <tealeg at member.fsf.org>