[Gllug] Website development

Nix nix at esperi.demon.co.uk
Sun Jul 15 19:35:53 UTC 2001


On 15 Jul 2001, Stig Brautaset uttered the following:
> Nix <nix at esperi.demon.co.uk> writes:
>> This is one that they are not supposed to solve. The C preprocessor
>> can only sensibly handle C; any attempt to make it handle anything
>> else (like Makefiles, hello imake, or X resources files) is a
>> mistake.
>> 
>> You will find oddities happening, like the string `i586' collapsing
>> to `1' in the expanded page;
> 
> Yes, I am aware of that.
> 
> I do not often use defined C/C++ keywords on my webpage, so this
> problem is easily avoided by putting "#undef linux" (or whatever
> keyword you need to write) somewhere in the file included. Simple.

You do this, and you haven't realised that this is saying something?
(Like `cpp is the wrong tool'?)

(More problems: digraphs, trigraphs, /* sequences and the di- and
trigraph equivalents to them, and backslashes will also fubar your
assumption that the preprocessor passes plain text through untouched.
M4 has none of these problems.)
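A small C sketch of two of these, assuming a C99 compiler; the names STR, with_comment and with_splice are made up for illustration. A /* in the `text' swallows everything up to the next */, and a trailing backslash glues two lines together. Both happen in translation phases 2 and 3, before any macro is looked at.

```c
#include <string.h>   /* strcmp(), for checking the results */

#define STR(x) #x     /* turn the macro argument back into a string */

/* "Plain text" containing a comment opener: phase 3 replaces the whole
   comment (up to the first closing star-slash) with one space, before
   the macro ever sees its argument. */
static const char *with_comment = STR(see /usr/*/bin for tools */ done);

/* A backslash at the very end of a line: phase 2 deletes the backslash
   and the newline, gluing the two source lines together. */
static const char *with_splice = "first line\
second line";
```

Here with_comment comes out as "see /usr done" and with_splice as "first linesecond line"; a page run through cpp suffers the same fate.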

FWIW, the set of potentially #defined symbols is not bounded, so in
order to work safely you must #undef every single word you intend to
write before you use it. *All* of them. If you don't do that, you'll
have a webpage preparation system that could be broken by moving it from
one system to another, or by upgrading the compiler.

Great move.
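To make the `i586' problem concrete, here is a C sketch. It assumes `i586' stands in for a target-specific predefined macro (the real predefined set varies by compiler, version and target, so the demo defines it by hand); STR and XSTR are illustrative helper names. The word collapses to `1' until it is #undef'd, and the #undef only rescues that one word.

```c
#include <string.h>   /* strcmp(), for checking the results */

/* Assumption: `i586' stands in for a target-specific predefined macro.
   We (re)define it by hand so the demo behaves the same everywhere. */
#undef i586
#define i586 1

#define STR(x)  #x        /* stringize the argument as written */
#define XSTR(x) STR(x)    /* macro-expand the argument, then stringize */

/* A word in the "page" that happens to be a macro name. */
static const char *page_word_expanded = XSTR(i586);     /* "1" */

/* The #undef workaround: it fixes this one word, and every other
   potentially predefined word would need its own #undef. */
#undef i586
static const char *page_word_after_undef = XSTR(i586);  /* "i586" */
```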

> And yes, other problems like having to use &quot; instead of " in the
> source file, but you are supposed to do that anyway, so I really do
> not see it as a problem. 

You never need to use # for anything, either?

>> it Just Won't Work like you want it to.
> 
> Speak for yourself; it does *exactly* what I need. 

In a thoroughly fragile fashion.

>> (In fact, as of GCC-3.0, it won't work at all; `cpp -traditional' will
>> work better than nothing --- that was kept working for the sake of
>> imake.)
> 
> I have gcc 2.95.4 (how it relates to 3.0 I don't know, but it can't be
> *that* far off)

(No, you don't; GCC-2.95.4 isn't released yet. You probably have
something from the head of the GCC-2.95 branch in CVS.)

Zack Weinberg and Neil Booth rewrote the preprocessor from scratch in
GCC-3; the result is smaller, faster, far more standards-compliant (a
tokenizing preprocessor, at last!) --- and will sooner or later
(probably in GCC-3.1) be fused with the compiler the way that the
language-dependent frontends already are, and the `cpp' program will
disappear.

There will still, for a time, be a `cpp', but it will be there solely
for imake and similar broken things; that preprocessor (`tradcpp') dates
from 1989 and is not being maintained. And, if imake can be corrected
(hah!) even that will go away. Its use is strongly recommended against.

>                 and that works like a charm.

Querulous and fragile, with unknown function and who knows what snakes
lurking under the surface?

If you are determined to use cpp to process things other than C, very
well; it's your funeral. It's crazy, but you're the only person it'll
hurt.

>> Consider that transformation of a source file into tokens happens
>> *before* preprocessing; the preprocessor operates on a stream of C
>> tokens, *not* on text.
> 
> Last time I checked, C is written using ascii characters; the same as
> english text.

This is not correct. C is written using some encoding, which may or may
not be ASCII. (GCC-3.1-to-be already supports UTF-8 and may well support
other Unicode encodings.) That encoding is lexed to tokens in the C
language *before* *macro* *expansion*.
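One visible consequence of working on tokens rather than text, as a C sketch (the redefinition of `linux' is a hypothetical stand-in for the predefined macro; STR, XSTR, whole_word and longer_word are illustrative names). Replacement happens only when a whole identifier token matches the macro name, so the letters `linux' inside the longer identifier `linuxbox' are untouched. A textual substitution would have changed them.

```c
#include <string.h>   /* strcmp(), for checking the results */

/* Hypothetical stand-in for the predefined `linux' macro. */
#undef linux
#define linux 1

#define STR(x)  #x
#define XSTR(x) STR(x)    /* macro-expand the argument, then stringize */

/* The identifier token `linux' matches the macro name and is replaced. */
static const char *whole_word = XSTR(linux);       /* "1" */

/* `linuxbox' is lexed as one identifier token; it does not match the
   macro name, so the embedded letters "linux" are left alone. */
static const char *longer_word = XSTR(linuxbox);   /* "linuxbox" */
```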

If preprocessing happened before tokenization (which it does not;
preprocessing is compilation phase 4, while tokenization is phase 3; see
C99, s5.1.1.2), then this code would work:

#include <stdio.h>

#define LESS_THAN <

int main (void)
 {
  if (3 LESS_THAN= 4)   /* becomes `<' `=': two tokens, not `<=' */
   puts ("a");
  else
   puts ("b");
 }

Instead, that gives a parse error, because the tokenization of <= into a
`less-than-or-equal-to operator' token happens *before* macro expansion,
so the compiler's parser is faced with two operators < and = in
sequence, which is not valid C.

The right way to do that is

#include <stdio.h>

#define COMPOSE(a,b) a##b
#define LESS_THAN <

int main (void)
 {
  if (3 COMPOSE(<,=) 4)   /* pastes `<' and `=' into the single token `<=' */
   puts ("a");
  else
   puts ("b");
 }

which merges the two tokens `<' and `=' together into one,
`<='. GCC-2.95.x and below get this right only in a subset of cases;
GCC-3.0, as far as we know, gets it right all the time.
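A quick way to check that the paste really yields one token, sketched in C under the assumption of a C99 compiler (STR, XSTR, pasted and less_or_equal are illustrative names, not from the original). Stringizing the pasted result gives "<=" with no space inside, and the same paste works as an ordinary operator.

```c
#include <string.h>   /* strcmp(), for checking the results */

#define COMPOSE(a,b) a##b    /* paste two tokens into one */
#define STR(x)  #x
#define XSTR(x) STR(x)       /* expand the argument, then stringize */

/* `<' pasted to `=' is the single token `<=', so it stringizes as "<=". */
static const char *pasted = XSTR(COMPOSE(<,=));

/* The pasted token used as a comparison operator (hypothetical helper). */
static int less_or_equal(int a, int b)
{
    return a COMPOSE(<,=) b;
}
```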

>> If you want a generalized macro expander, use M4. It has many, many
>> advantages over cpp for this sort of thing;
> 
> I agree that it might be a better solution for this task, but then I
> had to learn yet another tool.

Unlike CPP, M4 doesn't interfere with your text in any way you don't
tell it to; so you need to know one word, `define'. That is, to define a
macro,

define(`from',`to')
from

emits `to'. To write an open quote or a close quote, you quote them:

``', or `''.

That's all you *need* --- although, of course, there is lots more.

(Oh, and you'll probably want `include', too; `sinclude' is like
`include', but nonexistence of the file is not an error.)

>> Using make is fine. Using cpp is a horrible mistake.
> 
> It works fine for me; it simplifies the making of my webpage, and that
> is really all that I ask. In addition I get some experience with make
> and gcc, which I consider a plus.

If you think that the preprocessor's behaviour on plain text is a good
guide to its behaviour on C code, you are mistaken (because of the
tokenization).

(Experience with make you are getting. Experience with GCC, no, not
really, just experience of the GCC driver program, which is being
rewritten even now, and of the old-style preprocessor, which no longer
exists.)

-- 
`I'm not sure whether libtool is an existence proof that you _can_
 write a shell script that handles its arguments correctly, or a
 demonstration that you may try but you are doomed to failure.'
                                                       -- Zack Weinberg


-- 
Gllug mailing list  -  Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug



