[Gllug] [Somewhat OT] POSIX regex in C

Nix nix at esperi.org.uk
Wed Jul 13 11:01:07 UTC 2011


On 9 Jul 2011, Adrian McMenamin stated:

You want a feature-test #define up here, before any #include
(_XOPEN_SOURCE, _GNU_SOURCE or something), to specify which standard
you're conforming to. In a strict mode such as -std=c99 the default is
'just ISO C', which is unlikely to be what you want.
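
A minimal sketch of what I mean, assuming you want POSIX.1-2008 (pick
whichever macro and value suits you; posix_level_requested() is just a
made-up helper so there is something to call):

```c
/* Feature-test macros go before the first #include of any system header. */
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX.1-2008 interfaces */

#include <regex.h>   /* regcomp(), regexec(), regerror(), regfree() */
#include <stdio.h>

/* Hypothetical helper: report which POSIX level this file asked for. */
long posix_level_requested(void)
{
    return _POSIX_C_SOURCE;
}
```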

> #include <regex.h>
>
> struct blocklist* getnextblock(void** lastblock, void** head, char* buf)
> {
> 	printf("LINE:%s", buf);
> 	regex_t reg;
> 	regmatch_t addresses[3];
> 	char pattern[1024] = "^([0-9a-f]+)-([0-9a-f]+)";
> 	int xreg = regcomp(&reg, pattern, REG_EXTENDED|REG_NOSUB);
> 	printf("Regcomp returns %d\n", xreg);
> 	int match = regexec(&reg, buf, (size_t)3, addresses, 0);

POSIX states:

> If _nmatch_ is 0 or REG_NOSUB was set in the cflags argument to
> regcomp(), then regexec() shall ignore the _pmatch_ argument.

(nmatch == arg 3, pmatch == arg 4).

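So, since you pass REG_NOSUB, two of your arguments are useless: nothing
will ever be stored in addresses[]. Drop REG_NOSUB if you want the
submatches.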
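
Something like the following is probably closer to what you were after.
It is only a sketch: parse_range() and copy_span() are names I made up,
and I am guessing that you want the two hex fields back as strings.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <string.h>

/* Copy the text covered by one regmatch_t into dst (len must be > 0). */
static void copy_span(char *dst, size_t len, const char *src, regmatch_t m)
{
    size_t n = (size_t)(m.rm_eo - m.rm_so);

    if (n >= len)
        n = len - 1;              /* truncate rather than overflow */
    memcpy(dst, src + m.rm_so, n);
    dst[n] = '\0';
}

/*
 * Hypothetical helper: match the pattern from the quoted code against
 * line and copy the two groups into start and end (each len bytes).
 * Returns 0 on a match, REG_NOMATCH or another regex error code otherwise.
 */
int parse_range(const char *line, char *start, char *end, size_t len)
{
    regex_t reg;
    regmatch_t m[3];   /* m[0] = whole match, m[1] and m[2] = the groups */
    int rc;

    /* No REG_NOSUB, so regexec() fills in m[]. */
    rc = regcomp(&reg, "^([0-9a-f]+)-([0-9a-f]+)", REG_EXTENDED);
    if (rc != 0)
        return rc;     /* reg is not usable: no regexec(), no regfree() */

    rc = regexec(&reg, line, 3, m, 0);
    if (rc == 0) {
        copy_span(start, len, line, m[1]);
        copy_span(end, len, line, m[2]);
    }
    regfree(&reg);
    return rc;
}
```

If you only care whether the line matches at all, keep REG_NOSUB and
pass 0 and NULL for nmatch and pmatch instead.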

> 	regfree(&reg);
> 	return *lastblock;
> }

I hope this is C99 :) you have declarations mixed in with statements all
over the place.

> This produces bizarre behaviour. As it is above xreg reports 0 - a success
> - but the code fails to match -

What does 'the code fails to match' mean? regexec() returns nonzero?

>                                 more than that the printf line at the top
> reports buf is empty -

Perhaps it is. You haven't shown us enough of the program to let us
validate this for ourselves. The code above definitely produces undefined
behaviour in the case where regcomp() fails: in that case, you must not
call either regexec() or regfree() on the regex_t passed to the failing
regcomp().
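
A rough sketch of the checking I have in mind, using regerror() to turn
the error code into something readable (line_matches() and its return
convention are my invention, not anything from your program):

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if line matches pattern, 0 if it does
 * not, and -1 if the pattern does not compile or regexec() fails.
 */
int line_matches(const char *pattern, const char *line)
{
    regex_t reg;
    char msg[128];
    int rc;

    rc = regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        /* Report the failure and stop: reg must not be used or freed. */
        regerror(rc, &reg, msg, sizeof msg);
        fprintf(stderr, "regcomp: %s\n", msg);
        return -1;
    }

    /* REG_NOSUB was given, so nmatch and pmatch are simply 0 and NULL. */
    rc = regexec(&reg, line, 0, NULL, 0);
    regfree(&reg);     /* only reached after a successful regcomp() */

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```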

>                        however if I comment out (the regcomp line alone is
> enough to break this - the rest presumably has the same issue):
>
> 	int xreg = regcomp(&reg, pattern, REG_EXTENDED|REG_NOSUB);
> 	printf("Regcomp returns %d\n", xreg);
> 	int match = regexec(&reg, buf, (size_t)3, addresses, 0);
> 	regfree(&reg);
>
> the printf reports the (correct) contents of buf - how can code executed
> afterwards affect the results of what came before?

The compiler can freely rearrange code as long as it would produce the
same result -- and it is only guaranteed to do *that* as long as you do
not do something undefined. Since regcomp() returned 0, you don't seem to
have done anything undefined in this function: however, as you have only
shown us a single function rather than a compilable program, we can only
guess. It is quite possible for undefined behaviour in a function's
caller to break it.

Can you reproduce this behaviour with a simple testcase?

>                                                  Very odd and makes me
> think it is a build problem. Is there some special library I should be
> linking against? The GNU documentation says all this is in glibc...

Definitely not a build problem, and you should probably be following
POSIX (the 3p manpages, or
<http://pubs.opengroup.org/onlinepubs/9699919799/>, which is the next
POSIX revision after those manpages) rather than the glibc manual, which
is notoriously incomplete, nearly impossible to maintain (because Ulrich
rejects all changes to it unless they come from the people he likes, and
he hardly likes anyone) and fails to document a lot of important stuff.
(If you don't care about portability, you should be following the section
2 and section 3 manpages to pick up the Linux extensions to POSIX too.)

reg*() are in POSIX (and have been for fifteen years or more), and are
very widely used, so you can trust that they work. Something else is
wrong, but your example does not give us enough information to be sure
what it is.

-- 
NULL && (void)