
Re: OpenBSD 3.4 isspace() b0rked (was: Problem compacting databases (again!))



On Mon, 24 Jan 2005, Matthias Andree wrote:

> > ===========================================================================
> > space
> >
> > Define characters to be classified as white-space characters.
> >
> > In the POSIX locale, at a minimum, the <space>, <form-feed>, <newline>,
> > <carriage-return>, <tab>, and <vertical-tab> shall be included.
> > ===========================================================================
> >
> > So extension is allowed in the Posix locale. It seems the man page is not
> > right, and the word 'only' has to be scrapped.
> 
> Read the whole document please, further down you'll find:

Bah, the art of writing unambiguous specs is really rare.

> | LC_CTYPE Category in the POSIX Locale
> | 
> | The character classifications for the POSIX locale follow; the code listing
> | depicts the localedef input, and the table represents the same information,
> | sorted by character.
> | 
> | LC_CTYPE
> | # The following is the POSIX locale LC_CTYPE.
> | # "alpha" is by default "upper" and "lower"
> | # "alnum" is by definition "alpha" and "digit"
> | # "print" is by default "alnum", "punct", and the <space>
> | # "graph" is by default "alnum" and "punct"
> | #
> | ...
> | space    <tab>;<newline>;<vertical-tab>;<form-feed>;\
> |          <carriage-return>;<space>
> | ...
> 
> There is no mention that this is extensible.
> 
> The table is exhaustive; in particular, there is no mention of 0xA0 or ISO-8859.
> 
> Note particularly that POSIX doesn't even depend on ASCII, see
> <http://www.opengroup.org/onlinepubs/000095399/xrat/xbd_chap06.html>
> 
> OpenBSD had better constrain itself to ASCII unless an ISO-8859-* locale
> is explicitly specified, for portability and security reasons.

Reviewing this again, I think you might be right. C99 is also clear that 
the set of isspace() chars is fixed in the C/Posix locale.
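A minimal sketch, not part of the original thread, of how one might check the behaviour under discussion. It assumes a C99 compiler; the function name check_c_locale_isspace is hypothetical. It walks every unsigned char value in the "C" locale and reports any byte where isspace() disagrees with the six characters C99 and POSIX fix for that locale. A libc that applies an ISO-8859 interpretation in the C locale would presumably be flagged at 0xA0 (NO-BREAK SPACE).

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* The six characters C99 and POSIX classify as "space" in the C/Posix
 * locale: <space>, <tab>, <newline>, <vertical-tab>, <form-feed>,
 * <carriage-return>. */
static const char c_locale_spaces[] = " \t\n\v\f\r";

/* Hypothetical helper: switch LC_CTYPE to "C" (this changes the global
 * locale), then return 1 if isspace() is true for exactly the six
 * characters above and false for every other byte value. Any mismatching
 * byte is printed to stderr and 0 is returned. */
int check_c_locale_isspace(void)
{
    int ok = 1;

    if (setlocale(LC_CTYPE, "C") == NULL)
        return 0;

    for (int c = 0; c <= UCHAR_MAX; c++) {
        /* strchr() would also match the terminating NUL, so exclude c == 0. */
        int expected = (c != 0 && strchr(c_locale_spaces, c) != NULL);
        /* isspace() is defined for EOF and all unsigned char values. */
        int actual = isspace(c) != 0;

        if (expected != actual) {
            fprintf(stderr, "byte 0x%02X: isspace() = %d, expected %d\n",
                    c, actual, expected);
            ok = 0;
        }
    }
    return ok;
}
```

Called from main(), a conforming implementation should return 1 and print nothing.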

But there might be reasons we do this differently. Sadly, the commit that 
introduced the ISO-8859 interpretation (rev 1.4) did not include a message 
explaining why this was done and how it relates to Posix.

I'll take this up with the other developers.

	-Otto