xml-grep (was: Re: XML for config files)
- To: tech_(_at_)_openbsd_(_dot_)_org
- Subject: xml-grep (was: Re: XML for config files)
- From: Ian Darwin <ian_(_at_)_darwinsys_(_dot_)_com>
- Date: Wed, 21 Nov 2001 15:34:12 -0500
> > Another issue with XML is: how does one use grep, sed, awk, or perl to
> > extract information from it? Using these simple regexp tools in shell
> > scripts is a very powerful way to collect and use configuration
> > information. With XML, this common task may be very difficult (i.e.,
> > what does a regular expression for picking data out of XML look like?)
>
> I am not convinced that XML config files are the way to go, but as for
> parsing XML with perl that should be no problem. Months ago I played with
> a couple perl modules for parsing XML and had no problem with them.
>
> You constructed a parsing object, registered callbacks for the various XML
> tags that could be encountered, and ran the parser on the file.
That is what the XML world calls the SAX (Simple API for XML) style of parsing.
The other style, for which there is probably also a Perl interface, is the DOM
(Document Object Model) style, which builds an in-memory tree of nodes
representing the XML elements.
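To make the two styles concrete, here is a minimal sketch using Python's
standard library (xml.sax and xml.dom.minidom). The sample document and the
names TagCounter, sax_count, and dom_count are made up for illustration; they
are not from any particular config format.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
SAMPLE = "<config><host name='a'/><host name='b'/><port>22</port></config>"


class TagCounter(xml.sax.ContentHandler):
    """SAX style: the parser calls back into this handler for each start tag."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Invoked once per opening tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


def sax_count(text):
    """Run the SAX parser over text and return a tag-name -> count mapping."""
    handler = TagCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.counts


def dom_count(text, tag):
    """DOM style: build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    return len(doc.getElementsByTagName(tag))
```

The SAX version never holds the whole document in memory, which matters for
large inputs; the DOM version is usually easier to write when the file is small
and you want to look around in it.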
> As for the other scripting languages, that certainly does sound like it
> could be an interesting problem...
For any complete programming language (Perl, Python, Java, C++), it's basically a
solved problem: SAX and DOM parsers exist, period. For Java, there is also JDOM,
which is "more Java-like" than pure DOM; presumably other languages have
their own "more like us" interfaces.
For grep, though, it's not readily solvable. Grep is designed for flat files, not
hierarchically organized, tag-based languages like HTML/SGML/XML. I
guess writing an "xmlgrep" is a good project for somebody. It has to be as small
and as fast as real grep, but has to handle any well-formed XML. Remember that
XML, like most modern languages, is free-form, so that
<tag attribute='value'>contents</tag>
and
<tag
attribute='value'
>
contents
</tag
>
are equivalent in XML, but not to our existing tools like grep. Such a new grep
would need an extended language for specifying whether to match the tag names,
their contents, their attributes, content only in certain tags, etc. Much of this is
already spelled out (in XML-ese) in XPath, the part of XSL that specifies what
elements to include in a transformed output. But I doubt that its syntax would
satisfy UNIX people!
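
As a rough sketch of what such a tool might do (not a real implementation, and
nowhere near as small or fast as grep), here is a toy "xmlgrep" in Python. It
leans on the limited XPath subset that xml.etree.ElementTree supports; the
function name xmlgrep, its parameters, and the "holder" wrapper element are all
assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# The two equivalent forms from the message above.
ONE_LINE = "<tag attribute='value'>contents</tag>"
MULTI_LINE = "<tag\nattribute='value'\n>\ncontents\n</tag\n>"


def xmlgrep(text, path, pattern, attribute=None):
    """Return (tag, attributes, text) for elements selected by path whose
    text, or the named attribute if given, matches the regex pattern."""
    root = ET.fromstring(text)
    # Hang the root under a throwaway parent so that a path such as
    # ".//tag" can also select the document's root element.
    holder = ET.Element("holder")
    holder.append(root)
    regex = re.compile(pattern)
    hits = []
    for elem in holder.findall(path):
        content = (elem.text or "").strip()
        subject = content if attribute is None else elem.get(attribute, "")
        if regex.search(subject):
            hits.append((elem.tag, dict(elem.attrib), content))
    return hits


def line_grep(text, pattern):
    """What plain grep does: test each line on its own."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]
```

With this, xmlgrep(ONE_LINE, ".//tag", "content") and
xmlgrep(MULTI_LINE, ".//tag", "content") return the same hit, whereas
line_grep(MULTI_LINE, "<tag.*>contents") finds nothing because the element is
spread over several lines. A path like ".//tag[@attribute='value']" restricts
the match by attribute. Note that surrounding whitespace in the element text is
stripped here, which is a simplification.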