xmlsimple.3am - Man Page

Provides facilities for writing simple one-line scripts with the gawk-xml extension. Also provides higher-level functions that simplify writing more complex scripts.

Synopsis

@include "xmlsimple"

parentpath = XmlParent(path)
test = XmlMatch(path)
scopepath = XmlMatchScope(path)
ancestorpath = XmlMatchAttr(path, name, value, mode)

XmlGrep()

Description

The xmlsimple library facilitates writing scripts based on the gawk-xml extension. It is an alternative to the xmllib library. A key difference is that $0 is not changed, so xmlsimple is compatible with awk code that relies on the gawk-xml core interface.

Short token variable names

To shorten simple scripts, xmlsimple provides short variable names (two letters, or three for EOI) that duplicate the predefined token-related core variables:

XD

Equivalent to XMLDECLARATION.

SD

Equivalent to XMLSTARTDOCT.

ED

Equivalent to XMLENDDOCT.

PI

Equivalent to XMLPROCINST.

SE

Equivalent to XMLSTARTELEM.

EE

Equivalent to XMLENDELEM.

TX

Equivalent to XMLCHARDATA.

SC

Equivalent to XMLSTARTCDATA.

EC

Equivalent to XMLENDCDATA.

CM

Equivalent to XMLCOMMENT.

UP

Equivalent to XMLUNPARSED.

EOI

Equivalent to XMLENDDOCUMENT.
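
As an illustration of the short names, the following minimal sketch prints the name of every element as it opens. It assumes the gawk-xml extension is already loaded (for example through the xmlgawk wrapper or gawk's -l xml option); whether the @include line alone is sufficient is not stated on this page.

```awk
@include "xmlsimple"

# SE is the short alias for XMLSTARTELEM: at a start-element token it
# holds the element name, otherwise it is empty.
SE { print SE }
```

The same rule written against the core interface would be XMLSTARTELEM { print XMLSTARTELEM }.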

Collecting character data

Character data items between element tags are automatically collected in a single CHARDATA variable. This feature simplifies processing text data interspersed with comments, processing instructions or CDATA markup.

CHARDATA

Available at every XMLSTARTELEM or XMLENDELEM token. Contains all the character data collected since the previous start-element or end-element tag.
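
A minimal sketch of reading the collected text, assuming an input document that contains title elements (the element name is only illustrative) and that the gawk-xml extension is loaded:

```awk
@include "xmlsimple"

# At the end tag of each <title>, CHARDATA holds the character data
# collected since the previous start- or end-element tag.
EE == "title" { print CHARDATA }
```

If the title contains no child elements, this prints its whole text content, even when the text is interrupted by comments or CDATA sections.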

Whitespace handling

The XMLTRIM mode variable controls whether whitespace in the CHARDATA variable is automatically trimmed or not. Possible values are:

XMLTRIM = 0

Keep all whitespace.

XMLTRIM = 1 (default)

Discard leading and trailing whitespace, and collapse each run of contiguous whitespace characters into a single space character.

XMLTRIM = -1

Only collapse each run of contiguous whitespace characters into a single space character. Leading and trailing whitespace is kept in its collapsed form.
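
The mode can be selected before any input is read. The sketch below assumes that assigning XMLTRIM in a BEGIN rule placed after the @include line takes effect for the whole run, and uses a hypothetical pre element whose whitespace should be preserved:

```awk
@include "xmlsimple"

# Keep all whitespace in CHARDATA instead of the default trimming.
BEGIN { XMLTRIM = 0 }

# Emit the text of each <pre> element exactly as collected.
EE == "pre" { printf "%s", CHARDATA }
```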

Recording ancestor information

The ATTR array variable automatically keeps the attributes of every ancestor of the current element, and of the element itself.

ATTR[path@attribute]

Contains the value of the specified attribute of the ancestor element at the given path.

Example

While processing a /books/book/title element, ATTR["/books/book@on-loan"] contains the value of the on-loan attribute of the enclosing book element, that is, the name of the person who has the book on loan.
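
Building on this example, a minimal sketch that reports each title together with the on-loan value of its enclosing book. The document layout (/books/book/title) follows the example above; the output wording is only illustrative, and the sketch assumes that entries for a book are no longer present in ATTR once that book element has been closed.

```awk
@include "xmlsimple"

# At the end of each <title>, look up the on-loan attribute of the
# ancestor <book>. The "in" test avoids creating an empty ATTR entry.
EE == "title" && ("/books/book@on-loan" in ATTR) {
    print CHARDATA ": on loan to " ATTR["/books/book@on-loan"]
}
```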

Grep-like facilities

XmlGrep()

When invoked at an XMLSTARTELEM token, causes the whole element subtree to be copied to the output.
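
A minimal grep-like sketch, assuming a document with book elements that may carry an on-loan attribute (both names are illustrative). XMLATTR is the core gawk-xml array holding the attributes of the current start-element token.

```awk
@include "xmlsimple"

# Copy to the output every <book> subtree that has an on-loan attribute.
SE == "book" && ("on-loan" in XMLATTR) { XmlGrep() }
```

Elements that do not match the pattern produce no output, so the result is the sequence of selected subtrees rather than a complete document.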

Notes

The xmlsimple library includes both the xmlbase and xmlcopy libraries. Their functionality is implicitly available.

Bugs

The path-related functions only operate on elements. Comments, processing instructions and CDATA sections are not taken into account.

XmlGrep() cannot be used to copy tokens outside the root element (XML prologue or epilogue).

See Also

XML Processing With gawk, xmlbase(3am), xmlcopy(3am), xmltree(3am), xmlwrite(3am).

Author

Manuel Collado, m-collado@users.sourceforge.net.

Copying Permissions

Copyright (C) 2017, Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual page provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual page under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual page into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.

Info

January 2017 GAWK Extension Library (gawkextlib) GNU Awk Extension Modules