hxunent - Man Page

replace HTML predefined character entities by UTF-8

Synopsis

hxunent [ -b ] [ -f ] [ file ]

Description

The hxunent command reads the file (or standard input) and copies it to standard output with &-entities replaced by their equivalent characters (encoded as UTF-8). E.g., &quot; is replaced by " and &lt; is replaced by <.
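To illustrate the kind of substitution described above, here is a minimal Python sketch. It is not how hxunent is implemented; it leans on Python's standard html.unescape, and the function name unent is made up for this example. Note that html.unescape also decodes numeric character references and some legacy forms without a trailing semicolon, which may differ from what hxunent does.

```python
import html


def unent(text: str) -> str:
    """Rough analogue of hxunent's default behavior (an assumption, not its code)."""
    # html.unescape replaces HTML named entities by the characters they stand for.
    return html.unescape(text)


# Entities become plain characters; writing the result as UTF-8 mirrors the tool's output encoding.
print(unent("&quot;a &lt; b&quot; caf&eacute;"))
```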

Options

The following options are supported:

-b

The five builtin entities of XML (&lt; &gt; &quot; &apos; &amp;) are not replaced but copied unchanged. This is necessary if the output has to be valid XML or SGML.
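The effect of -b can be sketched as follows, again only as an approximation in Python. The names XML_BUILTINS and unent_keep_builtins are hypothetical, and the entity table is Python's html.entities.html5, which is assumed here to be close enough to the tool's own list for illustration.

```python
import re
from html.entities import html5

# The five entities that -b leaves untouched, as listed in the option text.
XML_BUILTINS = {"lt", "gt", "quot", "apos", "amp"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def unent_keep_builtins(text: str) -> str:
    """Replace named entities except the five XML built-ins (sketch of -b)."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_BUILTINS:
            return match.group(0)          # copied unchanged
        value = html5.get(name + ";")      # None for unknown entities
        return value if value is not None else match.group(0)
    return ENTITY.sub(replace, text)


print(unent_keep_builtins("&lt;p&gt;caf&eacute; &amp; th&eacute;&lt;/p&gt;"))
```

Because &lt;, &gt; and &amp; survive, markup that was escaped in the input stays escaped in the output, which is what keeps the result parseable as XML.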

-f

This option changes how unknown entities and lone ampersands are handled. Normally they are copied unchanged, but with this option the program tries to "fix" them by replacing each such ampersand by &amp;. Stray ampersands are often the result of copying and pasting URLs into a document, in which case this option does fix them and makes the document valid.
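A rough Python sketch of the -f behavior described above: known entities are replaced, while unknown entities and lone ampersands get their ampersand turned into &amp;. The function name unent_fix is hypothetical, the entity table is Python's html.entities.html5, and leaving numeric character references untouched is an assumption of this sketch, not something the manual states.

```python
import re
from html.entities import html5

# An ampersand, optionally followed by a numeric reference or a named entity body.
AMP = re.compile(r"&(#[0-9]+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)?")


def unent_fix(text: str) -> str:
    """Replace known entities; escape unknown entities and lone ampersands (sketch of -f)."""
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body is None:
            return "&amp;"                 # lone ampersand, e.g. inside a pasted URL
        if body.startswith("#"):
            return match.group(0)          # assumption: numeric references pass through
        value = html5.get(body)
        if value is not None:
            return value                   # known HTML entity
        return "&amp;" + body              # unknown entity: only the ampersand is escaped
    return AMP.sub(replace, text)


print(unent_fix('<a href="page?x=1&y=2">AT&T &foo; caf&eacute;</a>'))
```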

Diagnostics

The program's exit value is 0 if all went well, otherwise:

  1. The input couldn't be read (file not found, file not readable...)
  2. Wrong command line arguments.
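A caller can turn these exit values into readable errors. The wrapper below is a minimal sketch that assumes hxunent is installed and on the PATH; the names EXIT_MEANINGS, describe_exit and run_hxunent are invented for this example.

```python
import subprocess

# Exit values as documented in the Diagnostics section.
EXIT_MEANINGS = {
    0: "success",
    1: "input could not be read",
    2: "wrong command line arguments",
}


def describe_exit(code: int) -> str:
    """Map an hxunent exit value to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit value {code}")


def run_hxunent(path: str) -> str:
    """Run hxunent on a file and return its output, raising on a nonzero exit value."""
    result = subprocess.run(["hxunent", path], capture_output=True)
    if result.returncode != 0:
        raise RuntimeError(describe_exit(result.returncode))
    return result.stdout.decode("utf-8")
```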

See Also

asc2xml(1), xml2asc(1), UTF-8 (RFC 2279, since obsoleted by RFC 3629)

Bugs

The program assumes entities are as defined by HTML. It doesn't read a document's DTD to find the definitions actually in use in that document. With -f, it will even break all entities that are not HTML entities, because their ampersand is replaced by &amp;.

Referenced By

hxaddid(1), hxcite(1), hxcite-mkbib(1), hxcount(1), hxincl(1), hxindex(1), hxmkbib(1), hxnormalize(1), hxpipe(1).

10 Jul 2011 7.x HTML-XML-utils