trietool - Man Page

trie manipulation tool

Synopsis

trietool [ options ] trie command arg ...

Description

trietool is the command-line tool for manipulating double-array trie data.  It can be used to query, add, and remove words in a trie.

The Trie

The trie argument specifies the name of the trie to manipulate.  A trie is stored in a file with the `.tri' extension. To create a new trie, however, one must first prepare a file with the `.abm' extension that describes the Unicode ranges making up the alphabet set of the trie.  The ABM defines a set of vectors that map Unicode characters into a continuous range of integers. The mapped integers are used as the internal alphabet of the trie.  Such a mapping can improve space allocation within the trie data even when the character set in use is not continuous, because the mapped range is always continuous.
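The idea of the mapping can be illustrated with a minimal Python sketch. It is not the libdatrie implementation: the function name `build_alphabet_map` is hypothetical, and the choice to number the mapped characters from 0 is an assumption made only for illustration.

```python
# Illustrative sketch only; not libdatrie's actual code.
def build_alphabet_map(ranges):
    """Map every code point in the inclusive ranges to a contiguous index."""
    mapping = {}
    next_index = 0  # assumption: numbering starts at 0 in this sketch
    for start, end in sorted(ranges):
        for code_point in range(start, end + 1):
            if code_point not in mapping:  # tolerate overlapping ranges
                mapping[code_point] = next_index
                next_index += 1
    return mapping


# Two separate Unicode ranges: A-Z and a-z.
ranges = [(0x0041, 0x005A), (0x0061, 0x007A)]
alphabet = build_alphabet_map(ranges)

# The gap between 'Z' (0x5A) and 'a' (0x61) disappears in the mapped range.
print(alphabet[ord("A")], alphabet[ord("Z")], alphabet[ord("a")], len(alphabet))
# prints: 0 25 26 52
```

Although the two source ranges are six code points apart, the mapped indices run from 0 to 51 without a hole, which is what allows compact allocation inside the trie.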

The ABM file is a plain text file in which each line lists a range of 32-bit Unicode code points to be added to the alphabet set, in the format:

[0xSSSS,0xTTTT]

where `0xSSSS' and `0xTTTT' are the hexadecimal values of the starting and ending character codes of the range, respectively. Both ends are included in the range.

For example, for a dictionary that contains only English words without any punctuation, one may prepare `trie.abm' as:

[0x0041,0x005a]
[0x0061,0x007a]

The first line lists the ASCII codes for A-Z, and the second for a-z.

No more than 255 alphabet characters are allowed in a trie.
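Before creating a trie, it can be useful to check that an ABM file stays within this limit. The following Python sketch parses lines in the `[0xSSSS,0xTTTT]' format and counts the distinct characters they cover. The helper names (`parse_abm`, `alphabet_size`) are hypothetical, and the leniency about surrounding whitespace and blank lines is an assumption of this sketch, not a statement about how trietool itself reads the file.

```python
import re

# One ABM line: [0xSSSS,0xTTTT] with hexadecimal start and end codes.
ABM_LINE = re.compile(r"^\[\s*(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s*\]$")
MAX_ALPHABET_SIZE = 255  # limit stated in this manual page


def parse_abm(text):
    """Return a list of (start, end) inclusive ranges from ABM text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are skipped in this sketch
        match = ABM_LINE.match(line)
        if match is None:
            raise ValueError(f"not an ABM range: {line!r}")
        start, end = (int(group, 16) for group in match.groups())
        if start > end:
            raise ValueError(f"range start exceeds end: {line!r}")
        ranges.append((start, end))
    return ranges


def alphabet_size(ranges):
    """Count the distinct code points covered by the ranges."""
    return len({cp for start, end in ranges for cp in range(start, end + 1)})


abm_text = "[0x0041,0x005a]\n[0x0061,0x007a]\n"
size = alphabet_size(parse_abm(abm_text))
print(size, size <= MAX_ALPHABET_SIZE)  # prints: 52 True
```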

The created `.tri' file incorporates the ABM data, so the `.abm' file is not required once the trie has been created, and will be ignored from then on.

Commands

Available commands are:

add word data ...

Add word to trie, associated with integer data.  An arbitrary number of word-data pairs can be given.  Arguments are read two at a time: the first is treated as the word, and the second as its data.

add-list [ options ] list-file

Add the words listed in list-file, with their associated data, to trie.  The list-file must be a text file listing one word per line.  The associated data can be placed after the word on the same line, separated by a tab (`\t') character.  If the data field is omitted, the default value (-1) is used instead.

Options are available for this command:

-e, --encoding enc

Specify the character encoding of the list-file contents, such as `UTF-8'. If omitted, the codeset of the current locale is assumed.

delete word ...

Delete word from trie.  An arbitrary number of words to delete can be given.

delete-list [ options ] list-file

Delete the words listed in list-file from trie.  The list-file must be a text file listing one word per line.

Options are available for this command:

-e, --encoding enc

Specify the character encoding of the list-file contents, such as `UTF-8'. If omitted, the codeset of the current locale is assumed.

query word

Search for word in trie.  If word exists, its associated data is printed to standard output.  Otherwise, an error message is printed to standard error, and nothing is printed to standard output.

list

List all words in trie to standard output.  The output lists one word-data pair per line, separated by a tab (`\t') character, a format suitable for use as the list-file of the add-list command.
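The list-file format shared by add-list and list is simple enough to read and write from a script. The Python sketch below does both, applying the default value (-1) when the data field is missing. The function names are hypothetical, and the handling of empty lines is an assumption of this sketch rather than documented trietool behavior.

```python
DEFAULT_DATA = -1  # value used when a line carries no data field


def parse_list_text(text):
    """Parse list-file text into (word, data) pairs."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue  # assumption: empty lines are skipped in this sketch
        word, _, data = line.partition("\t")
        entries.append((word, int(data) if data.strip() else DEFAULT_DATA))
    return entries


def format_list_text(entries):
    """Format (word, data) pairs as one tab-separated pair per line."""
    return "".join(f"{word}\t{data}\n" for word, data in entries)


sample = "apple\t1\nbanana\ncherry\t3\n"
entries = parse_list_text(sample)
print(entries)  # prints: [('apple', 1), ('banana', -1), ('cherry', 3)]
```

When such text is saved for add-list or delete-list, it should be written in the encoding that will be named with the -e option, or in the codeset of the current locale if the option is left out.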

Options

This program follows the usual GNU command line syntax, with long options starting with two dashes (`--'). A summary of options is included below.

-p,  --path dir

Set trie directory to dir [default=`.']

-h,  --help

Show summary of options.

-V,  --version

Show version of program.
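The argument order given in the synopsis (general options first, then the trie name, then the command and its own arguments) can be made concrete with a small Python helper that only assembles an argument list and does not run anything. The helper `trietool_argv`, the trie name `words`, the directory `/tmp/dict`, and the file `words.txt` are all hypothetical, and the trie is assumed to be named without its extension.

```python
def trietool_argv(trie, command, *args, path=None):
    """Assemble an argument list following: trietool [options] trie command arg ..."""
    argv = ["trietool"]
    if path is not None:
        argv += ["-p", path]  # general option: trie directory
    argv += [trie, command, *args]
    return argv


# Add two word-data pairs to the trie "words" kept in /tmp/dict (hypothetical names).
print(trietool_argv("words", "add", "apple", "1", "banana", "2", path="/tmp/dict"))
# prints: ['trietool', '-p', '/tmp/dict', 'words', 'add', 'apple', '1', 'banana', '2']

# Command-specific options such as -e follow the command name.
print(trietool_argv("words", "add-list", "-e", "UTF-8", "words.txt"))
# prints: ['trietool', 'words', 'add-list', '-e', 'UTF-8', 'words.txt']
```

A list built this way could be passed to `subprocess.run` on a system where trietool is installed.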

Author

libdatrie was written by Theppitak Karoonboonyanan.

This manual page was written by Theppitak Karoonboonyanan <theppitak@gmail.com>.

Referenced By

The man page trietool-0.2(1) is an alias of trietool(1).

DECEMBER 2008