ngramcount man page

ngramcount — count n-grams from an input FAR file

Synopsis

ngramcount [Options] [in.far [out.fst]]

Description

Count n-grams from an input FST archive (FAR) file.
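A typical invocation might look like the sketch below. The file names corpus.far and corpus.cnts are hypothetical placeholders, not names the tool requires; the --order option is the one documented under Options. The guard only runs the command when ngramcount and the input file are present, and otherwise prints the command.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own paths.
in_far="corpus.far"     # assumed: an existing FAR file
out_fst="corpus.cnts"   # assumed: name for the output counts

# Count n-grams up to order 5 instead of the default 3.
# The command is kept in a plain string, which is fine for names without spaces.
cmd="ngramcount --order=5 $in_far $out_fst"

# Run only when the tool and the input are actually available.
if command -v ngramcount >/dev/null 2>&1 && [ -f "$in_far" ]; then
  $cmd
else
  echo "would run: $cmd"
fi
```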

Options

Boolean options accept true or false as a value; if neither is specified, the default value applies.

--alpha=FLOAT
Weight for first FST. Default: 1.0.
--backoff_label=INT64
Backoff label. Default: 1.
--beta=FLOAT
Weight for second and subsequent FSTs. Default: 1.0.
--check_consistency[=BOOLEAN]
Check model consistency. Default: false.
--context_pattern=STRING
Pattern of contexts to count. Default: "".
--end_symbol=STRING
Class label for sentence end. Default: "</s>".
--epsilon_as_backoff[=BOOLEAN]
Treat epsilon in the input FSTs as backoff. Default: false.
--fst_align[=BOOLEAN]
Write FST data aligned where appropriate. Default: false.
--fst_compat_symbols[=BOOLEAN]
Require symbol tables to match when appropriate. Default: true.
--fst_default_cache_gc[=BOOLEAN]
Enable garbage collection of cache. Default: true.
--fst_default_cache_gc_limit=INT64
Set the cache byte size that triggers garbage collection. Default: 1048576.
--fst_error_fatal[=BOOLEAN]
If true, FST errors are fatal. Otherwise, returned objects are flagged as bad. For example, FSTs are returned with the kError property set to true, and FST weights are set so that Member() returns false. Default: true.
--fst_field_separator=STRING
Set the characters used as a separator between printed fields. Default: " ".
--fst_read_mode=STRING
Default file reading mode for mappable files: "read" to use read(2), or "map" to use mmap(2). Default: "read".
--fst_verify_properties[=BOOLEAN]
Verify FST properties queried by TestProperties. Default: false.
--fst_weight_parentheses=STRING
Set the characters enclosing the first weight of a printed composite weight (e.g., pair weight, tuple weight, and derived classes) to ensure proper I/O of nested composite weights. Must have size 0 (none) or 2 (open and close parenthesis). Default: no parentheses.
--fst_weight_separator=CHARACTER
Set the character separator between printed composite weights. Default: comma.
--help[=BOOLEAN]
Show usage information. Default: false.
--helpshort[=BOOLEAN]
Show brief usage information. Default: false.
--method=STRING
Set the counting method. Must be one of "counts", "histograms", "count_of_counts", or "count_of_histograms". Default: "counts".
--ngram_error_fatal[=BOOLEAN]
If true, NGram errors are fatal. Otherwise, returned objects are flagged as bad; i.e., NGramModel::Error() returns true. Default: true.
--norm_eps=FLOAT
Normalization check epsilon. Default: 0.001.
--normalize[=BOOLEAN]
Normalize the resulting model. Default: false.
--order=INT64
Set maximal order of n-grams to be counted. Default: 3.
--output_fst[=BOOLEAN]
Output counts as FST, otherwise as strings. Default: true.
--require_symbols[=BOOLEAN]
Require symbol tables. Default: true.
--round_to_int[=BOOLEAN]
Round all counts to integers. Default: false.
--save_relabel_ipairs=FILENAME
Save input relabel pairs to file. Default: do not save.
--save_relabel_opairs=FILENAME
Save output relabel pairs to file. Default: do not save.
--start_symbol=STRING
Class label for sentence start. Default: "<s>".
--tmpdir=PATHNAME
Set the temporary directory to use. Default: /tmp.
--v[=INT32]
Set the verbosity level. Default: 0.
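As an illustration of the option syntax above, the sketch below combines an integer option, a Boolean option with an explicit value, and the verbosity level. The file names are hypothetical, and this particular combination is chosen only to show the --name=value form, not as a recommended configuration.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own paths.
in_far="corpus.far"
out_fst="bigram.cnts"

# --order takes an INT64, --round_to_int takes an explicit Boolean value,
# and --v takes an INT32 verbosity level.
opts="--order=2 --round_to_int=true --v=1"
cmd="ngramcount $opts $in_far $out_fst"

# Run only when the tool and the input are actually available.
if command -v ngramcount >/dev/null 2>&1 && [ -f "$in_far" ]; then
  $cmd
else
  echo "would run: $cmd"
fi
```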

See Also

ngramapply(1), ngramcontext(1), ngramhisttest(1), ngraminfo(1), ngrammake(1), ngrammarginalize(1), ngrammerge(1), ngramperplexity(1), ngramprint(1), ngramrandgen(1), ngramrandtest(1), ngramread(1), ngramshrink(1), ngramsort(1), ngramsplit(1), ngramsymbols(1), ngramtransfer(1)

1.3.1 OpenGrm NGram User Commands