Package tesseract

Raw OCR Engine

Tesseract is a commercial-quality OCR engine originally developed at HP
between 1985 and 1995. In 1995, it was among the top three engines evaluated
by UNLV. HP and UNLV released it as open source in 2005.

General Commands

Command               Description
ambiguous_words       generate sets of words Tesseract is likely to find ambiguous
cntraining            character normalization training for Tesseract
combine_tessdata      combine/extract/overwrite Tesseract data
dawg2wordlist         convert a Tesseract DAWG to a wordlist
mftraining            feature training for Tesseract
shapeclustering       shape clustering training for Tesseract
tesseract             command-line OCR engine
unicharset_extractor  extract unicharset from Tesseract boxfiles
wordlist2dawg         convert a wordlist to a DAWG for Tesseract
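
As a minimal sketch of how the tesseract command listed above is typically invoked, the Python snippet below builds an argument list of the form `tesseract IMAGE OUTPUTBASE [-l LANG]`. The helper name `build_tesseract_command` and the file names `page.png` and `page` are hypothetical and chosen only for illustration; other options vary between Tesseract versions and are left out on purpose.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image: str, output_base: str,
                            lang: Optional[str] = None) -> List[str]:
    """Build an argument list: tesseract IMAGE OUTPUTBASE [-l LANG]."""
    cmd = ["tesseract", image, output_base]
    if lang:
        # -l selects the language data to use, e.g. "eng".
        cmd += ["-l", lang]
    return cmd


if __name__ == "__main__":
    # "page.png" and "page" are hypothetical example names.
    cmd = build_tesseract_command("page.png", "page", lang="eng")
    print(" ".join(cmd))
    # Run OCR only if the binary is installed and the input file exists.
    if shutil.which("tesseract") and os.path.exists("page.png"):
        subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when file names contain spaces.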

File Formats

File           Description
unicharambigs  Tesseract unicharset ambiguities
unicharset     character properties file used by tesseract(1)
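
To illustrate what "character properties" means for the unicharset file, here is a rough Python sketch. It assumes the layout described in the unicharset(5) manual page: a first line giving the number of entries, then one whitespace-separated line per character whose second field is a hexadecimal bit mask (isalpha, islower, isupper, isdigit, ispunctuation, from the least significant bit up). The sample data and the function names are hypothetical, and real files may carry additional fields that this sketch ignores.

```python
from typing import Dict, List

# Property bits, least significant first (assumed from unicharset(5)).
PROPERTY_BITS = ["isalpha", "islower", "isupper", "isdigit", "ispunctuation"]


def decode_properties(hex_mask: str) -> List[str]:
    """Return the names of the property bits set in a hexadecimal mask."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(PROPERTY_BITS) if mask & (1 << bit)]


def parse_unicharset(text: str) -> Dict[str, List[str]]:
    """Map each character to its decoded properties; extra fields are ignored."""
    lines = text.splitlines()
    count = int(lines[0].split()[0])  # first line: number of entries
    entries: Dict[str, List[str]] = {}
    for line in lines[1:1 + count]:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed or empty lines
        entries[fields[0]] = decode_properties(fields[1])
    return entries


# Hypothetical sample data, not taken from a real language pack.
SAMPLE = """5
NULL 0 NULL 0
b 3 Latin 59
W 5 Latin 40
7 8 Common 66
; 10 Common 46
"""

if __name__ == "__main__":
    for char, props in parse_unicharset(SAMPLE).items():
        print(char, props)
```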