public-inbox-extindex - Man Page

create and update external search indices

Synopsis

public-inbox-extindex [Options] EXTINDEX_DIR INBOX_DIR...

public-inbox-extindex [Options] [EXTINDEX_DIR] --all

Description

public-inbox-extindex creates and updates an external search and overview database used by the read-only public-inbox PSGI (HTTP), NNTP, and IMAP interfaces.  This requires either the Search::Xapian XS bindings OR the Xapian SWIG bindings, along with DBD::SQLite and DBI Perl modules.

Options

-j JOBS
--jobs=JOBS

... TODO, see public-inbox-index(1)

--all

Index all publicinbox entries in PI_CONFIG.

publicinbox entries indexed by public-inbox-extindex can have full Xapian searching abilities with the per-publicinbox indexlevel set to basic and their respective Xapian (xap15 or xapian15) directories removed.  For multiple public-inboxes where cross-posting is common, this allows significant space savings on Xapian indices.
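As a minimal sketch, an extindex covering every configured inbox might be created as follows. The function name and the directory path are hypothetical placeholders; the argument order follows the Synopsis above. The command is wrapped in a function so the snippet can be sourced without running anything.

```shell
# Hypothetical wrapper (not part of public-inbox): index every
# publicinbox entry in PI_CONFIG into the extindex directory "$1".
extindex_all() {
	public-inbox-extindex "$1" --all
}

# Example call with a placeholder path:
# extindex_all /path/to/extindex_dir
```

Setting indexlevel to basic and removing per-inbox Xapian directories is left as a separate, manual step.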

--gc

Perform garbage collection instead of indexing.  Use this if inboxes are removed from the extindex, or if messages are purged or removed from some inboxes.

--reindex

Forces a re-index of all messages in the extindex.  This can be used for in-place upgrades and bugfixes while read-only server processes are utilizing the index.  Keep in mind this roughly doubles the size of the already-large Xapian database.

The extindex locks will be released roughly every 10 seconds to allow public-inbox-mda(1) and public-inbox-watch(1) processes to write to the extindex.

--fast

When used with --reindex, only look for new and stale entries and do not touch already-indexed messages.
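The two reindexing modes described above could be invoked along these lines. This is only a sketch: the function names and the directory path are hypothetical, and combining these options with --all is an assumption based on the Synopsis.

```shell
# Hypothetical wrappers (not part of public-inbox).

# Full reindex of every message; expect the Xapian database
# to roughly double in size while this runs.
extindex_reindex_full() {
	public-inbox-extindex --reindex "$1" --all
}

# Lighter pass: only new and stale entries are examined,
# already-indexed messages are left alone.
extindex_reindex_fast() {
	public-inbox-extindex --reindex --fast "$1" --all
}

# Example call with a placeholder path:
# extindex_reindex_fast /path/to/extindex_dir
```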

Files

public-inbox-extindex-format(5)

Configuration

public-inbox-extindex does not currently write to the public-inbox-config(5) file, so configuration must be entered manually.  The extindex name "all" is a special case which corresponds to indexing --all inboxes.  An example for --all is as follows:

        [extindex "all"]
                topdir = /path/to/extindex_dir
                url = all
                coderepo = foo
                coderepo = bar

See public-inbox-config(5) for more details.
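Instead of editing the file by hand, the same section can be written with git-config(1), assuming the config file uses git-config syntax. The sketch below writes to a temporary file rather than the real config; the path and the coderepo names are the placeholders from the example above.

```shell
# A temporary file stands in for the real config file, so nothing
# under ~/.public-inbox is modified by this sketch.
cfg=$(mktemp)

git config -f "$cfg" extindex.all.topdir /path/to/extindex_dir
git config -f "$cfg" extindex.all.url all

# coderepo is multi-valued, so each value is appended with --add
git config -f "$cfg" --add extindex.all.coderepo foo
git config -f "$cfg" --add extindex.all.coderepo bar

# Show the resulting section
git config -f "$cfg" --get-regexp '^extindex\.all\.'
```

To apply this to a real setup, cfg would point at the file named by PI_CONFIG (or the default location) instead of a temporary file.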

Environment

PI_CONFIG

Used to override the default “~/.public-inbox/config” value.

XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to disk.  This environment variable is handled directly by Xapian; refer to the Xapian API documentation for more details.

Setting XAPIAN_FLUSH_THRESHOLD or publicinbox.indexBatchSize for a large --reindex may cause public-inbox-mda(1), public-inbox-learn(1) and public-inbox-watch(1) tasks to wait for long and unpredictable periods of time during --reindex.

Default: none, uses publicinbox.indexBatchSize
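For illustration, the variable can be set for a single invocation only, leaving the rest of the shell session unaffected. The function name, path, and threshold value below are hypothetical; no particular value is recommended here, and the caveat above about writers waiting during --reindex still applies.

```shell
# Hypothetical wrapper (not part of public-inbox): run a --reindex
# with XAPIAN_FLUSH_THRESHOLD set to "$2" for this one command.
# "$1" is the extindex directory.
extindex_reindex_threshold() {
	XAPIAN_FLUSH_THRESHOLD="$2" \
		public-inbox-extindex --reindex "$1" --all
}

# Example call; both arguments are placeholders:
# extindex_reindex_threshold /path/to/extindex_dir 10000
```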

Upgrading

Occasionally, public-inbox will update its schema version and require a full reindex by running this command.

Contact

Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

The mail archives are hosted at <https://public-inbox.org/meta/> and <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

See Also

Search::Xapian, DBD::SQLite

Referenced By

lei(1), lei-add-external(1), lei-overview(7), public-inbox-config(5), public-inbox-tuning(7).

1993-10-02 public-inbox.git public-inbox user manual