conda-index - Man Page
Name
conda-index — conda-index
conda index, formerly part of conda-build. Create repodata.json for collections of conda packages.
The conda_index command operates on a channel directory. A channel directory contains a noarch subdirectory at a minimum and will almost always contain other subdirectories named for conda's supported platforms linux-64, win-64, osx-64, etc. A channel directory cannot have the same name as a supported platform. Place packages into the same platform subdirectory each archive was built for. Conda-index extracts metadata from these packages to generate index.html, repodata.json etc. with summaries of the packages' metadata. Then conda uses the metadata to solve dependencies before doing an install.
By default, the metadata is output to the same directory tree as the channel directory, but it can be output to a separate tree with the --output <output> parameter. The metadata cache is always placed with the packages, in .cache folders under each platform subdirectory.
After conda-index has finished, its output can be used directly as a channel, for example conda install -c file:///path/to/output ..., or, more typically, placed on a web server.
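The layout rules above can be checked before indexing. The following is a minimal sketch, not part of conda-index: check_channel_layout is a hypothetical helper, and KNOWN_SUBDIRS is only an illustrative subset of conda's platform names.

```python
from pathlib import Path

# Illustrative subset of conda platform names (assumption); the real list is longer.
KNOWN_SUBDIRS = {"noarch", "linux-64", "win-64", "osx-64", "osx-arm64", "linux-aarch64"}
PACKAGE_SUFFIXES = (".conda", ".tar.bz2")


def check_channel_layout(channel_root):
    """Return a list of human-readable problems found in a channel directory."""
    root = Path(channel_root)
    problems = []
    # A channel directory cannot have the same name as a supported platform.
    if root.name in KNOWN_SUBDIRS:
        problems.append(f"channel directory is named like a platform: {root.name}")
    # A channel needs at least a noarch subdirectory.
    if not (root / "noarch").is_dir():
        problems.append("missing required 'noarch' subdirectory")
    # Packages belong inside a platform subdirectory, not at the top level.
    for entry in sorted(root.iterdir()):
        if entry.is_file() and entry.name.endswith(PACKAGE_SUFFIXES):
            problems.append(f"package outside a platform subdirectory: {entry.name}")
    return problems
```

An empty list means the directory passes these three checks. It does not verify that each package sits in the subdirectory it was built for.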
Run Normally
python -m conda_index <path to channel directory>
Note that conda index (instead of python -m conda_index) may invoke the legacy conda-build index.
Run for Debugging
python -m conda_index --verbose --threads=1 <path to channel directory>
Contributing
conda create -n conda-index "python >=3.9" conda conda-build "pip >=22"
git clone https://github.com/conda/conda-index.git
pip install -e conda-index[test]
cd conda-index
pytest
Summary of Changes from the Previous Conda-Build Index Version
- Approximately 2.2x faster conda package extraction, by extracting just the metadata to streams instead of extracting packages to a temporary directory; closes the package early if all metadata has been found.
- No longer read existing repodata.json. Always load from cache.
- Uses a sqlite metadata cache that is orders of magnitude faster than the old many-tiny-files cache.
- The first time conda index runs, it will convert the existing file-based .cache to a sqlite3 database .cache/cache.db. This takes about ten minutes per subdir for conda-forge. (If this is interrupted, delete cache.db to start over, or packages will be re-extracted into the cache.) sqlite3 must be compiled with the JSON1 extension. JSON1 is built into SQLite by default as of SQLite version 3.38.0 (2022-02-22).
- Each subdir osx-64, linux-64 etc. has its own cache.db; conda-forge’s 1.2T osx-64 subdir has a single 2.4GB cache.db. Storing the cache in fewer files saves time since there is a per-file wait to open each of the many tiny .json files in old-style .cache/.
- cache.db is highly compressible, like the text metadata. 2.4G → zstd → 88M
- No longer cache paths.json (only used to create post_install.json and not referenced later in the indexing process). Saves 90% disk space in .cache.
- Updated Python and dependency requirements.
- Mercilessly cull less-used features.
- Format with black
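The JSON1 requirement mentioned in the list above can be checked from the Python interpreter that will run conda-index. This is a minimal sketch using only the standard library; sqlite_has_json1 is a hypothetical helper name, and the probe simply tries one JSON1 function.

```python
import sqlite3


def sqlite_has_json1():
    """Return True if this Python's sqlite3 can evaluate JSON1 functions."""
    conn = sqlite3.connect(":memory:")
    try:
        # json_extract() belongs to JSON1; without it SQLite raises OperationalError.
        row = conn.execute(
            """SELECT json_extract('{"size": 321706}', '$.size')"""
        ).fetchone()
        return row[0] == 321706
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


print("SQLite library version:", sqlite3.sqlite_version)
print("JSON1 available:", sqlite_has_json1())
```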
Parallelism
This version of conda-index continues indexing packages from other subdirs while the main thread is writing a repodata.json.
All current_repodata.json are generated in parallel. This may use a lot of RAM if repodata.json has tens of thousands of entries.
Command-line interface
python -m conda_index
python -m conda_index [OPTIONS] DIR
Options
- --output <output>
Output repodata to given directory.
- --subdir <subdir>
Subdir to index. Accepts multiple.
- -n, --channel-name <channel_name>
Customize the channel name listed in each channel's index.html.
- --patch-generator <patch_generator>
Path to Python file that outputs metadata patch instructions from its _patch_repodata function or a .tar.bz2/.conda file which contains a patch_instructions.json file for each subdir
- --channeldata, --no-channeldata
Generate channeldata.json. Conflicts with --no-write-monolithic.
- Default
False
- --rss, --no-rss
Write rss.xml (Only if --channeldata is enabled).
- Default
True
- --bz2, --no-bz2
Write repodata.json.bz2.
- Default
False
- --zst, --no-zst
Write repodata.json.zst.
- Default
False
- --run-exports, --no-run-exports
Write run_exports.json. Conflicts with --no-write-monolithic.
- Default
False
- --compact, --no-compact
Output JSON as one line, or pretty-printed.
- Default
True
- -m, --current-index-versions-file <current_index_versions_file>
YAML file containing name of package as key, and list of versions as values. The current_index.json will contain the newest from this series of versions. For example:
python:
  - 3.8
  - 3.9
will keep python 3.8.X and 3.9.Y in the current_index.json, instead of only the very latest python version.
- --base-url <base_url>
If packages should be served separately from repodata.json, URL of the directory tree holding packages. Generates repodata.json with repodata_version=2 which is supported in conda 24.5.0 or later.
- --update-cache, --no-update-cache
Control whether listdir() is called to refresh the set of available packages. Used to generate complete repodata.json from cache only when packages are not on disk. (Experimental)
- Default
True
- --upstream-stage <upstream_stage>
Set to 'clone' to generate example repodata from conda-forge test database. (Experimental)
- --current-repodata, --no-current-repodata
Generate current_repodata.json, a file containing only the newest versions of all packages and their dependencies, only used by the classic solver. Pass --no-current-repodata to skip it. Conflicts with --no-write-monolithic.
- Default
True
- --threads <threads>
- Default
12
- --verbose
Enable debug logging.
- --write-monolithic, --no-write-monolithic
Write repodata.json with all package metadata in a single file.
- --write-shards, --no-write-shards
Write a repodata.msgpack.zst index and many smaller files per CEP-16. (Experimental)
- --db <db>
Choose database backend. "sqlite3" (default) or "postgresql" (Experimental)
- Options
sqlite3 | postgresql
- --db-url <db_url>
SQLAlchemy database URL when using --db=postgresql. Alternatively, use the CONDA_INDEX_DBURL environment variable. (Experimental)
- Default
postgresql:///conda_index
- --html-dependencies, --no-html-dependencies
Include dependency popups in generated HTML index files. May significantly increase file size for large repositories like main or conda-forge.
- Default
False
Arguments
- DIR
Required argument
Environment variables
- CONDA_INDEX_DBURL
Provide a default for --db-url
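As an illustration of how the environment variable relates to --db-url, the sketch below resolves a database URL in the order explicit option, then CONDA_INDEX_DBURL, then the documented default. The precedence is an assumption based on the description above, and resolve_db_url is a hypothetical helper, not a conda-index function.

```python
import os

# Default listed for --db-url in the option reference above.
DEFAULT_DB_URL = "postgresql:///conda_index"


def resolve_db_url(cli_value=None, environ=None):
    """Pick the database URL: explicit option, then environment, then default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    # Assumed behavior: the environment variable only supplies a default.
    return environ.get("CONDA_INDEX_DBURL") or DEFAULT_DB_URL
```

For example, resolve_db_url(None, {"CONDA_INDEX_DBURL": "postgresql:///other"}) returns the value from the environment, while passing an explicit first argument overrides it.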
conda_index
conda_index.index
This module provides the main entry point to create indexes from collections of conda packages.
- conda_index.index.update_index(dir_path, output_dir=None, check_md5=False, channel_name=None, patch_generator=None, threads: int | None = 12, verbose=False, progress=False, subdirs=None, warn=True, current_index_versions=None, debug=False, write_bz2=True, write_zst=False, write_run_exports=False, html_dependencies=False)
High-level interface to ChannelIndex. Index all subdirs under dir_path. Output to output_dir, or under the input directory if output_dir is not given. Writes updated channeldata.json.
The input dir_path should at least contain a directory named noarch. The path tree therein is treated as a full channel, with a level of subdirs, each subdir having an update to repodata.json. The full channel will also have a channeldata.json file.
- class conda_index.index.ChannelIndex(channel_root: ~pathlib.Path | str, channel_name: str | None, subdirs: ~typing.Iterable[str] | None = None, threads: int | None = 12, deep_integrity_check=False, debug=False, output_root=None, cache_class: type[~conda_index.index.cache.BaseCondaIndexCache] = <class 'conda_index.index.sqlitecache.CondaIndexCache'>, write_bz2=False, write_zst=False, write_run_exports=False, compact_json=True, write_monolithic=True, write_shards=False, html_dependencies=False, *, channel_url: str | None = None, fs: ~conda_index.index.fs.MinimalFS | None = None, base_url: str | None = None, save_fs_state=True, write_current_repodata=True, upstream_stage: str = 'fs', cache_kwargs=None)
Class implementing update_index. Allows for more fine-grained control of output.
See the implementation of conda_index.cli for usage.
- Parameters
- channel_root -- Path to channel, or just the channel cache if channel_url is provided.
- channel_name -- Name of channel; defaults to last path segment of channel_root.
- subdirs -- subdirs to index.
- output_root -- Path to write repodata.json etc; defaults to channel_root.
- channel_url -- fsspec URL where package files live. If provided, channel_root will only be used for cache and index output.
- fs -- MinimalFS instance to be used with channel_url. Wrap fsspec AbstractFileSystem with conda_index.index.fs.FsspecFS(fs).
- base_url -- Add base_url/<subdir> to repodata.json to be able to host packages separate from repodata.json
- save_fs_state -- Pass False to use cached filesystem state instead of os.listdir(subdir)
- write_monolithic -- Pass True to write large 'repodata.json' with all packages.
- write_shards -- Pass True to write sharded repodata.msgpack and per-package fragments.
- html_dependencies -- Pass True to include dependency popups in generated HTML index files.
- index(patch_generator, verbose=False, progress=False, current_index_versions=None)
Examine all changed packages under self.channel_root, updating index.html for each subdir.
- update_channeldata(rss=False)
Update channeldata based on re-reading output repodata.json and existing channeldata.json. Call after index() if channeldata is needed.
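A minimal sketch of driving ChannelIndex from Python, using only the constructor arguments and methods documented above. build_index is a hypothetical wrapper; the chosen options (write_zst=True, a single thread) are illustrative, and the authoritative usage remains the implementation of conda_index.cli.

```python
from pathlib import Path


def build_index(channel_dir, output_dir=None, subdirs=None, threads=1):
    """Index channel_dir with ChannelIndex and refresh channeldata.json."""
    # Imported lazily so this module can be loaded without conda-index installed.
    from conda_index.index import ChannelIndex

    channel_index = ChannelIndex(
        Path(channel_dir),
        channel_name=None,  # documented default: last path segment of channel_root
        subdirs=subdirs,
        threads=threads,
        output_root=output_dir,  # documented default: channel_root
        write_zst=True,
    )
    channel_index.index(patch_generator=None)
    # Documented as the call to make after index() when channeldata is needed.
    channel_index.update_channeldata(rss=False)
    return channel_index
```

Calling build_index("/path/to/channel") requires conda-index to be installed and a channel directory containing at least a noarch subdirectory.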
conda_index.index.fs
This module provides a filesystem abstraction for sourcing packages.
Minimal (just what conda-index uses) filesystem abstraction.
Allows fsspec to be used to index remote repositories, without making it a required dependency.
- class conda_index.index.fs.FileInfo(fn: str, st_mtime: Number, st_size: Number)
Filename and a bit of stat information.
fn: str
st_mtime: Number
st_size: Number
- class conda_index.index.fs.FsspecFS(fsspec_fs)
Wrap a fsspec filesystem to pass to ChannelIndex
basename(path: str) -> str
fsspec_fs: AbstractFileSystem
join(*paths)
listdir(path: str) -> list[dict]
open(path: str, mode: str = 'rb')
stat(path: str)
- class conda_index.index.fs.MinimalFS
Filesystem API as needed by conda-index, for fsspec compatibility.
basename(path) -> str
join(*paths)
listdir(path) -> Iterable[dict]
open(path: str, mode: str = 'rb')
stat(path: str)
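To make the shape of this interface concrete, here is a sketch of a local-disk class exposing the same five methods. LocalFS is a hypothetical name, the dictionary keys ("name", "mtime", "size") are an assumption because the signatures above only promise dicts, and whether ChannelIndex accepts such a class without subclassing MinimalFS has not been verified.

```python
import os


class LocalFS:
    """Duck-typed stand-in exposing the five MinimalFS methods for local files."""

    def basename(self, path):
        return os.path.basename(path)

    def join(self, *paths):
        return os.path.join(*paths)

    def listdir(self, path):
        # Assumed entry shape: one dict per file with name, mtime and size.
        for name in sorted(os.listdir(path)):
            info = os.stat(os.path.join(path, name))
            yield {"name": name, "mtime": info.st_mtime, "size": info.st_size}

    def open(self, path, mode="rb"):
        return open(path, mode)

    def stat(self, path):
        info = os.stat(path)
        return {"mtime": info.st_mtime, "size": info.st_size}
```

A remote backend would implement the same methods on top of its own listing and download calls, which is the role FsspecFS plays for fsspec filesystems.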
Database schema
Standalone conda-index uses a per-subdir sqlite database to track package metadata, unlike the older version which used millions of tiny .json files. The new strategy is much faster because we don't have to pay for many individual stat() or open() calls.
The whole schema looks like this:
<subdir>/.cache % sqlite3 cache.db
SQLite version 3.41.2 2023-03-22 11:56:21
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE about (path TEXT PRIMARY KEY, about BLOB);
CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
CREATE TABLE recipe (path TEXT PRIMARY KEY, recipe BLOB);
CREATE TABLE recipe_log (path TEXT PRIMARY KEY, recipe_log BLOB);
CREATE TABLE run_exports (path TEXT PRIMARY KEY, run_exports BLOB);
CREATE TABLE post_install (path TEXT PRIMARY KEY, post_install BLOB);
CREATE TABLE icon (path TEXT PRIMARY KEY, icon_png BLOB);
CREATE TABLE stat (
stage TEXT NOT NULL DEFAULT 'indexed',
path TEXT NOT NULL,
mtime NUMBER,
size INTEGER,
sha256 TEXT,
md5 TEXT,
last_modified TEXT,
etag TEXT
);
CREATE UNIQUE INDEX idx_stat ON stat (path, stage);
CREATE INDEX idx_stat_stage ON stat (stage, path);
sqlite> select stage, path from stat where path like 'libcurl%';
fs|libcurl-7.84.0-hc6d1d07_0.conda
fs|libcurl-7.86.0-h0f1d93c_0.conda
fs|libcurl-7.87.0-h0f1d93c_0.conda
fs|libcurl-7.88.1-h0f1d93c_0.conda
fs|libcurl-7.88.1-h9049daf_0.conda
indexed|libcurl-7.84.0-hc6d1d07_0.conda
indexed|libcurl-7.86.0-h0f1d93c_0.conda
indexed|libcurl-7.87.0-h0f1d93c_0.conda
indexed|libcurl-7.88.1-h0f1d93c_0.conda
indexed|libcurl-7.88.1-h9049daf_0.conda
Most of these tables store json-format metadata extracted from each package.
select * from index_json where path = 'libcurl-7.88.1-h9049daf_0.conda';
'libcurl-7.88.1-h9049daf_0.conda'
'{"build":"h9049daf_0",...,"sha256":"37b8d58c05386ac55d1d8e196c90b92b0a63f3f1fe2fa916bf5ed3e1656d8e14","size":321706}'
To track whether a package is indexed in the cache or not, conda-index uses a table named stat. The main point of this table is to assign a stage value to each artifact filename; usually 'fs' which is called the upstream stage, and 'indexed'. 'fs' means that the artifact is now available in the set of packages (assumed by default to be the local filesystem). 'indexed' means that the entry already exists in the database (same filename, same timestamp, same hash), and its package metadata has been extracted to the index_json etc. tables. Paths in 'fs' but not in 'indexed' need to be unpacked to have their metadata added to the database. Paths in 'indexed' but not in 'fs' will be ignored and left out of repodata.json.
First, conda-index adds all files in a subdir to the upstream stage. This involves a listdir() and stat() for each file in the index. The default upstream stage is named fs, but this step is designed to be overridden by subclassing CondaIndexCache() and replacing the save_fs_state() and changed_packages() methods. By overriding CondaIndexCache() it is possible to index without calling stat() on each package, or without even having all packages stored on the indexing machine.
Next, conda-index looks for all changed_packages(): paths in the upstream (fs) stage that don't exist in or have a different modification time than those in the indexed stage.
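The comparison described above can be sketched with an in-memory database. The query below is an illustration written for this page, not the SQL used by changed_packages(); the table is a reduced copy of stat, the sample rows are made up, and only the modification time is compared, as in the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stat (stage TEXT NOT NULL DEFAULT 'indexed', path TEXT NOT NULL,"
    " mtime NUMBER, size INTEGER)"
)
conn.execute("CREATE UNIQUE INDEX idx_stat ON stat (path, stage)")
conn.executemany(
    "INSERT INTO stat (stage, path, mtime, size) VALUES (?, ?, ?, ?)",
    [
        ("fs", "a-1.0-0.conda", 100, 10),  # unchanged
        ("indexed", "a-1.0-0.conda", 100, 10),
        ("fs", "b-1.0-0.conda", 250, 20),  # modified since it was indexed
        ("indexed", "b-1.0-0.conda", 200, 20),
        ("fs", "c-1.0-0.conda", 300, 30),  # new, never indexed
        ("indexed", "d-1.0-0.conda", 50, 40),  # gone upstream, so ignored
    ],
)

# Upstream rows with no indexed counterpart, or with a different mtime.
CHANGED = """
SELECT up.path FROM stat AS up
LEFT JOIN stat AS done
  ON done.path = up.path AND done.stage = 'indexed'
WHERE up.stage = :upstream_stage
  AND (done.path IS NULL OR done.mtime != up.mtime)
ORDER BY up.path
"""
changed = [row[0] for row in conn.execute(CHANGED, {"upstream_stage": "fs"})]
print(changed)  # ['b-1.0-0.conda', 'c-1.0-0.conda']
```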
Finally, a join between the upstream stage, usually 'fs', and the index_json table yields a basic repodata_from_packages.json without any repodata patches.
SELECT path, index_json FROM stat JOIN index_json USING (path) WHERE stat.stage = :upstream_stage
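The join above can be tried against a small in-memory database. In this sketch the two tables are reduced copies of the schema, the package records are invented, and the resulting dictionary is only a stand-in for the real repodata layout.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE index_json (path TEXT PRIMARY KEY, index_json BLOB);
    CREATE TABLE stat (stage TEXT NOT NULL DEFAULT 'indexed', path TEXT NOT NULL);
    """
)
records = {
    "a-1.0-0.conda": {"name": "a", "version": "1.0", "build": "0", "depends": []},
    "b-2.0-0.conda": {"name": "b", "version": "2.0", "build": "0", "depends": ["a"]},
}
for path, record in records.items():
    conn.execute("INSERT INTO index_json VALUES (?, ?)", (path, json.dumps(record)))
    conn.execute("INSERT INTO stat (stage, path) VALUES ('indexed', ?)", (path,))
# Only a-1.0-0.conda is still present upstream, so only it should be emitted.
conn.execute("INSERT INTO stat (stage, path) VALUES ('fs', 'a-1.0-0.conda')")

QUERY = (
    "SELECT path, index_json FROM stat JOIN index_json USING (path) "
    "WHERE stat.stage = :upstream_stage"
)
packages = {
    path: json.loads(blob)
    for path, blob in conn.execute(QUERY, {"upstream_stage": "fs"})
}
print(sorted(packages))  # ['a-1.0-0.conda']
```

The cached metadata for b-2.0-0.conda stays in index_json but is left out of the result because it has no row in the upstream stage.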
The steps to create repodata.json, including any repodata patches, and to create current_repodata.json with only the latest versions of each package, are similar to pre-sqlite3 conda-index.
The other cached metadata tables are used to create channeldata.json.
Sample queries
Megabytes added per day:
select
date(mtime, 'unixepoch') as d,
printf('%0.2f', sum(size) / 1e6) as MB
from
stat
group by
date(mtime, 'unixepoch')
order by
mtime desc
PostgreSQL Support in conda-index
As of conda-index 0.7.0, conda-index can use a PostgreSQL database. conda-index uses a database to store package metadata, creating repodata from a query. By default, it will use a sqlite3 database stored alongside the package files, but it can optionally use PostgreSQL.
The database backend is controlled by the --db <backend> and --db-url command line arguments; the CONDA_INDEX_DBURL environment variable can be used in place of --db-url. For example, python -m conda_index --db postgresql chooses PostgreSQL with the default postgresql:///conda_index database URL.
To use a PostgreSQL database with conda-index, install conda-index's PostgreSQL-specific dependencies into its environment:
conda install sqlalchemy psycopg2
Then, install a local PostgreSQL with conda:
# Create a local PostgreSQL installation and conda_index database
conda install postgresql
initdb -D conda-index-db
pg_ctl -D conda-index-db -l logfile start
createdb conda_index
Finally, run the following command:
python -m conda_index --db postgresql --db-url postgresql:///conda_index [DIR]
conda_index stores package metadata in the PostgreSQL database given by a SQLAlchemy database URL.
The schema is similar to the one used for sqlite3, except that while sqlite3 uses a database file per subdirectory, in PostgreSQL all subdirectories are stored in the same database. conda_index creates a random prefix in [DIR]/.cache/cache.json to differentiate this channel from any others that may be stored in the same PostgreSQL database. Each package name is stored with the format <prefix>/<subdir>/<package>.conda in a single database.
Advanced users can use the CLI or the API to run conda_index on a partial local package repository. It is possible to add a few local packages to a much larger index instead of keeping every package on the machine running conda-index. For example, by inserting packages into the stat table and then running python -m conda_index --db postgresql --no-update-cache [DIR], conda-index can add or update packages in [DIR] to repodata without necessarily storing either the entire set of packages or the conda-index database on that machine.
Changelog
0.7.0 (2025-10-13)
Enhancements
- Add postgresql as a supported database backend in addition to sqlite. (#199)
- Show error when --no-write-monolithic is combined with --current-repodata, --run-exports, or --channeldata. (#224)
- Add html title popup with dependencies for each build to index.html. (#205)
- "--html-dependencies/--no-html-dependencies" flag toggles popups. (#218)
Docs
- Include narrative documentation for python -m conda_index --db postgresql ... in Sphinx (https://conda.github.io/conda-index/). (#219)
Other
- Update conda index command plugin to avoid re-exported type. (#227)
Contributors
- @dholth
- @jtroe
- @ryanskeith
0.6.1 (2025-05-22)
Enhancements
- Added support for Python 3.13 in the CI test matrix and updated related configurations. (#203)
Bug fixes
- In sharded repodata, set base_url and shards_base_url to "" instead of leaving them undefined, for pixi compatibility. (#209)
Other
- Add database-independent base class for (sqlite specific) CondaIndexCache. Return parsed data instead of str in run_exports(). (#206)
- Update sqlite3 create_function() arguments for "positional-only in Python 3.15" warning. (#211)
0.6.0 (2025-03-27)
Enhancements
- Add --channeldata/--no-channeldata flag to toggle generating channeldata.
- Add sharded repodata (repodata split into separate files per package name).
Other
- Remove WAL mode from database create script, in case conda-index is used on a network file system. Note WAL mode is persistent, PRAGMA journal_mode=DELETE can be used to convert a WAL database back to a rollback journal mode. (#177)
- Separate current_repodata generation into own file, raising possibility of "doesn't depend on conda" mode.
- Update tests to account for conda-build removals. (#180)
- Publish new conda-index releases on PyPI automatically. (#195)
See also https://github.com/conda/conda-index/releases/tag/0.6.0
0.5.0 (2024-06-07)
Enhancements
- Add experimental python -m conda_index.json2jlap script to run after indexing, to create repodata.jlap patch sets for incremental repodata downloads. (#125)
- Add --current-repodata/--no-current-repodata flags to control whether current_repodata.json is generated. (#139)
- Add support for CEP-15 base_url to host packages separate from repodata. (#150)
- Support fsspec (in the API only) to index any fsspec-supported remote filesystem. Also enables the input packages folder to be separate from the cache and output folders. (#143)
Bug fixes
- Move run_exports.json query into cache, instead of directly using SQL in ChannelIndex. (#163)
- Create parents when creating <subdir>/.cache (#166)
Other
- Approach 100% code coverage in test suite; reformat with ruff. (#145)
- Update CI configuration to test on more platforms (#142)
- Drop support for Python 3.7; support Python 3.8+ only. (#130)
Contributors
- @dholth
- @jezdez
- @conda-bot
0.4.0 (2024-01-29)
Enhancements
- Add --compact-json/--no-compact-json option, default to compact. (#120)
- Add an index subcommand using conda's new subcommand plugin hook, allowing conda index instead of python -m conda_index. Note the CLI has changed compared to old conda-index. When conda-build < 24.1.0 is installed, the older conda-index code will still be used instead of this plugin. (#81 via #131)
Bug fixes
- Check size in addition to mtime when deciding which packages to index. (#108)
- Update cached index.json, not just stat values, for changed packages that are already indexed. (#108)
Other
- Improve test coverage (#123)
- Apply ruff --fix; reformat code; syntax cleanup (#128)
0.3.0 (2023-09-21)
Enhancements
- Add --run-exports to generate CEP-12 compliant run_exports.json documents for each subdir. (#102 via #110)
- Don't pretty-print repodata.json by default, saving time and space. (#111)
Docs
- Improve documentation.
Deprecations
- Require conda >= 4.14 (or any of the >= 22.x.y calver releases)
Author
conda
Copyright
conda