gdal-vector-pipeline - Man Page

Name

gdal-vector-pipeline — Process a vector dataset applying several steps

Added in version 3.11.

Description

gdal vector pipeline can be used to process a vector dataset by applying several processing steps, each of which accepts vector data as input and generates vector data as output.

For pipelines mixing raster and vector, consult gdal pipeline.

Most steps evaluate features on demand, unless otherwise stated in their documentation, without "materializing" the resulting dataset of each step. For performance purposes, it may sometimes be desirable to materialize an intermediate dataset to disk using gdal vector materialize.
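As an illustrative sketch (the filenames and the choice of steps are arbitrary), an intermediate materialization can be inserted between two steps of a pipeline:

$ gdal vector pipeline ! read in.gpkg ! make-valid ! materialize ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite

Here the repaired features are written to a temporary dataset before reprojection, so that downstream steps read the already-repaired features instead of re-evaluating make-valid.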

Synopsis

Usage: gdal vector pipeline [OPTIONS] <PIPELINE>

Process a vector dataset applying several steps.

Positional arguments:

Common Options:
  -h, --help              Display help message and exit
  --json-usage            Display usage as JSON document and exit
  --config <KEY>=<VALUE>  Configuration option [may be repeated]
  -q, --quiet             Quiet mode (no progress bar)

Options:
  --skip-errors           Skip errors when writing features

<PIPELINE> is of the form: read|concat [READ-OPTIONS] ( ! <STEP-NAME> [STEP-OPTIONS] )* ! write|info [WRITE-OPTIONS]

A pipeline chains several steps, separated with the ! (exclamation mark) character. The first step must be read or concat, and the last one info, partition or write. Each step has its own positional or non-positional arguments. Apart from read, concat, info, partition and write, all other steps can potentially be used several times in a pipeline.
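For instance, the following minimal pipeline (with placeholder filenames and attribute name) reads a dataset, keeps only the features matching an attribute filter, and writes the result:

$ gdal vector pipeline ! read in.gpkg ! filter --where "pop > 1e6" ! write out.gpkg --overwrite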

Potential steps are listed below. For each of them, details for the options can be found in the standalone command of the same name:

* buffer: see gdal vector buffer.
* concat: see gdal vector concat.
* clip: see gdal vector clip.
* edit: see gdal vector edit.
* explode-collections: see gdal vector explode-collections.
* filter: see gdal vector filter.
* make-valid: see gdal vector make-valid.
* materialize: see gdal vector materialize.
* reproject: see gdal vector reproject.
* segmentize: see gdal vector segmentize.
* select: see gdal vector select.
* set-field-type: see gdal vector set-field-type.
* set-geom-type: see gdal vector set-geom-type.
* simplify: see gdal vector simplify.
* simplify-coverage: see gdal vector simplify-coverage.
* sql: see gdal vector sql.
* swap-xy: see gdal vector swap-xy.

The info and partition steps, which may only be used as the final step, are detailed below.

Added in version 3.12.

* info [OPTIONS]
----------------

Return information on a vector dataset.

Options:
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format. OUTPUT-FORMAT=json|text
  -l, --layer, --input-layer <INPUT-LAYER>             Input layer name [may be repeated]
                                                       Mutually exclusive with --sql
  --features                                           List all features (beware of RAM consumption on large layers)
                                                       Mutually exclusive with --summary
  --summary                                            List the layer names and the geometry type
                                                       Mutually exclusive with --features
  --limit <FEATURE-COUNT>                              Limit the number of features per layer (implies --features)
  --sql <statement>|@<filename>                        Execute the indicated SQL statement and return the result
                                                       Mutually exclusive with --input-layer
  --where <WHERE>|@<filename>                          Attribute query in a restricted form of the queries used in the SQL WHERE statement
  --dialect <DIALECT>                                  SQL dialect

Details for options can be found in gdal vector info.
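As a minimal sketch (the input filename is a placeholder), a pipeline can end with the info step to report a summary of the processed result as JSON:

$ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! info --format=json --summary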

Added in version 3.12.

* partition [OPTIONS] <OUTPUT>
------------------------------

Partition a vector dataset into multiple files.

Positional arguments:
  -o, --output <OUTPUT>                                Output directory [required]

Options:
  --overwrite                                          Whether overwriting existing output is allowed
                                                       Mutually exclusive with --append
  --append                                             Whether appending to existing layer is allowed
                                                       Mutually exclusive with --overwrite
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --field <FIELD>                                      Field(s) on which to partition [may be repeated] [required]
  --scheme <SCHEME>                                    Partitioning scheme. SCHEME=hive|flat (default: hive)
  --pattern <PATTERN>                                  Filename pattern ('part_%010d' for scheme=hive, '{LAYER_NAME}_{FIELD_VALUE}_%010d' for scheme=flat)
  --feature-limit <FEATURE-LIMIT>                      Maximum number of features per file
  --max-file-size <MAX-FILE-SIZE>                      Maximum file size (MB or GB suffix can be used)
  --omit-partitioned-field                             Whether to omit partitioned fields from target layer definition
  --skip-errors                                        Skip errors when writing features

Details for options can be found in gdal vector partition.
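As a sketch (the input filename and the field name are hypothetical), a pipeline can end with the partition step to split features according to the values of a field:

# "region" is a hypothetical field name of the input layer
$ gdal vector pipeline ! read parcels.gpkg ! partition --field=region --output=/tmp/partitioned --overwrite

With the default hive scheme, this typically produces subdirectories named after the field value (e.g. region=<value>) under the output directory.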

GDALG Output (On-the-fly / Streamed Dataset)

A pipeline can be serialized as a JSON file using the GDALG output format. The resulting file can then be opened as a vector dataset using the GDALG (GDAL Streamed Algorithm) driver, which applies the specified pipeline in an on-the-fly / streamed way.

The command_line member of the JSON file should nominally be the whole command line without the final write step, and is what is generated by gdal vector pipeline ! .... ! write out.gdalg.json.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632"
}

The final write step can be added, but if so it must explicitly specify the streamed output format and a non-significant output dataset name.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write --output-format=streamed streamed_dataset"
}

Substitutions

Added in version 3.12.

It is possible to use gdal pipeline to reuse a pipeline already serialized in a .gdalg.json file and to customize its existing steps, typically by changing an input filename, specifying an output filename, or adding/modifying arguments of steps.

See Substitutions.

Nested Pipeline

Added in version 3.12.

It is possible to create "nested pipelines", i.e. pipelines inside pipelines.

A nested pipeline is delimited by square brackets ([ and ]), each surrounded by space characters.

There are two kinds of nested pipelines; see Nested pipeline for details on both.
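As a sketch of a nested pipeline used as the value of a step argument (the filenames are placeholders, and the placement of the bracketed pipeline as the value of the clip step's --like option is an assumption to be checked against Nested pipeline), a clipping dataset could be reprojected on the fly:

# the bracketed sub-pipeline stands in for a dataset name
$ gdal vector pipeline ! read in.gpkg ! clip --like [ read mask.gpkg ! reproject --dst-crs=EPSG:32632 ] ! write out.gpkg --overwrite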

Examples

Example 1: Reproject a GeoPackage file to CRS EPSG:32632 (“WGS 84 / UTM zone 32N”)

$ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite

Example 2: Serialize the command of a reprojection of a GeoPackage file in a GDALG file, and later read it

$ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write in_epsg_32632.gdalg.json --overwrite
$ gdal vector info in_epsg_32632.gdalg.json

Example 3: Union two source shapefiles (with similar structure), reproject them to EPSG:32632, keep only cities larger than 1 million inhabitants, and write to a GeoPackage

$ gdal vector pipeline ! concat --single --dst-crs=EPSG:32632 france.shp belgium.shp ! filter --where "pop > 1e6" ! write out.gpkg --overwrite

Author

Even Rouault <even.rouault@spatialys.com>

Info

Nov 07, 2025 GDAL