gdal-vector-pipeline - Man Page

Name

gdal-vector-pipeline — Process a vector dataset applying several steps

Added in version 3.11.

Description

gdal vector pipeline can be used to process a vector dataset and perform various processing steps that accept vector and generate vector.

For pipelines mixing raster and vector, consult gdal pipeline.

Most steps proceed in on-demand evaluation of features, unless otherwise stated in their documentation, without "materializing" the resulting dataset of the operation of each step. It may be desirable sometimes for performance purposes to proceed to materializing an intermediate dataset to disk using gdal vector materialize.

Synopsis

Usage: gdal vector pipeline [OPTIONS] <PIPELINE>

Process a vector dataset applying several steps.

Positional arguments:

Common Options:
  -h, --help              Display help message and exit
  --json-usage            Display usage as JSON document and exit
  --config <KEY>=<VALUE>  Configuration option [may be repeated]
  -q, --quiet             Quiet mode (no progress bar)

Options:
  --skip-errors           Skip errors when writing features

<PIPELINE> is of the form: read|concat [READ-OPTIONS] ( ! <STEP-NAME> [STEP-OPTIONS] )* ! write|info [WRITE-OPTIONS]

A pipeline chains several steps, separated with the ! (exclamation mark) character. The first step must be read or concat, and the last one info, partition or write. Each step has its own positional or non-positional arguments. Apart from read, concat, info, partition and write, all other steps can potentially be used several times in a pipeline.

Potential steps are:

read

* read [OPTIONS] <INPUT>
------------------------

Read a vector dataset.

Positional arguments:
  -i, --input <INPUT>                       Input vector datasets [required]

Options:
  -l, --layer, --input-layer <INPUT-LAYER>  Input layer name(s) [may be repeated]

Advanced Options:
  --if, --input-format <INPUT-FORMAT>       Input formats [may be repeated]
  --oo, --open-option <KEY>=<VALUE>         Open options [may be repeated]

buffer

* buffer [OPTIONS] <DISTANCE>
-----------------------------

Compute a buffer around geometries of a vector dataset.

Positional arguments:
  --distance <DISTANCE>                    Distance to which to extend the geometry. [required]

Options:
  --active-layer <ACTIVE-LAYER>            Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>      Geometry field name to which to restrict the processing (if not specified, all)
  --endcap-style <ENDCAP-STYLE>            Endcap style.. ENDCAP-STYLE=round|flat|square (default: round)
  --join-style <JOIN-STYLE>                Join style.. JOIN-STYLE=round|mitre|bevel (default: round)
  --mitre-limit <MITRE-LIMIT>              Mitre ratio limit (only affects mitered join style). (default: 5)
  --quadrant-segments <QUADRANT-SEGMENTS>  Number of line segments used to approximate a quarter circle. (default: 8)
  --side <SIDE>                            Sets whether the computed buffer should be single-sided or not.. SIDE=both|left|right (default: both)

Details for options can be found in gdal vector buffer.

concat

* concat [OPTIONS] <INPUT>...
-----------------------------

Concatenate vector datasets.

Positional arguments:
  -i, --input <INPUT>                                        Input vector datasets [1.. values] [required]

Options:
  -l, --layer, --input-layer <INPUT-LAYER>                   Input layer name(s) [may be repeated]
  --mode <MODE>                                              Determine the strategy to create output layers from source layers . MODE=merge-per-layer-name|stack|single (default: merge-per-layer-name)
  --output-layer <OUTPUT-LAYER>                              Name of the output vector layer (single mode), or template to name the output vector layers (stack mode)
  --source-layer-field-name <SOURCE-LAYER-FIELD-NAME>        Name of the new field to add to contain identificoncation of the source layer, with value determined from 'source-layer-field-content'
  --source-layer-field-content <SOURCE-LAYER-FIELD-CONTENT>  A string, possibly using {AUTO_NAME}, {DS_NAME}, {DS_BASENAME}, {DS_INDEX}, {LAYER_NAME}, {LAYER_INDEX}
  --field-strategy <FIELD-STRATEGY>                          How to determine target fields from source fields. FIELD-STRATEGY=union|intersection (default: union)
  -s, --src-crs <SRC-CRS>                                    Source CRS
  -d, --dst-crs <DST-CRS>                                    Destination CRS

Advanced Options:
  --if, --input-format <INPUT-FORMAT>                        Input formats [may be repeated]
  --oo, --open-option <KEY>=<VALUE>                          Open options [may be repeated]

Details for options can be found in gdal vector concat.

clip

* clip [OPTIONS]
----------------

Clip a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>    Set active layer (if not specified, all)
  --bbox <BBOX>                    Clipping bounding box as xmin,ymin,xmax,ymax
                                   Mutually exclusive with --geometry, --like
  --bbox-crs <BBOX-CRS>            CRS of clipping bounding box
  --geometry <GEOMETRY>            Clipping geometry (WKT or GeoJSON)
                                   Mutually exclusive with --bbox, --like
  --geometry-crs <GEOMETRY-CRS>    CRS of clipping geometry
  --like <DATASET>                 Dataset to use as a template for bounds
                                   Mutually exclusive with --bbox, --geometry
  --like-sql <SELECT-STATEMENT>    SELECT statement to run on the 'like' dataset
                                   Mutually exclusive with --like-where
  --like-layer <LAYER-NAME>        Name of the layer of the 'like' dataset
  --like-where <WHERE-EXPRESSION>  WHERE SQL clause to run on the 'like' dataset
                                   Mutually exclusive with --like-sql

Details for options can be found in gdal vector clip.

edit

* edit [OPTIONS]
----------------

Edit metadata of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>    Set active layer (if not specified, all)
  --geometry-type <GEOMETRY-TYPE>  Layer geometry type
  --crs <CRS>                      Override CRS (without reprojection)
  --metadata <KEY>=<VALUE>         Add/update dataset metadata item [may be repeated]
  --unset-metadata <KEY>           Remove dataset metadata item [may be repeated]
  --layer-metadata <KEY>=<VALUE>   Add/update layer metadata item [may be repeated]
  --unset-layer-metadata <KEY>     Remove layer metadata item [may be repeated]
  --unset-fid                      Unset the identifier of each feature and the FID column name

Details for options can be found in gdal vector edit.

explode-collections

* explode-collections [OPTIONS]
-------------------------------

Explode geometries of type collection of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)
  --geometry-type <GEOMETRY-TYPE>      Geometry type
  --skip-on-type-mismatch              Skip feature when change of feature geometry type failed

Details for options can be found in gdal vector explode-collections.

filter

* filter [OPTIONS]
------------------

Filter a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  --bbox <BBOX>                  Bounding box as xmin,ymin,xmax,ymax
  --where <WHERE>|@<filename>    Attribute query in a restricted form of the queries used in the SQL WHERE statement

Details for options can be found in gdal vector filter.

limit

* limit [OPTIONS] <LIMIT>
-------------------------

Truncate a vector dataset to no more than a specified number of features.

Positional arguments:
  --limit <LIMIT>                Limit the number of features to read per layer [required]

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)

make-valid

* make-valid [OPTIONS]
----------------------

Fix validity of geometries of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)
  --method <METHOD>                    Algorithm to use when repairing invalid geometries.. METHOD=linework|structure (default: linework)
  --keep-lower-dim                     Keep components of lower dimension after MakeValid()

Details for options can be found in gdal vector make-valid.

materialize

* materialize [OPTIONS]
-----------------------

Materialize a piped dataset on disk to increase the efficiency of the following steps.

Options:
  -o, --output <OUTPUT>                                Materialized dataset name (created by algorithm)
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --overwrite                                          Whether overwriting existing output is allowed

Details for options can be found in gdal vector materialize.

reproject

* reproject [OPTIONS]
---------------------

Reproject a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  -s, --src-crs <SRC-CRS>        Source CRS
  -d, --dst-crs <DST-CRS>        Destination CRS [required]

Details for options can be found in gdal vector reproject.

segmentize

* segmentize [OPTIONS] <MAX-LENGTH>
-----------------------------------

Segmentize geometries of a vector dataset.

Positional arguments:
  --max-length <MAX-LENGTH>            Maximum length of a segment [required]

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)

Details for options can be found in gdal vector segmentize.

select

* select [OPTIONS] <FIELDS>
---------------------------

Select a subset of fields from a vector dataset.

Positional arguments:
  --fields <FIELDS>              Fields to select (or exclude if --exclude) [may be repeated] [required]

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  --exclude                      Exclude specified fields
                                 Mutually exclusive with --ignore-missing-fields
  --ignore-missing-fields        Ignore missing fields
                                 Mutually exclusive with --exclude

Details for options can be found in gdal vector select.

set-field-type

* set-field-type [OPTIONS]
--------------------------

Modify the type of a field of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>                Set active layer (if not specified, all)
  --field-name <FIELD-NAME>                    Field name [required]
                                               Mutually exclusive with --src-field-type
  --src-field-type <SRC-FIELD-TYPE>            Source field type or subtype [required]
                                               Mutually exclusive with --field-name
  --dst-field-type, --field-type <FIELD-TYPE>  Target field type or subtype [required]

Details for options can be found in gdal vector set-field-type.

set-geom-type

* set-geom-type [OPTIONS]
-------------------------

Modify the geometry type of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)
  --layer-only                         Only modify the layer geometry type
                                       Mutually exclusive with --feature-only
  --feature-only                       Only modify the geometry type of features
                                       Mutually exclusive with --layer-only
  --geometry-type <GEOMETRY-TYPE>      Geometry type
  --multi                              Force geometries to MULTI geometry types
                                       Mutually exclusive with --single
  --single                             Force geometries to non-MULTI geometry types
                                       Mutually exclusive with --multi
  --linear                             Convert curve geometries to linear types
                                       Mutually exclusive with --curve
  --curve                              Convert linear geometries to curve types
                                       Mutually exclusive with --linear
  --dim <DIM>                          Force geometries to the specified dimension. DIM=XY|XYZ|XYM|XYZM
  --skip                               Skip feature when change of feature geometry type failed

Details for options can be found in gdal vector set-geom-type.

simplify

* simplify [OPTIONS] <TOLERANCE>
--------------------------------

Simplify geometries of a vector dataset.

Positional arguments:
  --tolerance <TOLERANCE>              Distance tolerance for simplification. [required]

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)

Details for options can be found in gdal vector simplify.

simplify-coverage

* simplify-coverage [OPTIONS] <TOLERANCE>
-----------------------------------------

Simplify shared boundaries of a polygonal vector dataset.

Positional arguments:
  --tolerance <TOLERANCE>        Distance tolerance for simplification. [required]

Options:
  --active-layer <ACTIVE-LAYER>  Set active layer (if not specified, all)
  --preserve-boundary            Whether the exterior boundary should be preserved.

Details for options can be found in gdal vector simplify-coverage.

sql

* sql [OPTIONS] <statement>|@<filename>
---------------------------------------

Apply SQL statement(s) to a dataset.

Positional arguments:
  --sql <statement>|@<filename>      SQL statement(s) [may be repeated] [required]

Options:
  -l, --output-layer <OUTPUT-LAYER>  Output layer name(s) [may be repeated]
  --dialect <DIALECT>                SQL dialect (e.g. OGRSQL, SQLITE)

Details for options can be found in gdal vector sql.

swap-xy

* swap-xy [OPTIONS]
-------------------

Swap X and Y coordinates of geometries of a vector dataset.

Options:
  --active-layer <ACTIVE-LAYER>        Set active layer (if not specified, all)
  --active-geometry <ACTIVE-GEOMETRY>  Geometry field name to which to restrict the processing (if not specified, all)

Details for options can be found in gdal vector swap-xy.

info

Added in version 3.12.

* info [OPTIONS]
----------------

Return information on a vector dataset.

Options:
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format. OUTPUT-FORMAT=json|text
  -l, --layer, --input-layer <INPUT-LAYER>             Input layer name [may be repeated]
                                                       Mutually exclusive with --sql
  --features                                           List all features (beware of RAM consumption on large layers)
                                                       Mutually exclusive with --summary
  --summary                                            List the layer names and the geometry type
                                                       Mutually exclusive with --features
  --limit <FEATURE-COUNT>                              Limit the number of features per layer (implies --features)
  --sql <statement>|@<filename>                        Execute the indicated SQL statement and return the result
                                                       Mutually exclusive with --input-layer
  --where <WHERE>|@<filename>                          Attribute query in a restricted form of the queries used in the SQL WHERE statement
  --dialect <DIALECT>                                  SQL dialect

Details for options can be found in gdal vector info.

partition

Added in version 3.12.

* partition [OPTIONS] <OUTPUT>
------------------------------

Partition a vector dataset into multiple files.

Positional arguments:
  -o, --output <OUTPUT>                                Output directory [required]

Options:
  --overwrite                                          Whether overwriting existing output is allowed
                                                       Mutually exclusive with --append
  --append                                             Whether appending to existing layer is allowed
                                                       Mutually exclusive with --overwrite
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --field <FIELD>                                      Field(s) on which to partition [may be repeated] [required]
  --scheme <SCHEME>                                    Partitioning scheme. SCHEME=hive|flat (default: hive)
  --pattern <PATTERN>                                  Filename pattern ('part_%010d' for scheme=hive, '{LAYER_NAME}_{FIELD_VALUE}_%010d' for scheme=flat)
  --feature-limit <FEATURE-LIMIT>                      Maximum number of features per file
  --max-file-size <MAX-FILE-SIZE>                      Maximum file size (MB or GB suffix can be used)
  --omit-partitioned-field                             Whether to omit partitioned fields from target layer definition
  --skip-errors                                        Skip errors when writing features

Details for options can be found in gdal vector partition.

write

* write [OPTIONS] <OUTPUT>
--------------------------

Write a vector dataset.

Positional arguments:
  -o, --output <OUTPUT>                                Output vector dataset [required]

Options:
  -f, --of, --format, --output-format <OUTPUT-FORMAT>  Output format ("GDALG" allowed)
  --co, --creation-option <KEY>=<VALUE>                Creation option [may be repeated]
  --lco, --layer-creation-option <KEY>=<VALUE>         Layer creation option [may be repeated]
  --overwrite                                          Whether overwriting existing output is allowed
  --update                                             Whether to open existing dataset in update mode
  --overwrite-layer                                    Whether overwriting existing output is allowed
  --append                                             Whether appending to existing layer is allowed
                                                       Mutually exclusive with --upsert
  -l, --output-layer <OUTPUT-LAYER>                    Output layer name
  --skip-errors                                        Skip errors when writing features

Advanced Options:
  --output-oo, --output-open-option <KEY>=<VALUE>      Output open options [may be repeated]
  --upsert                                             Upsert features (implies 'append')
                                                       Mutually exclusive with --append

Gdalg Output (on-the-Fly / Streamed Dataset)

A pipeline can be serialized as a JSON file using the GDALG output format. The resulting file can then be opened as a vector dataset using the GDALG: GDAL Streamed Algorithm driver, and apply the specified pipeline in a on-the-fly / streamed way.

The command_line member of the JSON file should nominally be the whole command line without the final write step, and is what is generated by gdal vector pipeline ! .... ! write out.gdalg.json.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632"
}

The final write step can be added but if so it must explicitly specify the stream output format and a non-significant output dataset name.

{
    "type": "gdal_streamed_alg",
    "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write --output-format=streamed streamed_dataset"
}

Substitutions

Added in version 3.12.

It is possible to use gdal pipeline to use a pipeline already serialized in a .gdal.json file, and customize its existing steps, typically changing an input filename, specifying an output filename, or adding/modifying arguments of steps.

See Substitutions.

Nested Pipeline

Added in version 3.12.

It is possible to create "nested pipelines", i.e. pipelines inside pipelines.

A nested pipeline is delimited by square brackets ([ and ]) surrounded by a space character.

There are 2 kinds of nested pipelines:

input nested pipelines: where the result dataset of the nested pipeline is used as the input dataset for an argument of the main pipeline.
output nested pipelines: where the output of a step of the main pipeline is used as the input of the nested pipeline in a following step. Output nested pipelines can only be used with the tee step.

See Nested pipeline.

Examples

Example 1: Reproject a GeoPackage file to CRS EPSG:32632 (“WGS 84 / UTM zone 32N”)

$ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite

Example 2: Serialize the command of a reprojection of a GeoPackage file in a GDALG file, and later read it

$ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write in_epsg_32632.gdalg.json --overwrite
$ gdal vector info in_epsg_32632.gdalg.json

Example 3: None

Union 2 source shapefiles (with similar structure), reproject them to EPSG:32632, keep only cities larger than 1 million inhabitants and write to a GeoPackage

$ gdal vector pipeline ! concat --single --dst-crs=EPSG:32632 france.shp belgium.shp ! filter --where "pop > 1e6" ! write out.gpkg --overwrite

gdal-vector-pipeline - Man Page

Name

Description

Synopsis

Gdalg Output (on-the-Fly / Streamed Dataset)

Substitutions

Nested Pipeline

Examples

Example 1: Reproject a GeoPackage file to CRS EPSG:32632 (“WGS 84 / UTM zone 32N”)

Example 2: Serialize the command of a reprojection of a GeoPackage file in a GDALG file, and later read it

Example 3: None

Author

Copyright

Info