dumppdf - Man Page

dumppdf – extract PDF structure in XML format

Synopsis

dumppdf [-h] [--version] [--debug] [--extract-toc | --extract-embedded EXTRACT_EMBEDDED] [--page-numbers PAGE_NUMBERS [PAGE_NUMBERS ...]] [--pagenos PAGENOS] [--objects OBJECTS] [--all] [--password PASSWORD] [--outfile OUTFILE] [--raw-stream | --binary-stream | --text-stream] files [files ...]
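In the synopsis, alternatives separated by "|" inside one pair of brackets are mutually exclusive, and "files [files ...]" means at least one path is required. The following is a minimal sketch of that notation using Python's argparse module. The parser is hypothetical (its name, dumppdf-sketch, and the helper is_accepted are made up for illustration) and covers only part of the synopsis; it is not dumppdf's own source.

```python
import argparse
from typing import List


def build_sketch_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the synopsis (illustration only)."""
    parser = argparse.ArgumentParser(prog="dumppdf-sketch")

    # [--extract-toc | --extract-embedded EXTRACT_EMBEDDED]
    procedure = parser.add_mutually_exclusive_group()
    procedure.add_argument("--extract-toc", "-T", action="store_true")
    procedure.add_argument("--extract-embedded", "-E", type=str)

    # [--raw-stream | --binary-stream | --text-stream]
    codec = parser.add_mutually_exclusive_group()
    codec.add_argument("--raw-stream", "-r", action="store_true")
    codec.add_argument("--binary-stream", "-b", action="store_true")
    codec.add_argument("--text-stream", "-t", action="store_true")

    # files [files ...]
    parser.add_argument("files", nargs="+")
    return parser


def is_accepted(argv: List[str]) -> bool:
    """Return True if the sketch parser accepts argv, False if it rejects it."""
    try:
        build_sketch_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the usage error on stderr and exits.
        return False
    return True


print(is_accepted(["--raw-stream", "a.pdf"]))                    # True
print(is_accepted(["--raw-stream", "--text-stream", "a.pdf"]))   # False
print(is_accepted(["--raw-stream"]))                             # False
```

The second call is rejected because two options from the same bracketed group are given; the third is rejected because no file is named.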

Options

Positional Arguments

files

One or more paths to PDF files.

Optional Arguments

-h, --help

Show a help message and exit.

--version, -v

Show program’s version number and exit.

--debug, -d

Use debug logging level.

--extract-toc, -T

Extract the structure of the outline (table of contents).

--extract-embedded EXTRACT_EMBEDDED, -E EXTRACT_EMBEDDED

Extract embedded files.

Parser

Used during PDF parsing.

--page-numbers PAGE_NUMBERS [PAGE_NUMBERS ...]

A space-separated list of page numbers to parse.

--pagenos PAGENOS, -p PAGENOS

A comma-separated list of page numbers to parse. Included for legacy applications; use --page-numbers for more idiomatic argument entry.

--objects OBJECTS, -i OBJECTS

A comma-separated list of object numbers to extract.

--all, -a

Extract the structure of all objects.

--password PASSWORD, -P PASSWORD

The password to use for decrypting the PDF file.
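The two page-selection options differ only in how the list is written: --page-numbers takes one argument per page, while --pagenos and --objects each take a single comma-separated value. The sketch below builds the corresponding argument lists without running anything. The helper names and the file name sample.pdf are hypothetical. The file path is placed before --page-numbers on the assumption that an option accepting a variable number of values would otherwise treat a trailing path as another page number.

```python
from typing import List, Sequence


def page_selection_args(pages: Sequence[int], legacy: bool = False) -> List[str]:
    """Return the arguments that select the given pages (hypothetical helper)."""
    if not pages:
        return []
    if legacy:
        # --pagenos takes a single comma-separated value.
        return ["--pagenos", ",".join(str(p) for p in pages)]
    # --page-numbers takes one argument per page.
    return ["--page-numbers", *[str(p) for p in pages]]


def object_args(object_ids: Sequence[int]) -> List[str]:
    """Return the arguments that select the given object numbers."""
    if not object_ids:
        return []
    # --objects takes a single comma-separated value.
    return ["--objects", ",".join(str(i) for i in object_ids)]


# Both forms select pages 1, 3 and 5 of a hypothetical sample.pdf.
modern = ["dumppdf", "sample.pdf", *page_selection_args([1, 3, 5])]
legacy = ["dumppdf", "sample.pdf", *page_selection_args([1, 3, 5], legacy=True)]

print(" ".join(modern))  # dumppdf sample.pdf --page-numbers 1 3 5
print(" ".join(legacy))  # dumppdf sample.pdf --pagenos 1,3,5
print(object_args([2, 7]))  # ['--objects', '2,7']
```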

Output

Used during output generation.

--outfile OUTFILE, -o OUTFILE

Path to the file where output is written, or “-” (default) to write to stdout.

--raw-stream, -r

Write stream objects without encoding.

--binary-stream, -b

Write stream objects with binary encoding.

--text-stream, -t

Write stream objects as plain text.
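As an illustration of how the output options combine, the sketch below assembles a command line that dumps all objects to a file with at most one stream option, since the three stream options are mutually exclusive. The helper names and the file names report.pdf and report.xml are hypothetical. The run helper is defined but not called here, so the sketch can be read and tried without dumppdf installed.

```python
import shutil
import subprocess
from typing import List, Optional

# Stream options as listed under Output; only one may be given per run.
STREAM_FLAGS = {
    "raw": "--raw-stream",
    "binary": "--binary-stream",
    "text": "--text-stream",
}


def build_command(pdf_path: str, outfile: str = "-",
                  stream: Optional[str] = None,
                  dump_all: bool = False) -> List[str]:
    """Assemble a dumppdf argument list (hypothetical helper)."""
    command = ["dumppdf"]
    if dump_all:
        command.append("--all")
    if stream is not None:
        if stream not in STREAM_FLAGS:
            raise ValueError(f"unknown stream mode: {stream!r}")
        command.append(STREAM_FLAGS[stream])
    # "-" is the documented default and means standard output.
    command += ["--outfile", outfile, pdf_path]
    return command


def run(command: List[str]) -> None:
    """Run the command, failing early if the program is not on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not on PATH")
    subprocess.run(command, check=True)


command = build_command("report.pdf", outfile="report.xml",
                        stream="text", dump_all=True)
print(" ".join(command))
# dumppdf --all --text-stream --outfile report.xml report.pdf
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.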

See Also

pdf2txt(1)


October 2021