Your company here ā€” click to reach over 10,000 unique daily visitors

extractpdfmark - Man Page

Extract page mode and named destinations as PDFmark from PDF


extractpdfmark file.pdf > file.ps


This manual page documents briefly the extractpdfmark command.

When you create a PDF document using a TeX system, you may include many small PDF files in the main PDF file. It is common for each of those files to use the same fonts.

If the small PDF files contain embedded font subsets, the TeX system includes them as-is in the main PDF. As a result, several subsets of the same font are embedded in the main PDF. It is not possible to remove the duplicates since the subsets differ. This vastly increases the size of the main PDF file.

On the other hand, if the small PDF files contain embedded full font sets, the TeX system also includes all of them in the main PDF. This time, the main PDF contains duplicates of the same full sets of fonts. Therefore, Ghostscript can remove the duplicates. This may considerably reduce the main PDF-file's size. (Note: Ghostscript 9.17 - 9.21 needs -dPDFDontUseFontObjectNum commandline option for removing duplicate fonts. If you use Ghostscript 9.22+, you cannot use this "full set embedding" method since it cannot remove duplicate fonts. In this case, you can use "*not* embedding" method as following.)

Finally, if the small PDF files contain some fonts that are not embedded, the TeX system outputs the main PDF file with some fonts missing. In this case, Ghostscript can embed the necessary fonts. It can also significantly reduce the required disk size.

Either way, when Ghostscript reads the main PDF produced by the TeX system and outputs the final PDF it does not preserve PDF page-mode and named-destinations, etc. As a result, when you open the final PDF, it is not displayed correctly. Also, remote PDF links will not work.

This program is able to extract the page mode and named destinations as PDFmark from PDF. By using this you can get the small PDF files that have preserved them.


$ extractpdfmark TeX-System-Outputted.pdf > Extracted-PDFmark.ps
$ gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
     -dPDFDontUseFontObjectNum -dPrinted=false \
     -sOutputFile=Final.pdf \
     TeX-System-Outputted.pdf Extracted-PDFmark.ps

(Note: Ghostscript 9.26+ needs -dPrinted=false commandline option.)


January 26, 2019