pdfio - Man Page

pdf read/write library

Introduction

PDFio is a simple C library for reading and writing PDF files. The primary goals of PDFio are:

·

Read and write any version of PDF file

·

Provide access to pages, objects, and streams within a PDF file

·

Support reading and writing of encrypted PDF files

·

Extract or embed useful metadata (author, creator, page information, etc.)

·

"Filter" PDF files, for example to extract a range of pages or to embed fonts that are missing from a PDF

·

Provide access to objects used for each page

PDFio is not concerned with rendering or viewing a PDF file, although a PDF RIP or viewer could be written using it.

PDFio is Copyright © 2021-2025 by Michael R Sweet and is licensed under the Apache License Version 2.0 with an (optional) exception to allow linking against GPL2/LGPL2 software. See the files "LICENSE" and "NOTICE" for more information.

Requirements

PDFio requires the following to build the software:

·

A C99 compiler such as Clang, GCC, or MS Visual C

·

A POSIX-compliant make program

·

A POSIX-compliant sh program

·

ZLIB (https://www.zlib.net/) 1.0 or higher

PDFio will also use libpng 1.6 or higher (https://www.libpng.org/) to provide enhanced PNG image support.

IDE files for Xcode (macOS/iOS) and Visual Studio (Windows) are also provided.

Installing PDFio

PDFio comes with a configure script that creates a portable makefile that will work on any POSIX-compliant system with ZLIB installed. To make it, run:

    ./configure
    make

To test it, run:

    make test

To install it, run:

    sudo make install

If you want a shared library, run:

    ./configure --enable-shared
    make
    sudo make install

The default installation location is "/usr/local". Pass the --prefix option to configure to install it to another location:

    ./configure --prefix=/some/other/directory

Other configure options can be found using the --help option:

    ./configure --help

Visual Studio Project

The Visual Studio solution ("pdfio.sln") is provided for Windows developers and generates both a static library and DLL.

Xcode Project

There is also an Xcode project ("pdfio.xcodeproj") you can use on macOS which generates a static library that will be installed under "/usr/local" with:

    sudo xcodebuild install

Detecting PDFio

PDFio can be detected using the pkg-config command, for example:

    if pkg-config --exists pdfio; then
        ...
    fi

In a makefile you can add the necessary compiler and linker options with:

    CFLAGS  +=      `pkg-config --cflags pdfio`
    LIBS    +=      `pkg-config --libs pdfio`

On Windows, you need to link to the PDFIO1.LIB (DLL) library and include the zlib_native NuGet package dependency. You can also use the published pdfio_native NuGet package.

Header Files

PDFio provides a primary header file that is always used:

    #include <pdfio.h>

PDFio also provides PDF content helper functions for producing PDF content that are defined in a separate header file:

    #include <pdfio-content.h>

Understanding PDF Files

A PDF file provides data and commands for displaying pages of graphics and text, and is structured in a way that allows it to be displayed in the same way across multiple devices and platforms. The following is a PDF which shows "Hello, World!" on one page:

    %PDF-1.0                        % Header starts here
    %âãÏÓ
    1 0 obj                         % Body starts here
    <<
    /Kids [2 0 R]
    /Count 1
    /Type /Pages
    >>
    endobj
    2 0 obj
    <<
    /Rotate 0
    /Parent 1 0 R
    /Resources 3 0 R
    /MediaBox [0 0 612 792]
    /Contents [4 0 R]/Type /Page
    >>
    endobj
    3 0 obj
    <<
    /Font
    <<
    /F0
    <<
    /BaseFont /Times-Italic
    /Subtype /Type1
    /Type /Font
    >>
    >>
    >>
    endobj
    4 0 obj
    <<
    /Length 65
    >>
    stream
    1. 0. 0. 1. 50. 700. cm
    BT
      /F0 36. Tf
      (Hello, World!) Tj
    ET
    endstream
    endobj
    5 0 obj
    <<
    /Pages 1 0 R
    /Type /Catalog
    >>
    endobj
    xref                            % Cross-reference table starts here
    0 6
    0000000000 65535 f
    0000000015 00000 n
    0000000074 00000 n
    0000000192 00000 n
    0000000291 00000 n
    0000000409 00000 n
    trailer                         % Trailer starts here
    <<
    /Root 5 0 R
    /Size 6
    >>
    startxref
    459
    %%EOF

Header

The header is the first line of a PDF file that specifies the version of the PDF format that has been used, for example %PDF-1.0.

Since PDF files almost always contain binary data, they can become corrupted if line endings are changed, for example if the file is transferred using FTP in text mode or is edited in Notepad on Windows. To allow legacy file transfer programs to determine that the file is binary, the PDF standard recommends including some bytes with character codes higher than 127 in the header, for example:

    %âãÏÓ

The percent sign indicates a comment line while the other few bytes are arbitrary character codes in excess of 127. So, the whole header in our example is:

    %PDF-1.0
    %âãÏÓ
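
As a rough illustration of these two header lines, the following sketch checks a version line and a binary marker line using only the C standard library. The names parse_pdf_version and has_binary_marker are made up for this example and are not part of PDFio, and the threshold of four high bytes is an assumption:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: extract the major and minor version from a "%PDF-M.N" line.
bool
parse_pdf_version(const char *line, int *major, int *minor)
{
  // "%%" in the format string matches a literal percent sign
  return (sscanf(line, "%%PDF-%d.%d", major, minor) == 2);
}

// Hypothetical helper: true if the line is a comment that contains at least
// four bytes above 127 (the threshold is an assumption for this sketch).
bool
has_binary_marker(const unsigned char *line, size_t len)
{
  size_t i, high = 0;

  if (len == 0 || line[0] != '%')
    return (false);

  for (i = 1; i < len; i ++)
  {
    if (line[i] > 127)
      high ++;
  }

  return (high >= 4);
}
```

When reading a real file you do not need any of this, since pdfioFileOpen handles the header for you.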

Body

The file body consists of a sequence of objects, each preceded by an object number, generation number, and the obj keyword on one line, and followed by the endobj keyword on another. For example:

    1 0 obj
    <<
    /Kids [2 0 R]
    /Count 1
    /Type /Pages
    >>
    endobj

In this example, the object number is 1 and the generation number is 0, meaning it is the first version of the object. The content for object 1 is between the initial 1 0 obj and trailing endobj lines. In this case, the content is the dictionary <</Kids [2 0 R] /Count 1 /Type /Pages>>.
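
To make the object header line concrete, here is a minimal sketch that splits a line such as "1 0 obj" into its object and generation numbers. The function name parse_obj_header is hypothetical and the code is only an illustration of the syntax, not how PDFio parses objects:

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical helper: parse an "N G obj" line into object and generation numbers.
bool
parse_obj_header(const char *line, unsigned long *number, unsigned *generation)
{
  int consumed = 0;  // Set by %n only if the "obj" keyword matched

  if (sscanf(line, "%lu %u obj%n", number, generation, &consumed) != 2)
    return (false);

  // Object numbers start at 1 and generations are limited to 0-65535
  return (consumed > 0 && *number > 0 && *generation <= 65535);
}
```

A reference such as "2 0 R" is rejected because the obj keyword is missing.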

Cross-Reference Table

The cross-reference table lists the byte offset of each object in the file body. This allows random access to objects, meaning they don't have to be read in order. Objects that are not used are never read, making the process efficient. Operations like counting the number of pages in a PDF document are fast, even in large files.

Each object has an object number and a generation number. Generation numbers are used when a cross-reference table entry is reused. For simplicity, we will assume generation numbers to be always zero and ignore them. Following the xref keyword, the cross-reference table consists of a header line that gives the first object number and the number of entries, a free entry line for object 0, and a line for each of the objects in the file body. For example:

    0 6                             % Six entries in table, starting at 0
    0000000000 65535 f              % Free entry for object 0
    0000000015 00000 n              % Object 1 is at byte offset 15
    0000000074 00000 n              % Object 2 is at byte offset 74
    0000000192 00000 n              % etc...
    0000000291 00000 n
    0000000409 00000 n              % Object 5 is at byte offset 409
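
The fixed layout of each entry (a 10-digit offset, a 5-digit generation number, and a type letter) makes the table easy to parse. The following is a minimal sketch, assuming one entry per line as shown above; xref_entry_t and parse_xref_entry are hypothetical names, not PDFio API:

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical record for one cross-reference entry
typedef struct
{
  unsigned long offset;      // Byte offset of the object in the file
  unsigned      generation;  // Generation number
  bool          in_use;      // true for "n" entries, false for "f" entries
} xref_entry_t;

// Hypothetical helper: parse one entry line such as "0000000015 00000 n".
bool
parse_xref_entry(const char *line, xref_entry_t *entry)
{
  char type;  // 'n' (in use) or 'f' (free)

  if (sscanf(line, "%10lu %5u %c", &entry->offset, &entry->generation, &type) != 3)
    return (false);

  if (type != 'n' && type != 'f')
    return (false);

  entry->in_use = (type == 'n');
  return (true);
}
```

With the table above, the entry for object 1 yields offset 15, generation 0, and in_use set to true.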

Trailer

The first line of the trailer is just the trailer keyword. This is followed by the trailer dictionary, which contains at least the /Size entry, specifying the number of entries in the cross-reference table, and the /Root entry, which references the document catalog object. The catalog is the root element of the graph of objects in the body.

Next comes a line with just the startxref keyword, a line with a single number specifying the byte offset of the start of the cross-reference table within the file, and then the line %%EOF which signals the end of the PDF file.

    trailer                         % Trailer keyword
    <<                              % The trailer dictionary
    /Root 5 0 R
    /Size 6
    >>
    startxref                       % startxref keyword
    459                             % Byte offset of cross-reference table
    %%EOF                           % End-of-file marker
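
A reader typically starts at the end of the file and works backwards from there. The sketch below looks for the last startxref keyword in a NUL-terminated copy of the end of a file and returns the offset that follows it. The function name find_startxref and the idea of passing the tail as a C string are assumptions made for this example:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: find the last "startxref" keyword in a NUL-terminated
// copy of the end of a PDF file and return the offset that follows it.
bool
find_startxref(const char *tail, long *offset)
{
  static const char keyword[] = "startxref";
  size_t            klen = sizeof(keyword) - 1;
  size_t            len = strlen(tail);
  size_t            i;
  char              *end;

  if (len < klen)
    return (false);

  // Scan backwards so that the last keyword in the buffer wins
  for (i = len - klen + 1; i > 0; i --)
  {
    const char *ptr = tail + i - 1;

    if (!strncmp(ptr, keyword, klen))
    {
      long value = strtol(ptr + klen, &end, 10);

      if (end == ptr + klen || value < 0)
        return (false);

      *offset = value;
      return (true);
    }
  }

  return (false);
}
```

For the trailer shown above, the function reports an offset of 459.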

API Overview

PDFio exposes several types:

·

pdfio_file_t: A PDF file (for reading or writing)

·

pdfio_array_t: An array of values

·

pdfio_dict_t: A dictionary of key/value pairs in a PDF file, object, etc.

·

pdfio_obj_t: An object in a PDF file

·

pdfio_stream_t: An object stream

Reading PDF Files

You open an existing PDF file using the pdfioFileOpen function:

    pdfio_file_t *pdf =
        pdfioFileOpen("myinputfile.pdf", password_cb, password_data, error_cb,
                      error_data);

where the five arguments to the function are the filename ("myinputfile.pdf"), an optional password callback function (password_cb) and data pointer value (password_data), and an optional error callback function (error_cb) and data pointer value (error_data). The password callback is called for encrypted PDF files that are not using the default password, for example:

    const char *
    password_cb(void *data, const char *filename)
    {
      (void)data;     // This callback doesn't use the data pointer
      (void)filename; // This callback doesn't use the filename
    
      // Return a password string for the file...
      return ("Password42");
    }

The error callback is called for both errors and warnings and accepts the pdfio_file_t pointer, a message string, and the callback pointer value. It returns true to continue processing the file or false to stop, for example:

    bool
    error_cb(pdfio_file_t *pdf, const char *message, void *data)
    {
      (void)data; // This callback does not use the data pointer
    
      fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
    
      // Return true for warning messages (continue) and false for errors (stop)
      return (!strncmp(message, "WARNING:", 8));
    }

The default error callback (NULL) does the equivalent of the above.

Note: Many errors are unrecoverable, so PDFio ignores the return value from the error callback and always stops processing the PDF file. Warning messages start with the prefix "WARNING:" while errors have no prefix.
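
The data pointer lets a callback keep state between calls. As a minimal sketch, the following callback counts warnings and errors separately. The message_counts_t type and the counting_error_cb name are hypothetical, and the typedef at the top is only a stand-in so that the fragment compiles without the PDFio headers:

```c
#include <stdbool.h>
#include <string.h>

// Stand-in for the opaque type from <pdfio.h> so this sketch compiles on its
// own; in a real program, include <pdfio.h> and delete this line.
typedef struct example_pdf_file pdfio_file_t;

// Hypothetical counters passed through the callback data pointer
typedef struct
{
  size_t warnings;  // Number of "WARNING:" messages seen
  size_t errors;    // Number of other messages seen
} message_counts_t;

bool
counting_error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  message_counts_t *counts = (message_counts_t *)data;

  (void)pdf;  // This callback doesn't use the file pointer

  if (!strncmp(message, "WARNING:", 8))
  {
    counts->warnings ++;
    return (true);   // Continue after warnings
  }

  counts->errors ++;
  return (false);    // Stop after errors
}
```

You would pass counting_error_cb as the error callback and the address of a message_counts_t variable as the error data pointer when opening the file.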

Each PDF file contains one or more pages. The pdfioFileGetNumPages function returns the number of pages in the file while the pdfioFileGetPage function gets the specified page in the PDF file:

    pdfio_file_t *pdf;   // PDF file
    size_t       i;      // Looping var
    size_t       count;  // Number of pages
    pdfio_obj_t  *page;  // Current page
    
    // Iterate the pages in the PDF file
    for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
    {
      page = pdfioFileGetPage(pdf, i);
      // do something with page
    }

Each page is represented by a "page tree" object (what pdfioFileGetPage returns) that specifies information about the page and one or more "content" objects that contain the images, fonts, text, and graphics that appear on the page. Use the pdfioPageGetNumStreams and pdfioPageOpenStream functions to access the content streams for each page, and pdfioObjGetDict to get the associated page object dictionary. For example, if you want to display the media and crop boxes for a given page:

    pdfio_file_t  *pdf;             // PDF file
    size_t        i;                // Looping var
    size_t        count;            // Number of pages
    pdfio_obj_t   *page;            // Current page
    pdfio_dict_t  *dict;            // Current page dictionary
    pdfio_array_t *media_box;       // MediaBox array
    double        media_values[4];  // MediaBox values
    pdfio_array_t *crop_box;        // CropBox array
    double        crop_values[4];   // CropBox values
    
    // Iterate the pages in the PDF file
    for (i = 0, count = pdfioFileGetNumPages(pdf); i < count; i ++)
    {
      page = pdfioFileGetPage(pdf, i);
      dict = pdfioObjGetDict(page);
    
      media_box       = pdfioDictGetArray(dict, "MediaBox");
      media_values[0] = pdfioArrayGetNumber(media_box, 0);
      media_values[1] = pdfioArrayGetNumber(media_box, 1);
      media_values[2] = pdfioArrayGetNumber(media_box, 2);
      media_values[3] = pdfioArrayGetNumber(media_box, 3);
    
      crop_box       = pdfioDictGetArray(dict, "CropBox");
      crop_values[0] = pdfioArrayGetNumber(crop_box, 0);
      crop_values[1] = pdfioArrayGetNumber(crop_box, 1);
      crop_values[2] = pdfioArrayGetNumber(crop_box, 2);
      crop_values[3] = pdfioArrayGetNumber(crop_box, 3);
    
      printf("Page %u: MediaBox=[%g %g %g %g], CropBox=[%g %g %g %g]\n",
             (unsigned)(i + 1),
             media_values[0], media_values[1], media_values[2], media_values[3],
             crop_values[0], crop_values[1], crop_values[2], crop_values[3]);
    }

Page object dictionaries have several (mostly optional) key/value pairs, including:

·

"Annots": An array of annotation dictionaries for the page; use pdfioDictGetArray to get the array

·

"CropBox": The crop box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers

·

"Dur": The number of seconds the page should be displayed; use pdfioDictGetNumber to get the page duration value

·

"Group": The dictionary of transparency group values for the page; use pdfioDictGetDict to get a pointer to the group dictionary

·

"LastModified": The date and time when this page was last modified; use pdfioDictGetDate to get the Unix time_t value

·

"Parent": The parent page tree node object for this page; use pdfioDictGetObj to get a pointer to the object

·

"MediaBox": The media box as an array of four numbers for the left, bottom, right, and top coordinates of the target media; use pdfioDictGetArray to get a pointer to the array of numbers

·

"Resources": The dictionary of resources for the page; use pdfioDictGetDict to get a pointer to the resources dictionary

·

"Rotate": A number indicating the number of degrees of clockwise rotation (a multiple of 90) to apply to the page when viewing; use pdfioDictGetNumber to get the rotation angle

·

"Thumb": A thumbnail image object for the page; use pdfioDictGetObj to get a pointer to the thumbnail image object

·

"Trans": The page transition dictionary; use pdfioDictGetDict to get a pointer to the dictionary

The pdfioFileClose function closes a PDF file and frees all memory that was used for it:

    pdfioFileClose(pdf);
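
The "MediaBox" and "Rotate" values described above are often combined to work out how large a page appears on screen. The following sketch does that arithmetic on plain numbers, such as those returned by pdfioArrayGetNumber and pdfioDictGetNumber. The function name displayed_page_size is hypothetical:

```c
#include <stdbool.h>

// Hypothetical helper: compute the displayed width and height of a page from
// its box values and "Rotate" angle. Returns false for angles that are not a
// multiple of 90 degrees.
bool
displayed_page_size(const double box[4], int rotate, double *width, double *height)
{
  double w = box[2] - box[0];  // right - left
  double h = box[3] - box[1];  // top - bottom
  int    angle = rotate % 360;

  if (angle < 0)
    angle += 360;

  if ((angle % 90) != 0)
    return (false);

  if (angle == 90 || angle == 270)
  {
    // A quarter turn swaps the width and height
    *width  = h;
    *height = w;
  }
  else
  {
    *width  = w;
    *height = h;
  }

  return (true);
}
```

For a US Letter page rotated by 90 degrees, the displayed size is 792 by 612 points.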

Writing PDF Files

You create a new PDF file using the pdfioFileCreate function:

    pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
    pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins
    
    pdfio_file_t *pdf = pdfioFileCreate("myoutputfile.pdf", "2.0", &media_box, &crop_box,
                                        error_cb, error_data);

where the six arguments to the function are the filename ("myoutputfile.pdf"), PDF version ("2.0"), media box (media_box), crop box (crop_box), an optional error callback function (error_cb), and an optional pointer value for the error callback function (error_data). The units for the media and crop boxes are points (1/72nd of an inch).
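
Since all box values are in points, it can help to convert from inches or millimeters before filling in the rectangles. The following is a minimal sketch of that arithmetic. The example_rect_t type is a hypothetical stand-in whose field layout is assumed to mirror pdfio_rect_t, and the helper names are made up for this example:

```c
// Hypothetical rectangle type; the field layout is assumed to mirror pdfio_rect_t
typedef struct
{
  double x1, y1;  // Lower-left corner in points
  double x2, y2;  // Upper-right corner in points
} example_rect_t;

// Convert inches to points (72 points per inch)
double
inches_to_points(double inches)
{
  return (inches * 72.0);
}

// Convert millimeters to points (25.4 millimeters per inch)
double
mm_to_points(double mm)
{
  return (mm * 72.0 / 25.4);
}

// Shrink a box by the same margin on all four sides
example_rect_t
inset_rect(example_rect_t box, double margin)
{
  example_rect_t result = { box.x1 + margin, box.y1 + margin,
                            box.x2 - margin, box.y2 - margin };

  return (result);
}
```

An 8.5 by 11 inch page gives the 612 by 792 point media box used above, and insetting it by half an inch gives the crop box values 36, 36, 576, and 756.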

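Alternatively, you can stream a PDF file using the pdfioFileCreateOutput function, which takes an output callback function (output_cb) and context pointer (output_ctx) in place of the filename: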

    pdfio_rect_t media_box = { 0.0, 0.0, 612.0, 792.0 };  // US Letter
    pdfio_rect_t crop_box = { 36.0, 36.0, 576.0, 756.0 }; // w/0.5" margins
    
    pdfio_file_t *pdf = pdfioFileCreateOutput(output_cb, output_ctx, "2.0", &media_box,
                                              &crop_box, error_cb, error_data);

Once the file is created, use the pdfioFileCreateObj, pdfioFileCreatePage, and pdfioPageCopy functions to create objects and pages in the file.

Finally, the pdfioFileClose function writes the PDF cross-reference and "trailer" information, closes the file, and frees all memory that was used for it.

PDF Objects

PDF objects are identified using two numbers: the object number (1 to N) and the object generation (0 to 65535) that specifies a particular version of an object. An object's numbers are returned by the pdfioObjGetNumber and pdfioObjGetGeneration functions. You can find a numbered object using the pdfioFileFindObj function.

Objects contain values (typically dictionaries) and usually an associated data stream containing images, fonts, ICC profiles, and page content. PDFio provides several accessor functions to get the value(s) associated with an object:

·

pdfioObjGetArray returns an object's array value, if any

·

pdfioObjGetDict returns an object's dictionary value, if any

·

pdfioObjGetLength returns the length of the data stream, if any

·

pdfioObjGetSubtype returns the sub-type name of the object, for example "Image" for an image object.

·

pdfioObjGetType returns the type name of the object, for example "XObject" for an image object.

PDF Streams

Some PDF objects have an associated data stream, such as for pages, images, ICC color profiles, and fonts. You access the stream for an existing object using the pdfioObjOpenStream function:

    pdfio_file_t *pdf = pdfioFileOpen(...);
    pdfio_obj_t *obj = pdfioFileFindObj(pdf, number);
    pdfio_stream_t *st = pdfioObjOpenStream(obj, true);

The first argument is the object pointer. The second argument is a boolean value that specifies whether you want to decode (typically decompress) the stream data or return it as-is.

When reading a page stream you'll use the pdfioPageOpenStream function instead:

    pdfio_file_t *pdf = pdfioFileOpen(...);
    pdfio_obj_t *obj = pdfioFileGetPage(pdf, number);
    pdfio_stream_t *st = pdfioPageOpenStream(obj, 0, true);

Once you have the stream open, you can use one of several functions to read from it:

·

pdfioStreamConsume reads and discards a number of bytes in the stream

·

pdfioStreamGetToken reads a PDF token from the stream

·

pdfioStreamPeek peeks at the next stream data without advancing or "consuming" it

·

pdfioStreamRead reads a buffer of data

When you are done reading from the stream, call the pdfioStreamClose function:

    pdfioStreamClose(st);

To create a stream for a new object, call the pdfioObjCreateStream function:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_obj_t *obj = pdfioFileCreateObj(pdf, ...);
    pdfio_stream_t *st = pdfioObjCreateStream(obj, PDFIO_FILTER_FLATE);

The first argument is the newly created object. The second argument is either PDFIO_FILTER_NONE to specify that any encoding is done by your program or PDFIO_FILTER_FLATE to specify that PDFio should Flate compress the stream.

To create a page content stream call the pdfioFileCreatePage function:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_dict_t *dict = pdfioDictCreate(pdf);
    ... set page dictionary keys and values ...
    pdfio_stream_t *st = pdfioFileCreatePage(pdf, dict);

Once you have created the stream, use any of the following functions to write to the stream:

·

pdfioStreamPrintf writes a formatted string to the stream

·

pdfioStreamPutChar writes a single character to the stream

·

pdfioStreamPuts writes a C string to the stream

·

pdfioStreamWrite writes a buffer of data to the stream

The PDF content helper functions provide additional functions for writing specific PDF page stream commands.

When you are done writing the stream, call pdfioStreamClose to close both the stream and the object.
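
If you write text operators yourself with pdfioStreamPuts or pdfioStreamPrintf rather than through the content helper functions, parentheses and backslashes inside a PDF literal string need a backslash in front of them. The following is a minimal sketch of that escaping. The function name escape_pdf_string is hypothetical and the code does not handle non-ASCII text:

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: copy "in" to "out" as the body of a PDF literal string,
// adding a backslash before '(', ')', and '\\'. Returns false if "out" is too small.
bool
escape_pdf_string(const char *in, char *out, size_t outsize)
{
  size_t used = 0;  // Number of bytes written so far

  for (; *in; in ++)
  {
    bool special = (*in == '(' || *in == ')' || *in == '\\');

    // Need room for this character, an optional backslash, and the final NUL
    if (used + (special ? 2 : 1) + 1 > outsize)
      return (false);

    if (special)
      out[used ++] = '\\';

    out[used ++] = *in;
  }

  if (used >= outsize)
    return (false);

  out[used] = '\0';
  return (true);
}
```

The escaped text can then be placed between parentheses in a text operator such as Tj.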

PDF Content Helper Functions

PDFio includes many helper functions for embedding or writing specific kinds of content to a PDF file. These functions can be roughly grouped into five categories:

·

Color Space Functions

·

Font Object Functions

·

Image Object Functions

·

Page Stream Functions

·

Page Dictionary Functions

Color Space Functions

PDF color spaces are specified using well-known names like "DeviceCMYK", "DeviceGray", and "DeviceRGB" or using arrays that define so-called calibrated color spaces. PDFio provides several functions for embedding ICC profiles and creating color space arrays:

·

pdfioArrayCreateColorFromICCObj creates a color array for an ICC color profile object

·

pdfioArrayCreateColorFromMatrix creates a color array using a CIE XYZ color transform matrix, a gamma value, and a CIE XYZ white point

·

pdfioArrayCreateColorFromPalette creates an indexed color array from an array of sRGB values

·

pdfioArrayCreateColorFromPrimaries creates a color array using CIE XYZ primaries and a gamma value

·

pdfioArrayCreateColorFromStandard creates a color array for a standard color space

You can embed an ICC color profile using the pdfioFileCreateICCObjFromFile function:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_obj_t *icc = pdfioFileCreateICCObjFromFile(pdf, "filename.icc");

where the first argument is the PDF file and the second argument is the filename of the ICC color profile.

PDFio also includes predefined constants for creating a few standard color spaces:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    
    // Create an AdobeRGB color array
    pdfio_array_t *adobe_rgb =
        pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_ADOBE);
    
    // Create a Display P3 color array
    pdfio_array_t *display_p3 =
        pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_P3_D65);
    
    // Create an sRGB color array
    pdfio_array_t *srgb =
        pdfioArrayCreateColorFromStandard(pdf, 3, PDFIO_CS_SRGB);

Font Object Functions

PDF supports many kinds of fonts, including PostScript Type1, PDF Type3, TrueType/OpenType, and CID. PDFio provides two functions for creating font objects. The first is pdfioFileCreateFontObjFromBase which creates a font object for one of the base PDF fonts:

·

"Courier"

·

"Courier-Bold"

·

"Courier-BoldItalic"

·

"Courier-Italic"

·

"Helvetica"

·

"Helvetica-Bold"

·

"Helvetica-BoldOblique"

·

"Helvetica-Oblique"

·

"Symbol"

·

"Times-Bold"

·

"Times-BoldItalic"

·

"Times-Italic"

·

"Times-Roman"

·

"ZapfDingbats"

Except for Symbol and ZapfDingbats (which use a custom 8-bit character set), PDFio always uses the Windows CP1252 subset of Unicode for these fonts.
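
Before using a base font it can be useful to check whether your text fits in CP1252 at all. The following sketch maps a single Unicode code point to its CP1252 byte and reports failure for characters outside the set. The table and the function name unicode_to_cp1252 are written for this example and are not part of PDFio:

```c
#include <stdbool.h>
#include <stddef.h>

// Unicode code points for CP1252 bytes 0x80-0x9F (0 marks an unassigned byte)
static const unsigned short cp1252_high[32] =
{
  0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
  0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Hypothetical helper: map a Unicode code point to its CP1252 byte.
// Returns false if the character is not in CP1252.
bool
unicode_to_cp1252(unsigned long codepoint, unsigned char *byte)
{
  size_t i;

  // ASCII and Latin-1 (U+00A0 to U+00FF) map to the same byte values
  if (codepoint < 0x80 || (codepoint >= 0xA0 && codepoint <= 0xFF))
  {
    *byte = (unsigned char)codepoint;
    return (true);
  }

  // Everything else must appear in the 0x80-0x9F block
  for (i = 0; i < 32; i ++)
  {
    if (cp1252_high[i] != 0 && cp1252_high[i] == codepoint)
    {
      *byte = (unsigned char)(0x80 + i);
      return (true);
    }
  }

  return (false);
}
```

For example, the euro sign (U+20AC) maps to byte 0x80, while a character such as U+3042 is rejected and needs a Unicode font instead.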

The second function is pdfioFileCreateFontObjFromFile which creates a font object from a TrueType/OpenType font file, for example:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_obj_t *opensans =
        pdfioFileCreateFontObjFromFile(pdf, "OpenSans-Regular.ttf", false);

will embed an OpenSans Regular TrueType font using the Windows CP1252 subset of Unicode. Pass true for the third argument to embed it as a Unicode CID font instead, for example:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_obj_t *notosansjp =
        pdfioFileCreateFontObjFromFile(pdf, "NotoSansJP-Regular.otf", true);

will embed the NotoSansJP Regular OpenType font with full support for Unicode.

Note: Not all fonts support Unicode, and most do not contain a full complement of Unicode characters. pdfioFileCreateFontObjFromFile does not perform any character subsetting, so the entire font file is embedded in the PDF file.

Image Object Functions

PDF supports images with many different color spaces and bit depths with optional transparency. PDFio provides two helper functions for creating image objects that can be referenced in page streams. The first function is pdfioFileCreateImageObjFromData which creates an image object from data in memory, for example:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
    pdfio_obj_t *img =
        pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024,
                                        /*num_colors*/3, /*color_data*/NULL,
                                        /*alpha*/true, /*interpolate*/false);

will create an object for a 1024x1024 RGBA image in memory, using the default color space for 3 colors ("DeviceRGB"). We can use one of the color space functions to use a specific color space for this image, for example:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    
    // Create an AdobeRGB color array
    pdfio_array_t *adobe_rgb =
        pdfioArrayCreateColorFromMatrix(pdf, 3, pdfioAdobeRGBGamma,
                                        pdfioAdobeRGBMatrix, pdfioAdobeRGBWhitePoint);
    
    // Create a 1024x1024 RGBA image using AdobeRGB
    unsigned char data[1024 * 1024 * 4]; // 1024x1024 RGBA image data
    pdfio_obj_t *img =
        pdfioFileCreateImageObjFromData(pdf, data, /*width*/1024, /*height*/1024,
                                        /*num_colors*/3, /*color_data*/adobe_rgb,
                                        /*alpha*/true, /*interpolate*/false);

The "interpolate" argument specifies whether the colors in the image should be smoothed/interpolated when scaling. This is most useful for photographs but should be false for screenshot and barcode images.
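
The image examples above declare the pixel buffer as a local array, which at 4 MiB may be too large for the stack on some platforms. The following sketch computes the buffer size and allocates the data on the heap instead. It assumes 8 bits per component and interleaved RGBA order, and the names image_data_size and make_rgba_gradient are hypothetical:

```c
#include <stdlib.h>

// Bytes needed for 8-bit interleaved image data
size_t
image_data_size(size_t width, size_t height, size_t num_colors, int alpha)
{
  return (width * height * (num_colors + (alpha ? 1 : 0)));
}

// Hypothetical helper: allocate an RGBA buffer on the heap and fill it with a
// horizontal red gradient at full opacity. The caller frees the result.
unsigned char *
make_rgba_gradient(size_t width, size_t height)
{
  size_t        x, y;
  unsigned char *data = malloc(image_data_size(width, height, 3, 1));
  unsigned char *ptr = data;

  if (!data)
    return (NULL);

  for (y = 0; y < height; y ++)
  {
    for (x = 0; x < width; x ++)
    {
      *ptr++ = (unsigned char)(255 * x / (width > 1 ? width - 1 : 1));  // Red
      *ptr++ = 0;                                                       // Green
      *ptr++ = 0;                                                       // Blue
      *ptr++ = 255;                                                     // Alpha
    }
  }

  return (data);
}
```

The returned pointer can be passed as the data argument of pdfioFileCreateImageObjFromData and released with free once the image object has been created.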

If you have a JPEG or PNG file, use the pdfioFileCreateImageObjFromFile function to copy the image into a PDF image object, for example:

    pdfio_file_t *pdf = pdfioFileCreate(...);
    pdfio_obj_t *img =
        pdfioFileCreateImageObjFromFile(pdf, "myphoto.jpg", /*interpolate*/true);

Note: Currently pdfioFileCreateImageObjFromFile does not support 12-bit JPEG files or PNG files with an alpha channel.

Page Dictionary Functions

PDF pages each have an associated dictionary to specify the images, fonts, and color spaces used by the page. PDFio provides functions to add these resources to the dictionary:

·

pdfioPageDictAddColorSpace adds a named color space to the page dictionary

·

pdfioPageDictAddFont adds a named font to the page dictionary

·

pdfioPageDictAddImage adds a named image to the page dictionary

Page Stream Functions

PDF page streams contain textual commands for drawing on the page. PDFio provides many functions for writing these commands with the correct format and escaping, as needed:

·

pdfioContentClip clips future drawing to the current path

·

pdfioContentDrawImage draws an image object

·

pdfioContentFill fills the current path

·

pdfioContentFillAndStroke fills and strokes the current path

·

pdfioContentMatrixConcat concatenates a matrix with the current transform matrix

·

pdfioContentMatrixRotate concatenates a rotation matrix with the current transform matrix

·

pdfioContentMatrixScale concatenates a scaling matrix with the current transform matrix

·

pdfioContentMatrixTranslate concatenates a translation matrix with the current transform matrix

·

pdfioContentPathClose closes the current path

·

pdfioContentPathCurve appends a Bezier curve to the current path

·

pdfioContentPathCurve13 appends a Bezier curve with 2 control points to the current path

·

pdfioContentPathCurve23 appends a Bezier curve with 2 control points to the current path

·

pdfioContentPathLineTo appends a line to the current path

·

pdfioContentPathMoveTo moves the current point in the current path

·

pdfioContentPathRect appends a rectangle to the current path

·

pdfioContentRestore restores a previous graphics state

·

pdfioContentSave saves the current graphics state

·

pdfioContentSetDashPattern sets the line dash pattern

·

pdfioContentSetFillColorDeviceCMYK sets the current fill color using a device CMYK color

·

pdfioContentSetFillColorDeviceGray sets the current fill color using a device gray color

·

pdfioContentSetFillColorDeviceRGB sets the current fill color using a device RGB color

·

pdfioContentSetFillColorGray sets the current fill color using a calibrated gray color

·

pdfioContentSetFillColorRGB sets the current fill color using a calibrated RGB color

·

pdfioContentSetFillColorSpace sets the current fill color space

·

pdfioContentSetFlatness sets the flatness for curves

·

pdfioContentSetLineCap sets how the ends of lines are stroked

·

pdfioContentSetLineJoin sets how connections between lines are stroked

·

pdfioContentSetLineWidth sets the width of stroked lines

·

pdfioContentSetMiterLimit sets the miter limit for stroked lines

·

pdfioContentSetStrokeColorDeviceCMYK sets the current stroke color using a device CMYK color

·

pdfioContentSetStrokeColorDeviceGray sets the current stroke color using a device gray color

·

pdfioContentSetStrokeColorDeviceRGB sets the current stroke color using a device RGB color

·

pdfioContentSetStrokeColorGray sets the current stroke color using a calibrated gray color

·

pdfioContentSetStrokeColorRGB sets the current stroke color using a calibrated RGB color

·

pdfioContentSetStrokeColorSpace sets the current stroke color space

·

pdfioContentSetTextCharacterSpacing sets the spacing between characters for text

·

pdfioContentSetTextFont sets the font and size for text

·

pdfioContentSetTextLeading sets the line height for text

·

pdfioContentSetTextMatrix concatenates a matrix with the current text matrix

·

pdfioContentSetTextRenderingMode sets the text rendering mode

·

pdfioContentSetTextRise adjusts the baseline for text

·

pdfioContentSetTextWordSpacing sets the spacing between words for text

·

pdfioContentSetTextXScaling sets the horizontal scaling for text

·

pdfioContentStroke strokes the current path

·

pdfioContentTextBegin begins a block of text

·

pdfioContentTextEnd ends a block of text

·

pdfioContentTextMoveLine moves to the next line with an offset in a text block

·

pdfioContentTextMoveTo moves within the current line in a text block

·

pdfioContentTextNewLine moves to the beginning of the next line in a text block

·

pdfioContentTextNewLineShow moves to the beginning of the next line in a text block and shows literal text with optional word and character spacing

·

pdfioContentTextNewLineShowf moves to the beginning of the next line in a text block and shows formatted text with optional word and character spacing

·

pdfioContentTextShow draws a literal string in a text block

·

pdfioContentTextShowf draws a formatted string in a text block

·

pdfioContentTextShowJustified draws an array of literal strings with offsets between them
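
The matrix functions listed above all concatenate a matrix with the current transform matrix (CTM). In PDF, concatenation multiplies the new matrix in front of the existing one, so the most recently added transform is applied to coordinates first. The following sketch shows that arithmetic on plain arrays. The example_matrix_t type and both function names are hypothetical and exist only for this illustration:

```c
// Hypothetical 3x2 matrix type holding the PDF values [a b c d e f] by rows
typedef double example_matrix_t[3][2];

// Replace "ctm" with "m x ctm", which is what concatenating "m" does
void
matrix_concat(example_matrix_t ctm, example_matrix_t m)
{
  example_matrix_t r;
  int              i;

  for (i = 0; i < 3; i ++)
  {
    r[i][0] = m[i][0] * ctm[0][0] + m[i][1] * ctm[1][0];
    r[i][1] = m[i][0] * ctm[0][1] + m[i][1] * ctm[1][1];
  }

  // The implicit third column [0 0 1] carries the old translation through
  r[2][0] += ctm[2][0];
  r[2][1] += ctm[2][1];

  for (i = 0; i < 3; i ++)
  {
    ctm[i][0] = r[i][0];
    ctm[i][1] = r[i][1];
  }
}

// Map a user-space point through the matrix
void
matrix_apply(example_matrix_t ctm, double x, double y, double *dx, double *dy)
{
  *dx = ctm[0][0] * x + ctm[1][0] * y + ctm[2][0];
  *dy = ctm[0][1] * x + ctm[1][1] * y + ctm[2][1];
}
```

Concatenating a translation by (50, 700) and then a scale by 2 maps the point (1, 1) to (52, 702): the point is scaled first and translated second.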

Tagged and Marked PDF Content

Content in a page stream can be tagged to help a PDF reader application know the kind and organization of that content. Content inserted using the PDFio Page Stream Functions can be tagged by surrounding it with the pdfioContentBeginMarked and pdfioContentEndMarked functions.

The pdfioContentBeginMarked function accepts a named tag and optional dictionary of attributes such as the marked content identifier ("MCID"). For example, the following code tags a paragraph of text:

    pdfio_file_t   *pdf;  // PDF file
    pdfio_stream_t *st;   // Page stream
    
    pdfioContentBeginMarked(st, "P", /*dict*/NULL);
    
    pdfioContentTextShow(st, /*unicode*/false, "Mary had a little lamb\n");
    pdfioContentTextShow(st, /*unicode*/false, "whose fleece was white as snow.\n");
    pdfioContentTextShow(st, /*unicode*/false, "And everywhere that Mary went\n");
    pdfioContentTextShow(st, /*unicode*/false, "the lamb was sure to go,\n");
    
    pdfioContentEndMarked(st);

To mark the same paragraph with a content identifier you would first create a dictionary containing the "MCID" key/value pair and then mark the paragraph with that dictionary:

    pdfio_file_t   *pdf;  // PDF file
    pdfio_stream_t *st;   // Page stream
    pdfio_dict_t   *dict; // Content dictionary
    
    dict = pdfioDictCreate(pdf);
    pdfioDictSetNumber(dict, "MCID", 42);
    
    pdfioContentBeginMarked(st, "P", dict);
    
    pdfioContentTextShow(st, /*unicode*/false, "Mary had a little lamb\n");
    pdfioContentTextShow(st, /*unicode*/false, "whose fleece was white as snow.\n");
    pdfioContentTextShow(st, /*unicode*/false, "And everywhere that Mary went\n");
    pdfioContentTextShow(st, /*unicode*/false, "the lamb was sure to go,\n");
    
    pdfioContentEndMarked(st);
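
In the page stream itself, marked content is delimited by the PDF operators BMC (or BDC when a property dictionary is present) and EMC. The following sketch formats the opening operator into a buffer to show what such a line looks like. The function name format_begin_marked is hypothetical, and the exact text that PDFio writes may differ:

```c
#include <stdio.h>

// Hypothetical helper: format the operator that opens a marked-content
// sequence. Pass a negative mcid to omit the property dictionary.
// Returns the number of characters written or -1 if "buffer" is too small.
int
format_begin_marked(char *buffer, size_t bufsize, const char *tag, int mcid)
{
  int length;

  if (mcid < 0)
    length = snprintf(buffer, bufsize, "/%s BMC\n", tag);
  else
    length = snprintf(buffer, bufsize, "/%s <</MCID %d>> BDC\n", tag, mcid);

  return ((length < 0 || (size_t)length >= bufsize) ? -1 : length);
}
```

For the tag "P" and identifier 42, the buffer holds the line "/P <</MCID 42>> BDC".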

Examples

PDFio includes several example programs that are typically installed to the /usr/share/doc/pdfio/examples or /usr/local/share/doc/pdfio/examples directories. A makefile is included to build them.

Read PDF Metadata

The pdfioinfo.c example program opens a PDF file and prints the title, author, creation date, and number of pages:

    #include <pdfio.h>
    #include <time.h>
    
    
    int                                     // O - Exit status
    main(int  argc,                         // I - Number of command-line arguments
         char *argv[])                      // I - Command-line arguments
    {
      const char    *filename;              // PDF filename
      pdfio_file_t  *pdf;                   // PDF file
      pdfio_dict_t  *catalog;               // Catalog dictionary
      const char    *author,                // Author name
                    *creator,               // Creator name
                    *producer,              // Producer name
                    *title;                 // Title
      time_t        creation_date,          // Creation date
                    modification_date;      // Modification date
      struct tm     *creation_tm,           // Creation date/time information
                    *modification_tm;       // Modification date/time information
      char          creation_text[256],     // Creation date/time as a string
                    modification_text[256], // Modification date/time human fmt string
                    range_text[255];        // Page range text
      size_t        num_pages;              // PDF number of pages
      bool          has_acroform;           // Does the file have an AcroForm?
      pdfio_obj_t   *page;                  // Object
      pdfio_dict_t  *page_dict;             // Object dictionary
      size_t        cur,                    // Current page index
                    prev;                   // Previous page index
      pdfio_rect_t  cur_box,                // Current MediaBox
                    prev_box;               // Previous MediaBox
    
    
      // Get the filename from the command-line...
      if (argc != 2)
      {
        fputs("Usage: ./pdfioinfo FILENAME.pdf\n", stderr);
        return (1);
      }
    
      filename = argv[1];
    
      // Open the PDF file with the default callbacks...
      pdf = pdfioFileOpen(filename, /*password_cb*/NULL,
                          /*password_cbdata*/NULL, /*error_cb*/NULL,
                          /*error_cbdata*/NULL);
      if (pdf == NULL)
        return (1);
    
      // Get the title, author, etc...
      catalog      = pdfioFileGetCatalog(pdf);
      author       = pdfioFileGetAuthor(pdf);
      creator      = pdfioFileGetCreator(pdf);
      has_acroform = pdfioDictGetType(catalog, "AcroForm") != PDFIO_VALTYPE_NONE;
      num_pages    = pdfioFileGetNumPages(pdf);
      producer     = pdfioFileGetProducer(pdf);
      title        = pdfioFileGetTitle(pdf);
    
      // Get the creation date and convert to a string...
      if ((creation_date = pdfioFileGetCreationDate(pdf)) > 0)
      {
        creation_tm   = localtime(&creation_date);
        strftime(creation_text, sizeof(creation_text), "%c", creation_tm);
      }
      else
      {
        snprintf(creation_text, sizeof(creation_text), "-- not set --");
      }
    
      // Get the modification date and convert to a string...
      if ((modification_date = pdfioFileGetModificationDate(pdf)) > 0)
      {
        modification_tm = localtime(&modification_date);
        strftime(modification_text, sizeof(modification_text), "%c", modification_tm);
      }
      else
      {
        snprintf(modification_text, sizeof(modification_text), "-- not set --");
      }
    
      // Print file information to stdout...
      printf("%s:\n", filename);
      printf("           Title: %s\n", title ? title : "-- not set --");
      printf("          Author: %s\n", author ? author : "-- not set --");
      printf("         Creator: %s\n", creator ? creator : "-- not set --");
      printf("        Producer: %s\n", producer ? producer : "-- not set --");
      printf("      Created On: %s\n", creation_text);
      printf("     Modified On: %s\n", modification_text);
      printf("         Version: %s\n", pdfioFileGetVersion(pdf));
      printf("        AcroForm: %s\n", has_acroform ? "Yes" : "No");
      printf(" Number of Pages: %u\n", (unsigned)num_pages);
    
      // Report the MediaBox for all of the pages
      prev_box.x1 = prev_box.x2 = prev_box.y1 = prev_box.y2 = 0.0;
    
      for (cur = 0, prev = 0; cur < num_pages; cur ++)
      {
        // Find the MediaBox for this page in the page tree...
        for (page = pdfioFileGetPage(pdf, cur);
             page != NULL;
             page = pdfioDictGetObj(page_dict, "Parent"))
        {
          cur_box.x1 = cur_box.x2 = cur_box.y1 = cur_box.y2 = 0.0;
          page_dict  = pdfioObjGetDict(page);
    
          if (pdfioDictGetRect(page_dict, "MediaBox", &cur_box))
            break;
        }
    
        // If this MediaBox is different from the previous one, show the range of
        // pages that have that size...
        if (cur == 0 ||
            fabs(cur_box.x1 - prev_box.x1) > 0.01 ||
            fabs(cur_box.y1 - prev_box.y1) > 0.01 ||
            fabs(cur_box.x2 - prev_box.x2) > 0.01 ||
            fabs(cur_box.y2 - prev_box.y2) > 0.01)
        {
          if (cur > prev)
          {
            snprintf(range_text, sizeof(range_text), "Pages %u-%u",
                     (unsigned)(prev + 1), (unsigned)cur);
            printf("%16s: [%g %g %g %g]\n", range_text,
                   prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
          }
    
          // Start a new series of pages with the new size...
          prev     = cur;
          prev_box = cur_box;
        }
      }
    
      // Show the last range as needed...
      if (cur > prev)
      {
        snprintf(range_text, sizeof(range_text), "Pages %u-%u",
                 (unsigned)(prev + 1), (unsigned)cur);
        printf("%16s: [%g %g %g %g]\n", range_text,
               prev_box.x1, prev_box.y1, prev_box.x2, prev_box.y2);
      }
    
      // Close the PDF file...
      pdfioFileClose(pdf);
    
      return (0);
    }
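
The range-reporting logic above can be looked at separately from the PDFio calls. The following is a minimal sketch and is not part of pdfioinfo.c: the box_t type stands in for pdfio_rect_t, and the boxes_differ and count_size_runs names are hypothetical. It counts the runs of consecutive pages that share a MediaBox, using the same 0.01 point tolerance:

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for pdfio_rect_t so the sketch builds without PDFio
typedef struct box_s
{
  double x1, y1, x2, y2;
} box_t;

// Absolute difference of two values (avoids needing <math.h>)
static double
absdiff(double a, double b)
{
  return (a > b ? a - b : b - a);
}

// Return true when two boxes differ by more than 0.01 points on any edge
bool
boxes_differ(const box_t *a, const box_t *b)
{
  return (absdiff(a->x1, b->x1) > 0.01 ||
          absdiff(a->y1, b->y1) > 0.01 ||
          absdiff(a->x2, b->x2) > 0.01 ||
          absdiff(a->y2, b->y2) > 0.01);
}

// Count the runs of consecutive pages that share the same box
size_t
count_size_runs(const box_t *boxes, size_t num_pages)
{
  size_t cur,                           // Current page index
         prev = 0,                      // First page of the current run
         runs = 0;                      // Number of runs found

  for (cur = 0; cur < num_pages; cur ++)
  {
    // A new run starts on the first page or when the size changes
    if (cur == 0 || boxes_differ(boxes + cur, boxes + prev))
    {
      runs ++;
      prev = cur;
    }
  }

  return (runs);
}
```

Each page is compared against the first page of the current run, as the example program does with prev_box.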

Extract Text from PDF File

The pdf2text.c example code extracts text from a PDF file and writes it to the standard output. Unlike some other PDF tools, it outputs the text in the order it is seen in each page stream so the output might appear "jumbled" if the PDF producer doesn't output text in reading order. The code is able to handle different font encodings and produces UTF-8 output.

The pdfioStreamGetToken function is used to read individual tokens from the page streams:

    pdfio_stream_t *st;              // Page stream
    char           buffer[1024],     // Token buffer
                   *bufptr,          // Pointer into buffer
                   name[256];        // Current (font) name
    bool           first = true;     // First string on line?
    int            encoding[256];    // Font encoding to Unicode
    bool           in_array = false; // Are we in an array?
    
    // Read PDF tokens from the page stream...
    while (pdfioStreamGetToken(st, buffer, sizeof(buffer)))
    {

Justified text can be found inside arrays ("[ ... ]"), so we look for the array delimiter tokens and any (spacing) numbers inside an array. Experimentation has shown that numbers greater than 100 can be treated as whitespace:

      if (!strcmp(buffer, "["))
      {
        // Start of an array for justified text...
        in_array = true;
      }
      else if (!strcmp(buffer, "]"))
      {
        // End of an array for justified text...
        in_array = false;
      }
      else if (!first && in_array && (isdigit(buffer[0]) || buffer[0] == '-') && fabs(atof(buffer)) > 100)
      {
        // Whitespace in a justified text block...
        putchar(' ');
      }

Tokens starting with '(' or '<' are text fragments. 8-bit text starting with '(' needs to be mapped to Unicode using the current font encoding while hex strings starting with '<' are UTF-16 (Unicode) that need to be converted to UTF-8:

      else if (buffer[0] == '(')
      {
        // Text string using an 8-bit encoding
        first = false;
    
        for (bufptr = buffer + 1; *bufptr; bufptr ++)
          put_utf8(encoding[*bufptr & 255]);
      }
      else if (buffer[0] == '<')
      {
        // Unicode text string
        first = false;
    
        puts_utf16(buffer + 1);
      }

Simple (8-bit) fonts include an encoding table that maps the 8-bit characters to one of 1051 Unicode glyph names. Since each font can use a different encoding, we look for font names starting with '/' and the "Tf" (set text font) operator token and load that font's encoding using the load_encoding function:

      else if (buffer[0] == '/')
      {
        // Save name...
        strncpy(name, buffer + 1, sizeof(name) - 1);
        name[sizeof(name) - 1] = '\0';
      }
      else if (!strcmp(buffer, "Tf") && name[0])
      {
        // Set font...
        load_encoding(obj, name, encoding);
      }

Finally, some text operators start a new line in a text block, so when we see their tokens we output a newline:

      else if (!strcmp(buffer, "Td") || !strcmp(buffer, "TD") || !strcmp(buffer, "T*") ||
               !strcmp(buffer, "\'") || !strcmp(buffer, "\""))
      {
        // Text operators that advance to the next line in the block
        putchar('\n');
        first = true;
      }
    }
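
The put_utf8 and puts_utf16 helpers called in the loop are not listed in this section. As a rough illustration of what such helpers need to do, the following sketch encodes one Unicode code point as UTF-8 and combines a UTF-16 surrogate pair into a code point. The function names, signatures, and buffer-based interface are assumptions made for this sketch, not the pdf2text.c code:

```c
#include <stddef.h>

// Encode a Unicode code point as UTF-8 in "out" (room for 4 bytes).
// Returns the number of bytes written, or 0 for an invalid code point.
size_t
utf8_encode(int ch, unsigned char *out)
{
  if (ch < 0 || ch > 0x10ffff || (ch >= 0xd800 && ch <= 0xdfff))
  {
    // Out of range or a lone surrogate
    return (0);
  }
  else if (ch < 0x80)
  {
    // 1-byte (ASCII) sequence
    out[0] = (unsigned char)ch;
    return (1);
  }
  else if (ch < 0x800)
  {
    // 2-byte sequence
    out[0] = (unsigned char)(0xc0 | (ch >> 6));
    out[1] = (unsigned char)(0x80 | (ch & 0x3f));
    return (2);
  }
  else if (ch < 0x10000)
  {
    // 3-byte sequence
    out[0] = (unsigned char)(0xe0 | (ch >> 12));
    out[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (ch & 0x3f));
    return (3);
  }
  else
  {
    // 4-byte sequence
    out[0] = (unsigned char)(0xf0 | (ch >> 18));
    out[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (ch & 0x3f));
    return (4);
  }
}

// Combine a UTF-16 high/low surrogate pair into a single code point.
// Returns -1 if the two values are not a valid pair.
int
utf16_combine(int high, int low)
{
  if (high < 0xd800 || high > 0xdbff || low < 0xdc00 || low > 0xdfff)
    return (-1);

  return (0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00));
}
```

A helper that writes to standard output could call utf8_encode and pass the resulting bytes to fwrite.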

The load_encoding Function

The load_encoding function looks up the named font in the page's "Resources" dictionary. Every PDF simple font contains an "Encoding" dictionary with a base encoding ("WinANSI", "MacRoman", or "MacExpert") and a differences array that lists character indexes and glyph names for an 8-bit font.

We start by initializing the encoding array to the default WinANSI encoding and looking up the font object for the named font:

    static void
    load_encoding(
        pdfio_obj_t   *page_obj,            // I - Page object
        const char    *name,                // I - Font name
        int           encoding[256])        // O - Encoding table
    {
      size_t        i, j;                   // Looping vars
      pdfio_dict_t  *page_dict,             // Page dictionary
                    *resources_dict,        // Resources dictionary
                    *font_dict;             // Font dictionary
      pdfio_obj_t   *font_obj,              // Font object
                    *encoding_obj;          // Encoding object
      static int    win_ansi[32] =          // WinANSI characters from 128 to 159
      {
        ...
      };
      static int    mac_roman[128] =        // MacRoman characters from 128 to 255
      {
        ...
      };
    
    
      // Initialize the encoding to be the "standard" WinAnsi...
      for (i = 0; i < 128; i ++)
        encoding[i] = i;
      for (i = 160; i < 256; i ++)
        encoding[i] = i;
      memcpy(encoding + 128, win_ansi, sizeof(win_ansi));
    
      // Find the named font...
      if ((page_dict = pdfioObjGetDict(page_obj)) == NULL)
        return;
    
      if ((resources_dict = pdfioDictGetDict(page_dict, "Resources")) == NULL)
        return;
    
      if ((font_dict = pdfioDictGetDict(resources_dict, "Font")) == NULL)
      {
        // Font resources not a dictionary, see if it is an object...
        if ((font_obj = pdfioDictGetObj(resources_dict, "Font")) != NULL)
          font_dict = pdfioObjGetDict(font_obj);
    
        if (!font_dict)
          return;
      }
    
      if ((font_obj = pdfioDictGetObj(font_dict, name)) == NULL)
        return;

Once we have found the font we see if it has an "Encoding" dictionary:

      pdfio_dict_t  *encoding_dict;         // Encoding dictionary
    
      if ((encoding_obj = pdfioDictGetObj(pdfioObjGetDict(font_obj), "Encoding")) == NULL)
        return;
    
      if ((encoding_dict = pdfioObjGetDict(encoding_obj)) == NULL)
        return;

Once we have the encoding dictionary we can get the "BaseEncoding" and "Differences" values:

      const char    *base_encoding;         // BaseEncoding name
      pdfio_array_t *differences;           // Differences array
    
      // OK, have the encoding object, build the encoding using it...
      base_encoding = pdfioDictGetName(encoding_dict, "BaseEncoding");
      differences   = pdfioDictGetArray(encoding_dict, "Differences");

If the base encoding is "MacRomanEncoding", we need to reset the upper 128 characters in the encoding array to match it:

      if (base_encoding && !strcmp(base_encoding, "MacRomanEncoding"))
      {
        // Map upper 128
        memcpy(encoding + 128, mac_roman, sizeof(mac_roman));
      }

Then we loop through the differences array, keeping track of the current index within the encoding array. A number indicates a new index while a name is the Unicode glyph for the current index:

      typedef struct name_map_s
      {
        const char    *name;                // Character name
        int           unicode;              // Unicode value
      } name_map_t;
    
      static name_map_t unicode_map[1051];  // List of glyph names
    
      if (differences)
      {
        // Apply differences
        size_t      count = pdfioArrayGetSize(differences);
                                            // Number of differences
        const char  *name;                  // Character name
        size_t      idx = 0;                // Index in encoding array
    
        for (i = 0; i < count; i ++)
        {
          switch (pdfioArrayGetType(differences, i))
          {
            case PDFIO_VALTYPE_NUMBER :
                // Get the index of the next character...
                idx = (size_t)pdfioArrayGetNumber(differences, i);
                break;
    
            case PDFIO_VALTYPE_NAME :
                // Lookup name and apply to encoding...
                if (idx > 255)
                  break;
    
                name = pdfioArrayGetName(differences, i);
                for (j = 0; j < (sizeof(unicode_map) / sizeof(unicode_map[0])); j ++)
                {
                  if (!strcmp(name, unicode_map[j].name))
                  {
                    encoding[idx] = unicode_map[j].unicode;
                    break;
                  }
                }
                idx ++;
                break;
    
            default :
                // Do nothing for other values
                break;
          }
        }
      }
    }
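
To make the index tracking concrete, consider a differences array such as [65 /grave /quotesingle 200 /a]. The sketch below replays that walk without PDFio. The diff_item_t type, the three-entry glyph table, and the apply_differences name are illustrative assumptions rather than the pdf2text.c code:

```c
#include <stddef.h>
#include <string.h>

// Hypothetical element of a differences array
typedef struct diff_item_s
{
  const char *name;                     // Glyph name, NULL for a number
  int        number;                    // Character index when name is NULL
} diff_item_t;

// Tiny stand-in for the full glyph name table
typedef struct glyph_s
{
  const char *name;                     // Glyph name
  int        unicode;                   // Unicode value
} glyph_t;

static const glyph_t glyphs[] =
{
  { "a",           0x0061 },
  { "grave",       0x0060 },
  { "quotesingle", 0x0027 }
};

// Apply the differences to a 256-entry encoding table
void
apply_differences(const diff_item_t *items, size_t count, int encoding[256])
{
  size_t i, j;                          // Looping vars
  size_t idx = 0;                       // Current index in encoding table

  for (i = 0; i < count; i ++)
  {
    if (!items[i].name)
    {
      // A number sets the index used by the names that follow
      idx = (size_t)items[i].number;
      continue;
    }

    if (idx < 256)
    {
      // Look up the glyph name; unknown names leave the entry unchanged
      for (j = 0; j < (sizeof(glyphs) / sizeof(glyphs[0])); j ++)
      {
        if (!strcmp(items[i].name, glyphs[j].name))
        {
          encoding[idx] = glyphs[j].unicode;
          break;
        }
      }
    }

    // Each name advances to the next character index
    idx ++;
  }
}
```

With the array above, entries 65 and 66 are remapped to U+0060 and U+0027 and entry 200 is remapped to U+0061.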

Create a PDF File With Text and an Image

The image2pdf.c example code creates a PDF file containing a JPEG or PNG image file and optional caption on a single page. The create_pdf_image_file function creates the PDF file, embeds a base font and the named JPEG or PNG image file, and then creates a page with the image centered on the page with any text centered below:

    #include <pdfio.h>
    #include <pdfio-content.h>
    #include <string.h>
    
    
    bool                                    // O - True on success, false on failure
    create_pdf_image_file(
        const char *pdfname,                // I - PDF filename
        const char *imagename,              // I - Image filename
        const char *caption)                // I - Caption text
    {
      pdfio_file_t   *pdf;                  // PDF file
      pdfio_obj_t    *font;                 // Caption font
      pdfio_obj_t    *image;                // Image
      pdfio_dict_t   *dict;                 // Page dictionary
      pdfio_stream_t *page;                 // Page stream
      double         width, height;         // Width and height of image
      double         swidth, sheight;       // Scaled width and height on page
      double         tx, ty;                // Position on page
    
    
      // Create the PDF file...
      pdf = pdfioFileCreate(pdfname, /*version*/NULL, /*media_box*/NULL, /*crop_box*/NULL,
                            /*error_cb*/NULL, /*error_cbdata*/NULL);
      if (!pdf)
        return (false);
    
      // Create a Courier base font for the caption
      font = pdfioFileCreateFontObjFromBase(pdf, "Courier");
    
      if (!font)
      {
        pdfioFileClose(pdf);
        return (false);
      }
    
      // Create an image object from the JPEG/PNG image file...
      image = pdfioFileCreateImageObjFromFile(pdf, imagename, true);
    
      if (!image)
      {
        pdfioFileClose(pdf);
        return (false);
      }
    
      // Create a page dictionary with the font and image...
      dict = pdfioDictCreate(pdf);
      pdfioPageDictAddFont(dict, "F1", font);
      pdfioPageDictAddImage(dict, "IM1", image);
    
      // Create the page and its content stream...
      page = pdfioFileCreatePage(pdf, dict);
    
      // Position and scale the image on the page...
      width  = pdfioImageGetWidth(image);
      height = pdfioImageGetHeight(image);
    
      // Default media_box is "universal" 595.28x792 points (8.27x11in or 210x279mm).
      // Use margins of 36 points (0.5in or 12.7mm) with another 36 points for the
      // caption underneath...
      swidth  = 595.28 - 72.0;
      sheight = swidth * height / width;
      if (sheight > (792.0 - 36.0 - 72.0))
      {
        sheight = 792.0 - 36.0 - 72.0;
        swidth  = sheight * width / height;
      }
    
      tx = 0.5 * (595.28 - swidth);
      ty = 0.5 * (792 - 36 - sheight);
    
      pdfioContentDrawImage(page, "IM1", tx, ty + 36.0, swidth, sheight);
    
      // Draw the caption in black...
      pdfioContentSetFillColorDeviceGray(page, 0.0);
    
      // Compute the starting point for the text - Courier is monospaced with a
      // nominal width of 0.6 times the text height...
      tx = 0.5 * (595.28 - 18.0 * 0.6 * strlen(caption));
    
      // Position and draw the caption underneath...
      pdfioContentTextBegin(page);
      pdfioContentSetTextFont(page, "F1", 18.0);
      pdfioContentTextMoveTo(page, tx, ty);
      pdfioContentTextShow(page, /*unicode*/false, caption);
      pdfioContentTextEnd(page);
    
      // Close the page stream and the PDF file...
      pdfioStreamClose(page);
      pdfioFileClose(pdf);
    
      return (true);
    }
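
The scaling arithmetic in create_pdf_image_file can be checked on its own. The sketch below repeats the same calculation for the default 595.28x792 point page. The fit_image name and placement_t type are hypothetical and were introduced only for this illustration:

```c
// Hypothetical result type: scaled size and position in points
typedef struct placement_s
{
  double swidth, sheight;               // Scaled width and height
  double tx, ty;                        // Position on page
} placement_t;

// Fit an image of the given pixel size onto the default page
placement_t
fit_image(double width, double height)
{
  placement_t p;                        // Computed placement

  // Start with the full width between the 36 point side margins
  p.swidth  = 595.28 - 72.0;
  p.sheight = p.swidth * height / width;

  // If that is too tall, fit to the available height instead
  if (p.sheight > (792.0 - 36.0 - 72.0))
  {
    p.sheight = 792.0 - 36.0 - 72.0;
    p.swidth  = p.sheight * width / height;
  }

  // Center horizontally and in the space left after the caption
  p.tx = 0.5 * (595.28 - p.swidth);
  p.ty = 0.5 * (792.0 - 36.0 - p.sheight);

  return (p);
}
```

For a 1000x500 pixel image this yields a 523.28x261.64 point image with tx equal to 36 points. As in the example function, the image itself would be drawn at ty + 36.0 so that the caption fits underneath.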

Generate a Code 128 Barcode

One-dimensional barcodes are often rendered using special fonts that map ASCII characters to sequences of bars that can be read. The examples directory contains such a font (code128.ttf) to create "Code 128" barcodes, with an accompanying bit of example code in code128.c.

The first thing you need to do is prepare the barcode string to use with the font. Each barcode begins with a start pattern followed by the characters or digits you want to encode, a weighted sum digit, and a stop pattern. The make_code128 function creates this string:

    static char *                           // O - Output string
    make_code128(char       *dst,           // I - Destination buffer
                 const char *src,           // I - Source string
                 size_t     dstsize)        // I - Size of destination buffer
    {
      char          *dstptr,                // Pointer into destination buffer
                    *dstend;                // End of destination buffer
      int           sum;                    // Weighted sum
      static const char *code128_chars =    // Code 128 characters
                    " !\"#$%&'()*+,-./0123456789:;<=>?"
                    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
                    "`abcdefghijklmnopqrstuvwxyz{|}~\303"
                    "\304\305\306\307\310\311\312";
      static const char code128_start_code_b = '\314';
                                            // Start code B
      static const char code128_stop = '\316';
                                            // Stop pattern
    
    
      // Start a Code B barcode...
      dstptr = dst;
      dstend = dst + dstsize - 3;
    
      *dstptr++ = code128_start_code_b;
      sum       = (unsigned char)code128_start_code_b - 100;
    
      while (*src && dstptr < dstend)
      {
        if (*src >= ' ' && *src < 0x7f)
        {
          sum       += (dstptr - dst) * (*src - ' ');
          *dstptr++ = *src;
        }
    
        src ++;
      }
    
      // Add the weighted sum modulo 103
      *dstptr++ = code128_chars[sum % 103];
    
      // Add the stop pattern and return...
      *dstptr++ = code128_stop;
      *dstptr   = '\0';
    
      return (dst);
    }
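
As a worked example of the weighted sum: Code B starts with the value 104, and each character contributes its position (starting at 1) multiplied by its value (the ASCII code minus 32). The sketch below is not part of code128.c and uses a hypothetical code128b_check name. It computes the check value for a printable ASCII string:

```c
// Return the Code 128 check value (0 to 102) for a Code B string.
// Characters outside printable ASCII are skipped, as in make_code128.
int
code128b_check(const char *src)
{
  int sum = 104;                        // Start Code B value
  int pos = 1;                          // Weight of the next character

  for (; *src; src ++)
  {
    if (*src >= ' ' && *src < 0x7f)
    {
      // Add position times character value
      sum += pos * (*src - ' ');
      pos ++;
    }
  }

  // The check value is the weighted sum modulo 103
  return (sum % 103);
}
```

For the string "A" the sum is 104 + 1 * 33 = 137, and 137 modulo 103 is 34, which selects the character "B" from code128_chars.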

The main function does the rest of the work. The barcode font is imported using the pdfioFileCreateFontObjFromFile function. We pass false for the "unicode" argument since we just want the (default) ASCII encoding:

    barcode_font = pdfioFileCreateFontObjFromFile(pdf, "code128.ttf", /*unicode*/false);

Since barcodes usually have the number or text represented by the barcode printed underneath it, we also need a regular text font, for which we can choose one of the standard 14 PostScript base fonts using the pdfioFileCreateFontObjFromBase function:

    text_font = pdfioFileCreateFontObjFromBase(pdf, "Helvetica");

Once we have these fonts we can measure the barcode and regular text labels using the pdfioContentTextMeasure function to determine how large the PDF page needs to be to hold the barcode and text:

    // Compute sizes of the text...
    const char *barcode = argv[1];
    char barcode_temp[256];
    
    if (!(barcode[0] & 0x80))
      barcode = make_code128(barcode_temp, barcode, sizeof(barcode_temp));
    
    double barcode_height = 36.0;
    double barcode_width =
        pdfioContentTextMeasure(barcode_font, barcode, barcode_height);
    
    const char *text = argv[2];
    double text_height = 0.0;
    double text_width = 0.0;
    
    if (text && text_font)
    {
      text_height = 9.0;
      text_width  = pdfioContentTextMeasure(text_font, text, text_height);
    }
    
    // Compute the size of the PDF page...
    pdfio_rect_t media_box;
    
    media_box.x1 = 0.0;
    media_box.y1 = 0.0;
    media_box.x2 = (barcode_width > text_width ? barcode_width : text_width) + 18.0;
    media_box.y2 = barcode_height + text_height + 18.0;

Finally, we just need to create a page of the specified size that references the two fonts:

    // Start a page for the barcode...
    page_dict = pdfioDictCreate(pdf);
    
    pdfioDictSetRect(page_dict, "MediaBox", &media_box);
    pdfioDictSetRect(page_dict, "CropBox", &media_box);
    
    pdfioPageDictAddFont(page_dict, "B128", barcode_font);
    if (text_font)
      pdfioPageDictAddFont(page_dict, "TEXT", text_font);
    
    page_st = pdfioFileCreatePage(pdf, page_dict);

With the barcode font called "B128" and the text font called "TEXT", we can use them to draw two strings:

    // Draw the page...
    pdfioContentSetFillColorGray(page_st, 0.0);
    
    pdfioContentSetTextFont(page_st, "B128", barcode_height);
    pdfioContentTextBegin(page_st);
    pdfioContentTextMoveTo(page_st, 0.5 * (media_box.x2 - barcode_width),
                           9.0 + text_height);
    pdfioContentTextShow(page_st, /*unicode*/false, barcode);
    pdfioContentTextEnd(page_st);
    
    if (text && text_font)
    {
      pdfioContentSetTextFont(page_st, "TEXT", text_height);
      pdfioContentTextBegin(page_st);
      pdfioContentTextMoveTo(page_st, 0.5 * (media_box.x2 - text_width), 9.0);
      pdfioContentTextShow(page_st, /*unicode*/false, text);
      pdfioContentTextEnd(page_st);
    }
    
    pdfioStreamClose(page_st);

Convert Markdown to PDF

Markdown is a simple plain text format that supports things like headings, links, character styles, tables, and embedded images. The md2pdf.c example code uses the mmd library to convert markdown to a PDF file that can be distributed.

Note: The md2pdf example is by far the most complex example code included with PDFio and shows how to lay out text, add headers and footers, add links, embed images, format tables, and add an outline (table of contents) for navigation.

Managing Document State

The md2pdf program needs to maintain three sets of state - one for the markdown document which is represented by nodes of type mmd_t and the others for the PDF document and current PDF page which are contained in the docdata_t structure:

    typedef struct docdata_s                // Document formatting data
    {
      // State for the whole document
      pdfio_file_t  *pdf;                   // PDF file
      pdfio_rect_t  media_box;              // Media (page) box
      pdfio_rect_t  crop_box;               // Crop box (for margins)
      pdfio_rect_t  art_box;                // Art box (for markdown content)
      pdfio_obj_t   *fonts[DOCFONT_MAX];    // Embedded fonts
      double        font_space;             // Unit width of a space
      size_t        num_images;             // Number of embedded images
      docimage_t    images[DOCIMAGE_MAX];   // Embedded images
      const char    *title;                 // Document title
      char          *heading;               // Current document heading
      size_t        num_actions;            // Number of actions for this document
      docaction_t   actions[DOCACTION_MAX]; // Actions for this document
      size_t        num_targets;            // Number of targets for this document
      doctarget_t   targets[DOCTARGET_MAX]; // Targets for this document
      size_t        num_toc;                // Number of table-of-contents entries
      doctoc_t      toc[DOCTOC_MAX];        // Table-of-contents entries
    
      // State for the current page
      pdfio_stream_t *st;                   // Current page stream
      double        y;                      // Current position on page
      const char    *tag;                   // Current block tag
      bool          in_table,               // Are we in a table?
                    in_row;                 // Are we in a table row?
      docfont_t     font;                   // Current font
      double        fsize;                  // Current font size
      doccolor_t    color;                  // Current color
      pdfio_array_t *annots_array;          // Annotations array (for links)
      pdfio_obj_t   *annots_obj;            // Annotations object (for links)
      size_t        num_links;              // Number of links for this page
      doclink_t     links[DOCLINK_MAX];     // Links for this page
    } docdata_t;

Document State

The output is fixed to the "universal" media size (the intersection of US Letter and ISO A4) with 1/2 inch margins - the PAGE_ constants can be changed to select a different size or margins. The media_box member contains the "MediaBox" rectangle for the PDF pages, while the crop_box and art_box members contain the "CropBox" and "ArtBox" values, respectively.
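
For reference, the "universal" size can be derived by taking the smaller width and the smaller height of US Letter (612x792 points) and ISO A4 (approximately 595.28x841.89 points). The following sketch computes that intersection and a box inset by 1/2 inch (36 point) margins. The rect_t type and the function names are hypothetical and are not the PAGE_ constants from md2pdf.c:

```c
// Hypothetical stand-in for pdfio_rect_t
typedef struct rect_s
{
  double x1, y1, x2, y2;
} rect_t;

// Return the page box that fits within both media sizes
rect_t
intersect_sizes(double w1, double h1, double w2, double h2)
{
  rect_t r;                             // Resulting box

  r.x1 = 0.0;
  r.y1 = 0.0;
  r.x2 = w1 < w2 ? w1 : w2;
  r.y2 = h1 < h2 ? h1 : h2;

  return (r);
}

// Return a copy of the box moved inward by "margin" on every side
rect_t
inset_rect(rect_t r, double margin)
{
  r.x1 += margin;
  r.y1 += margin;
  r.x2 -= margin;
  r.y2 -= margin;

  return (r);
}
```

Calling intersect_sizes(612.0, 792.0, 595.28, 841.89) gives a 595.28x792 point box, matching the default media size mentioned in the image2pdf example.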

Four embedded fonts are used:

·

DOCFONT_REGULAR: the default font used for text,

·

DOCFONT_BOLD: a boldface font used for headings and strong text,

·

DOCFONT_ITALIC: an italic/oblique font used for emphasized text, and

·

DOCFONT_MONOSPACE: a fixed-width font used for code.

By default the code uses the base PostScript fonts Helvetica, Helvetica-Bold, Helvetica-Oblique, and Courier. The USE_TRUETYPE define can be used to replace these with the Roboto TrueType fonts.

Embedded JPEG and PNG images are copied into the PDF document, with the images array containing the list of the images and their objects.

The title member contains the document title, while the heading member contains the current heading text.

The actions array contains a list of action dictionaries for interior document links that need to be resolved, while the targets array keeps track of the location of the headings in the PDF document.

The toc array contains a list of headings and is used to construct the PDF outline dictionaries/objects, which provide a table of contents for navigation in most PDF readers.

Page State

The st member provides the stream for the current page content. The color, font, fsize, and y members provide the current graphics state on the page.

The annots_array, annots_obj, num_links, and links members contain a list of hyperlinks on the current page.

Creating Pages

The new_page function is used to start a new page. Aside from creating the new page object and stream, it adds a standard header and footer to the page. It starts by closing the current page if it is open:

    // Close the current page...
    if (dd->st)
    {
      pdfioStreamClose(dd->st);
      add_links(dd);
    }

The new page needs a dictionary containing any link annotations, the media and art boxes, the four fonts, and any images:

    // Prep the new page...
    page_dict = pdfioDictCreate(dd->pdf);
    
    dd->annots_array = pdfioArrayCreate(dd->pdf);
    dd->annots_obj   = pdfioFileCreateArrayObj(dd->pdf, dd->annots_array);
    pdfioDictSetObj(page_dict, "Annots", dd->annots_obj);
    
    pdfioDictSetRect(page_dict, "MediaBox", &dd->media_box);
    pdfioDictSetRect(page_dict, "ArtBox", &dd->art_box);
    
    for (fontface = DOCFONT_REGULAR; fontface < DOCFONT_MAX; fontface ++)
      pdfioPageDictAddFont(page_dict, docfont_names[fontface], dd->fonts[fontface]);
    
    for (i = 0; i < dd->num_images; i ++)
      pdfioPageDictAddImage(page_dict, pdfioStringCreatef(dd->pdf, "I%u", (unsigned)i),
                            dd->images[i].obj);

Once the page dictionary is initialized, we create a new page and initialize the current graphics state:

    dd->st    = pdfioFileCreatePage(dd->pdf, page_dict);
    dd->color = DOCCOLOR_BLACK;
    dd->font  = DOCFONT_MAX;
    dd->fsize = 0.0;
    dd->y     = dd->art_box.y2;

The header consists of a dark gray separating line and the document title. We don't show the header on the first page:

    // Add header/footer text
    set_color(dd, DOCCOLOR_GRAY);
    set_font(dd, DOCFONT_REGULAR, SIZE_HEADFOOT);
    
    if (pdfioFileGetNumPages(dd->pdf) > 1 && dd->title)
    {
      // Show title in header...
      width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], dd->title,
                                      SIZE_HEADFOOT);
    
      pdfioContentTextBegin(dd->st);
      pdfioContentTextMoveTo(dd->st,
                             dd->crop_box.x1 + 0.5 * (dd->crop_box.x2 -
                                 dd->crop_box.x1 - width),
                             dd->crop_box.y2 - SIZE_HEADFOOT);
      pdfioContentTextShow(dd->st, UNICODE_VALUE, dd->title);
      pdfioContentTextEnd(dd->st);
    
      pdfioContentPathMoveTo(dd->st, dd->crop_box.x1,
                             dd->crop_box.y2 - 2 * SIZE_HEADFOOT * LINE_HEIGHT +
                                 SIZE_HEADFOOT);
      pdfioContentPathLineTo(dd->st, dd->crop_box.x2,
                             dd->crop_box.y2 - 2 * SIZE_HEADFOOT * LINE_HEIGHT +
                                 SIZE_HEADFOOT);
      pdfioContentStroke(dd->st);
    }

The footer contains the same dark gray separating line with the current heading and page number on opposite sides. The page number is always positioned on the outer edge for a two-sided print - right justified on odd numbered pages and left justified on even numbered pages:

    // Show page number and current heading...
    pdfioContentPathMoveTo(dd->st, dd->crop_box.x1,
                           dd->crop_box.y1 + SIZE_HEADFOOT * LINE_HEIGHT);
    pdfioContentPathLineTo(dd->st, dd->crop_box.x2,
                           dd->crop_box.y1 + SIZE_HEADFOOT * LINE_HEIGHT);
    pdfioContentStroke(dd->st);
    
    pdfioContentTextBegin(dd->st);
    snprintf(temp, sizeof(temp), "%u", (unsigned)pdfioFileGetNumPages(dd->pdf));
    if (pdfioFileGetNumPages(dd->pdf) & 1)
    {
      // Page number on right...
      width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], temp, SIZE_HEADFOOT);
      pdfioContentTextMoveTo(dd->st, dd->crop_box.x2 - width, dd->crop_box.y1);
    }
    else
    {
      // Page number on left...
      pdfioContentTextMoveTo(dd->st, dd->crop_box.x1, dd->crop_box.y1);
    }
    
    pdfioContentTextShow(dd->st, UNICODE_VALUE, temp);
    pdfioContentTextEnd(dd->st);
    
    if (dd->heading)
    {
      pdfioContentTextBegin(dd->st);
    
      if (pdfioFileGetNumPages(dd->pdf) & 1)
      {
        // Current heading on left...
        pdfioContentTextMoveTo(dd->st, dd->crop_box.x1, dd->crop_box.y1);
      }
      else
      {
        width = pdfioContentTextMeasure(dd->fonts[DOCFONT_REGULAR], dd->heading,
                                        SIZE_HEADFOOT);
        pdfioContentTextMoveTo(dd->st, dd->crop_box.x2 - width, dd->crop_box.y1);
      }
    
      pdfioContentTextShow(dd->st, UNICODE_VALUE, dd->heading);
      pdfioContentTextEnd(dd->st);
    }
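
The outer-edge rule for the page number can be isolated into a small helper. The following is a minimal sketch rather than code from the example program; the function name footer_number_x and its parameters are assumptions made for illustration. It returns the x position at which the page number text starts, given the 1-based page number, the crop box edges, and the measured text width:

```c
#include <assert.h>

// Hypothetical helper: x position of the page number in the footer.
// Odd pages put the number on the right (outer) edge, even pages on the left.
static double
footer_number_x(unsigned page,          // 1-based page number
                double   x1,            // Left edge of crop box
                double   x2,            // Right edge of crop box
                double   width)         // Measured width of the number text
{
  assert(x2 >= x1);

  if (page & 1)
    return (x2 - width);                // Right justified on odd pages
  else
    return (x1);                        // Left justified on even pages
}
```

The current heading uses the opposite test, so it always lands on the inner edge of the same page.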

Formatting the Markdown Document

Four functions handle the formatting of the markdown document:

·

format_block: formats a single paragraph, heading, or table cell,

·

format_code: formats a block of code,

·

format_doc: formats the document as a whole, and

·

format_table: formats a table.

Formatted content is organized into arrays of linefrag_t and tablerow_t structures for a line of content or row of table cells, respectively.

High-Level Formatting

The format_doc function iterates over the block nodes in the markdown document. We map a "thematic break" (horizontal rule) to a page break, which is implemented by moving the current vertical position to the bottom of the page:

    case MMD_TYPE_THEMATIC_BREAK :
        // Force a page break
        dd->y = dd->art_box.y1;
        break;

A block quote is indented and uses the italic font by default:

    case MMD_TYPE_BLOCK_QUOTE :
        format_doc(dd, current, DOCFONT_ITALIC, left + BQ_PADDING, right - BQ_PADDING);
        break;

Lists have a leading blank line and are indented:

    case MMD_TYPE_ORDERED_LIST :
    case MMD_TYPE_UNORDERED_LIST :
        if (dd->st)
          dd->y -= SIZE_BODY * LINE_HEIGHT;
    
        format_doc(dd, current, deffont, left + LIST_PADDING, right);
        break;

List items do not have a leading blank line and make use of leader text that is shown in front of the list text. The leader text is either the current item number or a bullet, and the list item is then formatted directly using the format_block function:

    case MMD_TYPE_LIST_ITEM :
        if (doctype == MMD_TYPE_ORDERED_LIST)
        {
          snprintf(leader, sizeof(leader), "%d. ", i);
          format_block(dd, current, deffont, SIZE_BODY, left, right, leader);
        }
        else
        {
          format_block(dd, current, deffont, SIZE_BODY, left, right, /*leader*/"• ");
        }
        break;

Paragraphs have a leading blank line and are likewise directly formatted:

    case MMD_TYPE_PARAGRAPH :
        // Add a blank line before the paragraph...
        dd->y -= SIZE_BODY * LINE_HEIGHT;
    
        // Format the paragraph...
        format_block(dd, current, deffont, SIZE_BODY, left, right, /*leader*/NULL);
        break;

Tables have a leading blank line and are formatted using the format_table function:

    case MMD_TYPE_TABLE :
        // Add a blank line before the table...
        dd->y -= SIZE_BODY * LINE_HEIGHT;
    
        // Format the table...
        format_table(dd, current, left, right);
        break;

Code blocks have a leading blank line, are indented slightly (to account for the padded background), and are formatted using the format_code function:

    case MMD_TYPE_CODE_BLOCK :
        // Add a blank line before the code block...
        dd->y -= SIZE_BODY * LINE_HEIGHT;
    
        // Format the code block...
        format_code(dd, current, left + CODE_PADDING, right - CODE_PADDING);
        break;

Headings get some extra processing. First, the current heading is remembered in the docdata_t structure so it can be used in the page footer:

    case MMD_TYPE_HEADING_1 :
    case MMD_TYPE_HEADING_2 :
    case MMD_TYPE_HEADING_3 :
    case MMD_TYPE_HEADING_4 :
    case MMD_TYPE_HEADING_5 :
    case MMD_TYPE_HEADING_6 :
        // Update the current heading
        free(dd->heading);
        dd->heading = mmdCopyAllText(current);

Then we add a blank line and format the heading with the boldface font at a larger size using the format_block function:

        // Add a blank line before the heading...
        dd->y -= heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT;
    
        // Format the heading...
        format_block(dd, current, DOCFONT_BOLD,
                     heading_sizes[curtype - MMD_TYPE_HEADING_1], left, right,
                     /*leader*/NULL);

Once the heading is formatted, we record it in the toc array as a PDF outline item object/dictionary:

        // Add the heading to the table-of-contents...
        if (dd->num_toc < DOCTOC_MAX)
        {
          doctoc_t *t = dd->toc + dd->num_toc;
                                      // New TOC
          pdfio_array_t *dest;  // Destination array
    
          t->level = curtype - MMD_TYPE_HEADING_1;
          t->dict  = pdfioDictCreate(dd->pdf);
          t->obj   = pdfioFileCreateObj(dd->pdf, t->dict);
          dest     = pdfioArrayCreate(dd->pdf);
    
          pdfioArrayAppendObj(dest,
              pdfioFileGetPage(dd->pdf, pdfioFileGetNumPages(dd->pdf) - 1));
          pdfioArrayAppendName(dest, "XYZ");
          pdfioArrayAppendNumber(dest, PAGE_LEFT);
          pdfioArrayAppendNumber(dest,
              dd->y + heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT);
          pdfioArrayAppendNumber(dest, 0.0);
    
          pdfioDictSetArray(t->dict, "Dest", dest);
          pdfioDictSetString(t->dict, "Title", pdfioStringCreate(dd->pdf, dd->heading));
    
          dd->num_toc ++;
        }

Finally, we also save the heading's target name and its location in the targets array to allow interior links to work:

        // Add the heading to the list of link targets...
        if (dd->num_targets < DOCTARGET_MAX)
        {
          doctarget_t *t = dd->targets + dd->num_targets;
                                      // New target
    
          make_target_name(t->name, dd->heading, sizeof(t->name));
          t->page = pdfioFileGetNumPages(dd->pdf) - 1;
          t->y    = dd->y + heading_sizes[curtype - MMD_TYPE_HEADING_1] * LINE_HEIGHT;
    
          dd->num_targets ++;
        }
        break;
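
The make_target_name function is not shown in this section. As a rough sketch of what such a function might do, the following assumes a common anchor convention: letters and digits are lowercased, spaces become hyphens, and other punctuation is dropped. The name target_name_sketch and the exact character rules are assumptions, not the example program's definition:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical sketch: convert heading text to a link target name.
static void
target_name_sketch(char       *dst,     // Destination buffer
                   const char *src,     // Heading text
                   size_t     dstsize)  // Size of destination buffer
{
  char *dstptr = dst;                   // Pointer into destination
  char *dstend;                         // Last usable byte (for the nul)

  assert(dst && src && dstsize > 0);

  dstend = dst + dstsize - 1;

  while (*src && dstptr < dstend)
  {
    unsigned char ch = (unsigned char)*src++;

    if (isalnum(ch) || ch == '.' || ch == '-')
      *dstptr++ = (char)tolower(ch);    // Keep, folded to lowercase
    else if (ch == ' ')
      *dstptr++ = '-';                  // Spaces become hyphens
  }

  *dstptr = '\0';                       // Always nul-terminate
}
```

Under these assumed rules the heading "Formatting Tables" maps to "formatting-tables", and output that does not fit is truncated rather than overflowing the buffer.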

Formatting Paragraphs, Headings, List Items, and Table Cells

Paragraphs, headings, list items, and table cells all use the same basic formatting algorithm. Text, checkboxes, and images are collected until the nodes in the current block are used up or the content reaches the right margin.

In order to keep adjacent blocks of text together, the formatting algorithm makes sure that at least 3 lines of text can fit before the bottom edge of the page:

    if (mmdGetNextSibling(block))
      need_bottom = 3.0 * SIZE_BODY * LINE_HEIGHT;
    else
      need_bottom = 0.0;

Leader text (used for list items) is right justified against the left margin and, when present, becomes the first fragment on the line:

    if (leader)
    {
      // Add leader text on first line...
      frags[0].type     = MMD_TYPE_NORMAL_TEXT;
      frags[0].width    = pdfioContentTextMeasure(dd->fonts[deffont], leader, fsize);
      frags[0].height   = fsize;
      frags[0].x        = left - frags[0].width;
      frags[0].imagenum = 0;
      frags[0].text     = leader;
      frags[0].url      = NULL;
      frags[0].ws       = false;
      frags[0].font     = deffont;
      frags[0].color    = DOCCOLOR_BLACK;
    
      num_frags  = 1;
      lineheight = fsize * LINE_HEIGHT;
    }
    else
    {
      // No leader text...
      num_frags  = 0;
      lineheight = 0.0;
    }
    
    frag = frags + num_frags;

If the current content fragment won't fit, we call render_line to draw what we have, adjusting the left margin as needed for table cells:

      // See if this node will fit on the current line...
      if ((num_frags > 0 && (x + width + wswidth) >= right) || num_frags == LINEFRAG_MAX)
      {
        // No, render this line and start over...
        if (blocktype == MMD_TYPE_TABLE_HEADER_CELL ||
            blocktype == MMD_TYPE_TABLE_BODY_CELL_CENTER)
          margin_left = 0.5 * (right - x);
        else if (blocktype == MMD_TYPE_TABLE_BODY_CELL_RIGHT)
          margin_left = right - x;
        else
          margin_left = 0.0;
    
        render_line(dd, margin_left, need_bottom, lineheight, num_frags, frags);
    
        num_frags   = 0;
        frag        = frags;
        x           = left;
        lineheight  = 0.0;
        need_bottom = 0.0;
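
The left margin adjustment above amounts to distributing the unused width on the line. A minimal sketch of the same arithmetic follows, using a hypothetical cell_align_t enumeration and line_offset function in place of the markdown block types used by the example program:

```c
#include <assert.h>

// Hypothetical alignment values standing in for the table cell block types.
typedef enum
{
  ALIGN_LEFT,                           // Paragraphs and left-aligned cells
  ALIGN_CENTER,                         // Header cells and centered body cells
  ALIGN_RIGHT                           // Right-aligned body cells
} cell_align_t;

// Return how far to shift a finished line to the right.
static double
line_offset(cell_align_t align,         // Alignment of the block
            double       x,             // Right edge of the collected content
            double       right)         // Right margin
{
  assert(right >= x);

  switch (align)
  {
    case ALIGN_CENTER :
        return (0.5 * (right - x));     // Split the unused width evenly
    case ALIGN_RIGHT :
        return (right - x);             // Push content to the right margin
    default :
        return (0.0);                   // Leave content at the left margin
  }
}
```

For example, content ending at x = 150 in a cell whose right margin is 200 is shifted by 25 when centered and by 50 when right aligned.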

Block quotes (identified by their italic default font) have an orange bar to the left of the block:

        if (deffont == DOCFONT_ITALIC)
        {
          // Add an orange bar to the left of block quotes...
          set_color(dd, DOCCOLOR_ORANGE);
          pdfioContentSave(dd->st);
          pdfioContentSetLineWidth(dd->st, 3.0);
          pdfioContentPathMoveTo(dd->st, left - 6.0, dd->y - (LINE_HEIGHT - 1.0) * fsize);
          pdfioContentPathLineTo(dd->st, left - 6.0, dd->y + fsize);
          pdfioContentStroke(dd->st);
          pdfioContentRestore(dd->st);
        }

Finally, we add the current content fragment to the array:

      // Add the current node to the fragment list
      if (num_frags == 0)
      {
        // No leading whitespace at the start of the line
        ws      = false;
        wswidth = 0.0;
      }
    
      frag->type       = type;
      frag->x          = x;
      frag->width      = width + wswidth;
      frag->height     = text ? fsize : height;
      frag->imagenum   = imagenum;
      frag->text       = text;
      frag->url        = url;
      frag->ws         = ws;
      frag->font       = font;
      frag->color      = color;
    
      num_frags ++;
      frag ++;
      x += width + wswidth;
      if (height > lineheight)
        lineheight = height;

Formatting Code Blocks

Code blocks consist of one or more lines of plain monospaced text. We draw a light gray background behind each line with a small bit of padding at the top and bottom:

    // Draw the top padding...
    set_color(dd, DOCCOLOR_LTGRAY);
    pdfioContentPathRect(dd->st, left - CODE_PADDING, dd->y + SIZE_CODEBLOCK,
                         right - left + 2.0 * CODE_PADDING, CODE_PADDING);
    pdfioContentFillAndStroke(dd->st, false);
    
    // Start a code text block...
    dd->tag = "P";
    pdfioContentBeginMarked(dd->st, dd->tag, /*dict*/NULL);
    
    set_font(dd, DOCFONT_MONOSPACE, SIZE_CODEBLOCK);
    pdfioContentTextBegin(dd->st);
    pdfioContentTextMoveTo(dd->st, left, dd->y);
    
    for (code = mmdGetFirstChild(block); code; code = mmdGetNextSibling(code))
    {
      set_color(dd, DOCCOLOR_LTGRAY);
      pdfioContentPathRect(dd->st, left - CODE_PADDING,
                           dd->y - (LINE_HEIGHT - 1.0) * SIZE_CODEBLOCK,
                           right - left + 2.0 * CODE_PADDING, lineheight);
      pdfioContentFillAndStroke(dd->st, false);
    
      set_color(dd, DOCCOLOR_RED);
      pdfioContentTextShow(dd->st, UNICODE_VALUE, mmdGetText(code));
      dd->y -= lineheight;
    
      if (dd->y < dd->art_box.y1)
      {
        // End the current text block...
        pdfioContentTextEnd(dd->st);
    
        // Start a new page...
        new_page(dd);
        set_font(dd, DOCFONT_MONOSPACE, SIZE_CODEBLOCK);
    
        dd->y -= lineheight;
    
        pdfioContentTextBegin(dd->st);
        pdfioContentTextMoveTo(dd->st, left, dd->y);
      }
    }
    
    // End the current text block...
    pdfioContentTextEnd(dd->st);
    dd->y += lineheight;
    
    pdfioContentEndMarked(dd->st);
    dd->tag = NULL;
    
    // Draw the bottom padding...
    set_color(dd, DOCCOLOR_LTGRAY);
    pdfioContentPathRect(dd->st, left - CODE_PADDING,
                         dd->y - CODE_PADDING - (LINE_HEIGHT - 1.0) * SIZE_CODEBLOCK,
                         right - left + 2.0 * CODE_PADDING, CODE_PADDING);
    pdfioContentFillAndStroke(dd->st, false);

Formatting Tables

Tables are the most difficult to format. We start by scanning the entire table and measuring every cell with the measure_cell function:

    for (num_cols = 0, num_rows = 0, rowptr = rows, current = mmdGetFirstChild(table);
         current && num_rows < TABLEROW_MAX;
         current = next)
    {
      next = mmd_walk_next(table, current);
      type = mmdGetType(current);
    
      if (type == MMD_TYPE_TABLE_ROW)
      {
        // Parse row...
        for (col = 0, current = mmdGetFirstChild(current);
             current && num_cols < TABLECOL_MAX;
             current = mmdGetNextSibling(current), col ++)
        {
          rowptr->cells[col] = current;
    
          measure_cell(dd, current, cols + col);
    
          if (col >= num_cols)
            num_cols = col + 1;
        }
    
        rowptr ++;
        num_rows ++;
      }
    }

The measure_cell function also updates the minimum and maximum width needed for each column. To this we add the cell padding to compute the total table width:

    // Figure out the width of each column...
    for (col = 0, table_width = 0.0; col < num_cols; col ++)
    {
      cols[col].max_width += 2.0 * TABLE_PADDING;
    
      table_width += cols[col].max_width;
      cols[col].width = cols[col].max_width;
    }

If the calculated width is more than the available width, we need to adjust the width of the columns. The algorithm used here breaks the available width into N equal-width columns - any columns wider than this will be scaled proportionately. This works out as two steps - one to calculate the base width of "narrow" columns and a second to distribute the remaining width among the wider columns:

    format_width = right - left - 2.0 * TABLE_PADDING * num_cols;
    
    if (table_width > format_width)
    {
      // Content too wide, try scaling the widths...
      double      avg_width,              // Average column width
                  base_width,             // Base width
                  remaining_width,        // Remaining width
                  scale_width;            // Width for scaling
      size_t      num_remaining_cols = 0; // Number of remaining columns
    
      // First mark any columns that are narrower than the average width...
      avg_width = format_width / num_cols;
    
      for (col = 0, base_width = 0.0, remaining_width = 0.0; col < num_cols; col ++)
      {
        if (cols[col].width > avg_width)
        {
          remaining_width += cols[col].width;
          num_remaining_cols ++;
        }
        else
        {
          base_width += cols[col].width;
        }
      }
    
      // Then proportionately distribute the remaining width to the other columns...
      format_width -= base_width;
    
      for (col = 0, table_width = 0.0; col < num_cols; col ++)
      {
        if (cols[col].width > avg_width)
          cols[col].width = cols[col].width * format_width / remaining_width;
    
        table_width += cols[col].width;
      }
    }
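
A worked example may make the two steps easier to follow. The sketch below repeats the same arithmetic on a plain array of widths; the function name scale_columns is an assumption, and the caller is expected to have already determined that the table is too wide:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical sketch: scale the wide columns to fit "format_width".
// Returns the new total table width.
static double
scale_columns(double *widths,           // Column widths (updated in place)
              size_t num_cols,          // Number of columns
              double format_width)      // Available width
{
  double avg_width;                     // Average column width
  double base_width = 0.0;              // Total width of narrow columns
  double remaining_width = 0.0;         // Total width of wide columns
  double table_width = 0.0;             // New table width
  size_t col;                           // Current column

  assert(widths && num_cols > 0);

  // Step 1: total up the narrow and wide columns separately...
  avg_width = format_width / num_cols;

  for (col = 0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      remaining_width += widths[col];
    else
      base_width += widths[col];
  }

  // Step 2: share what is left among the wide columns in proportion...
  format_width -= base_width;

  for (col = 0; col < num_cols; col ++)
  {
    if (widths[col] > avg_width)
      widths[col] = widths[col] * format_width / remaining_width;

    table_width += widths[col];
  }

  return (table_width);
}
```

With column widths of 50, 100, and 350 and an available width of 400, the average is about 133.3, so the first two columns keep their widths (150 in total) and the third is reduced from 350 to the remaining 250.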

Now that we have the widths of the columns, we can calculate the left and right margins of each column for formatting the cell text:

    // Calculate the margins of each column in preparation for formatting
    for (col = 0, x = left + TABLE_PADDING; col < num_cols; col ++)
    {
      cols[col].left  = x;
      cols[col].right = x + cols[col].width;
    
      x += cols[col].width + 2.0 * TABLE_PADDING;
    }
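
The same margin calculation can be tried in isolation. In this sketch the col_sketch_t structure, the place_columns function, and the padding value passed by the caller are all assumptions standing in for the example program's column type and TABLE_PADDING constant:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical column record with only the fields needed here.
typedef struct
{
  double width;                         // Final column width
  double left;                          // Left margin for cell text
  double right;                         // Right margin for cell text
} col_sketch_t;

// Assign left/right text margins, leaving padding on both sides of each column.
static void
place_columns(col_sketch_t *cols,       // Columns (updated in place)
              size_t       num_cols,    // Number of columns
              double       left,        // Left edge of the table
              double       padding)     // Cell padding
{
  double x = left + padding;            // Current text position
  size_t col;                           // Current column

  assert(cols && num_cols > 0);

  for (col = 0; col < num_cols; col ++)
  {
    cols[col].left  = x;
    cols[col].right = x + cols[col].width;

    x += cols[col].width + 2.0 * padding;
  }
}
```

Adjacent columns end up separated by twice the padding, which is where the vertical cell borders are later drawn.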

Then we re-measure the cells using the final column widths to determine the height of each cell and row:

    // Calculate the height of each row and cell in preparation for formatting
    for (row = 0, rowptr = rows; row < num_rows; row ++, rowptr ++)
    {
      for (col = 0; col < num_cols; col ++)
      {
        height = measure_cell(dd, rowptr->cells[col], cols + col) + 2.0 * TABLE_PADDING;
        if (height > rowptr->height)
          rowptr->height = height;
      }
    }

Finally, we render each row in the table:

    // Render each table row...
    dd->in_table = true;
    
    if (dd->st)
      pdfioContentBeginMarked(dd->st, "Table", /*dict*/NULL);
    
    for (row = 0, rowptr = rows; row < num_rows; row ++, rowptr ++)
      render_row(dd, num_cols, cols, rowptr);
    
    pdfioContentEndMarked(dd->st);
    
    dd->in_table = false;

Rendering the Markdown Document

The formatted content in the arrays of linefrag_t and tablerow_t structures is passed to the render_line and render_row functions, respectively, to produce content in the PDF document.

Rendering a Line in a Paragraph, Heading, or Table Cell

The render_line function adds content from the linefrag_t array to a PDF page. It starts by determining whether a new page is needed:

    if (!dd->st)
    {
      new_page(dd);
      margin_top = 0.0;
    }
    
    dd->y -= margin_top + lineheight;
    if ((dd->y - need_bottom) < dd->art_box.y1)
    {
      new_page(dd);
    
      dd->y -= lineheight;
    }
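
The page break decision is a single comparison once the line has been placed. The sketch below expresses it as a predicate; the function name needs_new_page is an assumption, and the numbers in the note that follows assume an 11 point body size with a 1.2 line height, which are illustrative values only:

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical sketch: would placing this line run past the bottom margin?
static bool
needs_new_page(double y,                // Current vertical position
               double lineheight,       // Height of the line being placed
               double need_bottom,      // Extra space to reserve below the line
               double bottom)           // Bottom margin (art box y1)
{
  assert(lineheight >= 0.0 && need_bottom >= 0.0);

  return ((y - lineheight - need_bottom) < bottom);
}
```

With a bottom margin of 36, a 13.2 point line, and 39.6 points reserved for three following lines, a line starting at y = 100 stays on the page while a line starting at y = 80 is moved to the next page.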

We then loop through the fragments for the current line, drawing checkboxes, images, and text as needed. When a hyperlink is present, we add the link to the links array in the docdata_t structure, mapping "@" and "@@" to an internal link corresponding to the linked text:

    if (frag->url && dd->num_links < DOCLINK_MAX)
    {
      doclink_t *l = dd->links + dd->num_links;
                                      // Pointer to this link record
    
      if (!strcmp(frag->url, "@"))
      {
        // Use mapped text as link target...
        char  targetlink[129];        // Targeted link
    
        targetlink[0] = '#';
        make_target_name(targetlink + 1, frag->text, sizeof(targetlink) - 1);
    
        l->url = pdfioStringCreate(dd->pdf, targetlink);
      }
      else if (!strcmp(frag->url, "@@"))
      {
        // Use literal text as anchor...
        l->url = pdfioStringCreatef(dd->pdf, "#%s", frag->text);
      }
      else
      {
        // Use URL as-is...
        l->url = frag->url;
      }
    
      l->box.x1 = frag->x;
      l->box.y1 = dd->y;
      l->box.x2 = frag->x + frag->width;
      l->box.y2 = dd->y + frag->height;
    
      dd->num_links ++;
    }

These are later written as annotations in the add_links function.

Rendering a Table Row

The render_row function takes a row of cells and the corresponding column definitions. It starts by drawing the border boxes around body cells:

    if (mmdGetType(row->cells[0]) == MMD_TYPE_TABLE_HEADER_CELL)
    {
      // Header row, no border...
      deffont = DOCFONT_BOLD;
    }
    else
    {
      // Regular body row, add borders...
      deffont = DOCFONT_REGULAR;
    
      set_color(dd, DOCCOLOR_GRAY);
      pdfioContentPathRect(dd->st, cols[0].left - TABLE_PADDING, dd->y - row->height,
                           cols[num_cols - 1].right - cols[0].left +
                               2.0 * TABLE_PADDING, row->height);
      for (col = 1; col < num_cols; col ++)
      {
        pdfioContentPathMoveTo(dd->st, cols[col].left - TABLE_PADDING, dd->y);
        pdfioContentPathLineTo(dd->st, cols[col].left - TABLE_PADDING, dd->y - row->height);
      }
      pdfioContentStroke(dd->st);
    }

Then it formats each cell using the format_block function described previously. The page y value is reset before formatting each cell:

    row_y = dd->y;
    
    for (col = 0; col < num_cols; col ++)
    {
      dd->y = row_y;
    
      format_block(dd, row->cells[col], deffont, SIZE_TABLE, cols[col].left,
                   cols[col].right, /*leader*/NULL);
    }
    
    dd->y = row_y - row->height;

Enumerations

pdfio_cs_e

Standard color spaces

PDFIO_CS_ADOBE

AdobeRGB 1998

PDFIO_CS_CGATS001

CGATS001 (CMYK)

PDFIO_CS_P3_D65

Display P3

PDFIO_CS_SRGB

sRGB

pdfio_encryption_e

PDF encryption modes

PDFIO_ENCRYPTION_AES_128

128-bit AES encryption (PDF 1.6)

PDFIO_ENCRYPTION_AES_256

256-bit AES encryption (PDF 2.0)

PDFIO_ENCRYPTION_NONE

No encryption

PDFIO_ENCRYPTION_RC4_128

128-bit RC4 encryption (PDF 1.4)

PDFIO_ENCRYPTION_RC4_40

40-bit RC4 encryption (PDF 1.3, reading only)

pdfio_filter_e

Compression/decompression filters for streams

PDFIO_FILTER_ASCII85

ASCII85Decode filter (reading only)

PDFIO_FILTER_ASCIIHEX

ASCIIHexDecode filter (reading only)

PDFIO_FILTER_CCITTFAX

CCITTFaxDecode filter

PDFIO_FILTER_CRYPT

Encryption filter

PDFIO_FILTER_DCT

DCTDecode (JPEG) filter

PDFIO_FILTER_FLATE

FlateDecode filter

PDFIO_FILTER_JBIG2

JBIG2Decode filter

PDFIO_FILTER_JPX

JPXDecode filter (reading only)

PDFIO_FILTER_LZW

LZWDecode filter (reading only)

PDFIO_FILTER_NONE

No filter

PDFIO_FILTER_RUNLENGTH

RunLengthDecode filter (reading only)

pdfio_linecap_e

Line capping modes

PDFIO_LINECAP_BUTT

Butt ends

PDFIO_LINECAP_ROUND

Round ends

PDFIO_LINECAP_SQUARE

Square ends

pdfio_linejoin_e

Line joining modes

PDFIO_LINEJOIN_BEVEL

Bevel joint

PDFIO_LINEJOIN_MITER

Miter joint

PDFIO_LINEJOIN_ROUND

Round joint

pdfio_permission_e

PDF permission bits

PDFIO_PERMISSION_ALL

All permissions (~0)

PDFIO_PERMISSION_ANNOTATE

PDF allows annotation

PDFIO_PERMISSION_ASSEMBLE

PDF allows assembly (insert, delete, or rotate pages, add document outlines and thumbnails)

PDFIO_PERMISSION_COPY

PDF allows copying

PDFIO_PERMISSION_FORMS

PDF allows filling in forms

PDFIO_PERMISSION_MODIFY

PDF allows modification

PDFIO_PERMISSION_NONE

No permissions

PDFIO_PERMISSION_PRINT

PDF allows printing

PDFIO_PERMISSION_PRINT_HIGH

PDF allows high quality printing

PDFIO_PERMISSION_READING

PDF allows screen reading/accessibility (deprecated in PDF 2.0)

pdfio_textrendering_e

Text rendering modes

PDFIO_TEXTRENDERING_FILL

Fill text

PDFIO_TEXTRENDERING_FILL_AND_STROKE

Fill then stroke text

PDFIO_TEXTRENDERING_FILL_AND_STROKE_PATH

Fill then stroke text and add to path

PDFIO_TEXTRENDERING_FILL_PATH

Fill text and add to path

PDFIO_TEXTRENDERING_INVISIBLE

Don't fill or stroke (invisible)

PDFIO_TEXTRENDERING_STROKE

Stroke text

PDFIO_TEXTRENDERING_STROKE_PATH

Stroke text and add to path

PDFIO_TEXTRENDERING_TEXT_PATH

Add text to path (invisible)

pdfio_valtype_e

PDF value types

PDFIO_VALTYPE_ARRAY

Array

PDFIO_VALTYPE_BINARY

Binary data

PDFIO_VALTYPE_BOOLEAN

Boolean

PDFIO_VALTYPE_DATE

Date/time

PDFIO_VALTYPE_DICT

Dictionary

PDFIO_VALTYPE_INDIRECT

Indirect object (N G obj)

PDFIO_VALTYPE_NAME

Name

PDFIO_VALTYPE_NONE

No value, not set

PDFIO_VALTYPE_NULL

Null object

PDFIO_VALTYPE_NUMBER

Number (integer or real)

PDFIO_VALTYPE_STRING

String

Functions

pdfioArrayAppendArray

Add an array value to an array.

bool  pdfioArrayAppendArray (
    pdfio_array_t *a,
    pdfio_array_t *value
);

pdfioArrayAppendBinary

Add a binary string value to an array.

bool  pdfioArrayAppendBinary (
    pdfio_array_t *a,
    const unsigned char *value,
    size_t valuelen
);

pdfioArrayAppendBoolean

Add a boolean value to an array.

bool  pdfioArrayAppendBoolean (
    pdfio_array_t *a,
    bool value
);

pdfioArrayAppendDate

Add a date value to an array.

bool  pdfioArrayAppendDate (
    pdfio_array_t *a,
    time_t value
);

pdfioArrayAppendDict

Add a dictionary to an array.

bool  pdfioArrayAppendDict (
    pdfio_array_t *a,
    pdfio_dict_t *value
);

pdfioArrayAppendName

Add a name to an array.

bool  pdfioArrayAppendName (
    pdfio_array_t *a,
    const char *value
);

pdfioArrayAppendNumber

Add a number to an array.

bool  pdfioArrayAppendNumber (
    pdfio_array_t *a,
    double value
);

pdfioArrayAppendObj

Add an indirect object reference to an array.

bool  pdfioArrayAppendObj (
    pdfio_array_t *a,
    pdfio_obj_t *value
);

pdfioArrayAppendString

Add a string to an array.

bool  pdfioArrayAppendString (
    pdfio_array_t *a,
    const char *value
);

pdfioArrayCopy

Copy an array.

pdfio_array_t * pdfioArrayCopy (
    pdfio_file_t *pdf,
    pdfio_array_t *a
);

pdfioArrayCreate

Create an empty array.

pdfio_array_t * pdfioArrayCreate (
    pdfio_file_t *pdf
);

pdfioArrayCreateColorFromICCObj

Create an ICC-based color space array.

pdfio_array_t * pdfioArrayCreateColorFromICCObj (
    pdfio_file_t *pdf,
    pdfio_obj_t *icc_object
);

pdfioArrayCreateColorFromMatrix

Create a calibrated color space array using a CIE XYZ transform matrix.

pdfio_array_t * pdfioArrayCreateColorFromMatrix (
    pdfio_file_t *pdf,
    size_t num_colors,
    double gamma,
    const double matrix[3][3],
    const double white_point[3]
);

pdfioArrayCreateColorFromPalette

Create an indexed color space array.

pdfio_array_t * pdfioArrayCreateColorFromPalette (
    pdfio_file_t *pdf,
    size_t num_colors,
    const unsigned char *colors
);

pdfioArrayCreateColorFromPrimaries

Create a calibrated color space array using CIE xy primary chromaticities.

pdfio_array_t * pdfioArrayCreateColorFromPrimaries (
    pdfio_file_t *pdf,
    size_t num_colors,
    double gamma,
    double wx,
    double wy,
    double rx,
    double ry,
    double gx,
    double gy,
    double bx,
    double by
);

pdfioArrayCreateColorFromStandard

Create a color array for a standard color space.

pdfio_array_t * pdfioArrayCreateColorFromStandard (
    pdfio_file_t *pdf,
    size_t num_colors,
    pdfio_cs_t cs
);

This function creates a color array for a standard PDFIO_CS_ enumerated color space. The "num_colors" argument must be 1 for grayscale, 3 for RGB color, or 4 for CMYK color.

pdfioArrayGetArray

Get an array value from an array.

pdfio_array_t * pdfioArrayGetArray (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetBinary

Get a binary string value from an array.

unsigned char * pdfioArrayGetBinary (
    pdfio_array_t *a,
    size_t n,
    size_t *length
);

pdfioArrayGetBoolean

Get a boolean value from an array.

bool  pdfioArrayGetBoolean (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetDate

Get a date value from an array.

time_t  pdfioArrayGetDate (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetDict

Get a dictionary value from an array.

pdfio_dict_t * pdfioArrayGetDict (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetName

Get a name value from an array.

const char * pdfioArrayGetName (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetNumber

Get a number from an array.

double  pdfioArrayGetNumber (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetObj

Get an indirect object reference from an array.

pdfio_obj_t * pdfioArrayGetObj (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetSize

Get the length of an array.

size_t  pdfioArrayGetSize (
    pdfio_array_t *a
);

pdfioArrayGetString

Get a string value from an array.

const char * pdfioArrayGetString (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayGetType

Get a value type from an array.

pdfio_valtype_t  pdfioArrayGetType (
    pdfio_array_t *a,
    size_t n
);

pdfioArrayRemove

Remove an array entry.

bool  pdfioArrayRemove (
    pdfio_array_t *a,
    size_t n
);

pdfioContentBeginMarked

Start marked content with an optional dictionary.

bool  pdfioContentBeginMarked (
    pdfio_stream_t *st,
    const char *tag,
    pdfio_dict_t *dict
);

This function starts an area of marked content with an optional dictionary. It must be paired with a call to the pdfioContentEndMarked function.

The "tag" argument specifies the tag name string for the content such as "P" for a paragraph, "H1" for a top-level heading, and so forth.  The "dict" argument specifies an optional dictionary of properties for the content such as the marked content identifier ("MCID") number.

Calling this function sets the "Marked" key in the "MarkInfo" dictionary of the document catalog.  The caller is responsible for setting the "StructTreeRoot" dictionary when creating marked content.

pdfioContentClip

Clip output to the current path.

bool  pdfioContentClip (
    pdfio_stream_t *st,
    bool even_odd
);

pdfioContentDrawImage

Draw an image object.

bool  pdfioContentDrawImage (
    pdfio_stream_t *st,
    const char *name,
    double x,
    double y,
    double width,
    double height
);

The object name must be part of the page dictionary resources, typically using the pdfioPageDictAddImage function.

pdfioContentEndMarked

End marked content.

bool  pdfioContentEndMarked (
    pdfio_stream_t *st
);

This function ends an area of marked content that was started using the pdfioContentBeginMarked function.

pdfioContentFill

Fill the current path.

bool  pdfioContentFill (
    pdfio_stream_t *st,
    bool even_odd
);

pdfioContentFillAndStroke

Fill and stroke the current path.

bool  pdfioContentFillAndStroke (
    pdfio_stream_t *st,
    bool even_odd
);

pdfioContentMatrixConcat

Concatenate a matrix to the current graphics state.

bool  pdfioContentMatrixConcat (
    pdfio_stream_t *st,
    pdfio_matrix_t m
);

pdfioContentMatrixRotate

Rotate the current transform matrix.

bool  pdfioContentMatrixRotate (
    pdfio_stream_t *st,
    double degrees
);

pdfioContentMatrixScale

Scale the current transform matrix.

bool  pdfioContentMatrixScale (
    pdfio_stream_t *st,
    double sx,
    double sy
);

pdfioContentMatrixTranslate

Translate the current transform matrix.

bool  pdfioContentMatrixTranslate (
    pdfio_stream_t *st,
    double tx,
    double ty
);

pdfioContentPathClose

Close the current path.

bool  pdfioContentPathClose (
    pdfio_stream_t *st
);

pdfioContentPathCurve

Add a Bezier curve with two control points.

bool  pdfioContentPathCurve (
    pdfio_stream_t *st,
    double x1,
    double y1,
    double x2,
    double y2,
    double x3,
    double y3
);

pdfioContentPathCurve13

Add a Bezier curve with an initial control point.

bool  pdfioContentPathCurve13 (
    pdfio_stream_t *st,
    double x1,
    double y1,
    double x3,
    double y3
);

pdfioContentPathCurve23

Add a Bezier curve with a trailing control point.

bool  pdfioContentPathCurve23 (
    pdfio_stream_t *st,
    double x2,
    double y2,
    double x3,
    double y3
);

pdfioContentPathEnd

Clear the current path.

bool  pdfioContentPathEnd (
    pdfio_stream_t *st
);

pdfioContentPathLineTo

Add a straight line to the current path.

bool  pdfioContentPathLineTo (
    pdfio_stream_t *st,
    double x,
    double y
);

pdfioContentPathMoveTo

Start a new subpath.

bool  pdfioContentPathMoveTo (
    pdfio_stream_t *st,
    double x,
    double y
);

pdfioContentPathRect

Add a rectangle to the current path.

bool  pdfioContentPathRect (
    pdfio_stream_t *st,
    double x,
    double y,
    double width,
    double height
);

pdfioContentRestore

Restore a previous graphics state.

bool  pdfioContentRestore (
    pdfio_stream_t *st
);

pdfioContentSave

Save the current graphics state.

bool  pdfioContentSave (
    pdfio_stream_t *st
);

pdfioContentSetDashPattern

Set the stroke pattern.

bool  pdfioContentSetDashPattern (
    pdfio_stream_t *st,
    double phase,
    double on,
    double off
);

This function sets the stroke pattern when drawing lines.  If "on" and "off" are 0, a solid line is drawn.

pdfioContentSetFillColorDeviceCMYK

Set the device CMYK fill color.

bool  pdfioContentSetFillColorDeviceCMYK (
    pdfio_stream_t *st,
    double c,
    double m,
    double y,
    double k
);

pdfioContentSetFillColorDeviceGray

Set the device gray fill color.

bool  pdfioContentSetFillColorDeviceGray (
    pdfio_stream_t *st,
    double g
);

pdfioContentSetFillColorDeviceRGB

Set the device RGB fill color.

bool  pdfioContentSetFillColorDeviceRGB (
    pdfio_stream_t *st,
    double r,
    double g,
    double b
);

pdfioContentSetFillColorGray

Set the calibrated gray fill color.

bool  pdfioContentSetFillColorGray (
    pdfio_stream_t *st,
    double g
);

pdfioContentSetFillColorRGB

Set the calibrated RGB fill color.

bool  pdfioContentSetFillColorRGB (
    pdfio_stream_t *st,
    double r,
    double g,
    double b
);

pdfioContentSetFillColorSpace

Set the fill color space.

bool  pdfioContentSetFillColorSpace (
    pdfio_stream_t *st,
    const char *name
);

pdfioContentSetFlatness

Set the flatness tolerance.

bool  pdfioContentSetFlatness (
    pdfio_stream_t *st,
    double flatness
);

pdfioContentSetLineCap

Set the line ends style.

bool  pdfioContentSetLineCap (
    pdfio_stream_t *st,
    pdfio_linecap_t lc
);

pdfioContentSetLineJoin

Set the line joining style.

bool  pdfioContentSetLineJoin (
    pdfio_stream_t *st,
    pdfio_linejoin_t lj
);

pdfioContentSetLineWidth

Set the line width.

bool  pdfioContentSetLineWidth (
    pdfio_stream_t *st,
    double width
);

pdfioContentSetMiterLimit

Set the miter limit.

bool  pdfioContentSetMiterLimit (
    pdfio_stream_t *st,
    double limit
);

pdfioContentSetStrokeColorDeviceCMYK

Set the device CMYK stroke color.

bool  pdfioContentSetStrokeColorDeviceCMYK (
    pdfio_stream_t *st,
    double c,
    double m,
    double y,
    double k
);

pdfioContentSetStrokeColorDeviceGray

Set the device gray stroke color.

bool  pdfioContentSetStrokeColorDeviceGray (
    pdfio_stream_t *st,
    double g
);

pdfioContentSetStrokeColorDeviceRGB

Set the device RGB stroke color.

bool  pdfioContentSetStrokeColorDeviceRGB (
    pdfio_stream_t *st,
    double r,
    double g,
    double b
);

pdfioContentSetStrokeColorGray

Set the calibrated gray stroke color.

bool  pdfioContentSetStrokeColorGray (
    pdfio_stream_t *st,
    double g
);

pdfioContentSetStrokeColorRGB

Set the calibrated RGB stroke color.

bool  pdfioContentSetStrokeColorRGB (
    pdfio_stream_t *st,
    double r,
    double g,
    double b
);

pdfioContentSetStrokeColorSpace

Set the stroke color space.

bool  pdfioContentSetStrokeColorSpace (
    pdfio_stream_t *st,
    const char *name
);

pdfioContentSetTextCharacterSpacing

Set the spacing between characters.

bool  pdfioContentSetTextCharacterSpacing (
    pdfio_stream_t *st,
    double spacing
);

pdfioContentSetTextFont

Set the text font and size.

bool  pdfioContentSetTextFont (
    pdfio_stream_t *st,
    const char *name,
    double size
);

pdfioContentSetTextLeading

Set the text leading (line height) value.

bool  pdfioContentSetTextLeading (
    pdfio_stream_t *st,
    double leading
);

pdfioContentSetTextMatrix

Set the text transform matrix.

bool  pdfioContentSetTextMatrix (
    pdfio_stream_t *st,
    pdfio_matrix_t m
);

pdfioContentSetTextRenderingMode

Set the text rendering mode.

bool  pdfioContentSetTextRenderingMode (
    pdfio_stream_t *st,
    pdfio_textrendering_t mode
);

pdfioContentSetTextRise

Set the text baseline offset.

bool  pdfioContentSetTextRise (
    pdfio_stream_t *st,
    double rise
);

pdfioContentSetTextWordSpacing

Set the inter-word spacing.

bool  pdfioContentSetTextWordSpacing (
    pdfio_stream_t *st,
    double spacing
);

pdfioContentSetTextXScaling

Set the horizontal scaling value.

bool  pdfioContentSetTextXScaling (
    pdfio_stream_t *st,
    double percent
);

pdfioContentStroke

Stroke the current path.

bool  pdfioContentStroke (
    pdfio_stream_t *st
);

pdfioContentTextBegin

Begin a text block.

bool  pdfioContentTextBegin (
    pdfio_stream_t *st
);

pdfioContentTextEnd

End a text block.

bool  pdfioContentTextEnd (
    pdfio_stream_t *st
);

pdfioContentTextMeasure

Measure a text string and return its width.

double  pdfioContentTextMeasure (
    pdfio_obj_t *font,
    const char *s,
    double size
);

This function measures the given text string "s" and returns its width based on "size". The text string must always use the UTF-8 (Unicode) encoding but any control characters (such as newlines) are ignored.

pdfioContentTextMoveLine

Move to the next line and offset.

bool  pdfioContentTextMoveLine (
    pdfio_stream_t *st,
    double tx,
    double ty
);

pdfioContentTextMoveTo

Offset within the current line.

bool  pdfioContentTextMoveTo (
    pdfio_stream_t *st,
    double tx,
    double ty
);

pdfioContentTextNewLine

Move to the next line.

bool  pdfioContentTextNewLine (
    pdfio_stream_t *st
);

pdfioContentTextNewLineShow

Move to the next line and show text.

bool  pdfioContentTextNewLineShow (
    pdfio_stream_t *st,
    double ws,
    double cs,
    bool unicode,
    const char *s
);

This function moves to the next line and then shows some text with optional word and character spacing in a PDF content stream. The "unicode" argument specifies that the current font maps to full Unicode.  The "s" argument specifies a UTF-8 encoded string.

pdfioContentTextNewLineShowf

Show formatted text.

bool  pdfioContentTextNewLineShowf (
    pdfio_stream_t *st,
    double ws,
    double cs,
    bool unicode,
    const char *format,
    ...
);

This function moves to the next line and shows some formatted text with optional word and character spacing in a PDF content stream. The "unicode" argument specifies that the current font maps to full Unicode.  The "format" argument specifies a UTF-8 encoded printf-style format string.

pdfioContentTextShow

Show text.

bool  pdfioContentTextShow (
    pdfio_stream_t *st,
    bool unicode,
    const char *s
);

This function shows some text in a PDF content stream. The "unicode" argument specifies that the current font maps to full Unicode.  The "s" argument specifies a UTF-8 encoded string.

pdfioContentTextShowJustified

Show justified text.

bool  pdfioContentTextShowJustified (
    pdfio_stream_t *st,
    bool unicode,
    size_t num_fragments,
    const double *offsets,
    const char *const *fragments
);

This function shows some text in a PDF content stream. The "unicode" argument specifies that the current font maps to full Unicode.  The "fragments" argument specifies an array of UTF-8 encoded strings.

pdfioContentTextShowf

Show formatted text.

bool  pdfioContentTextShowf (
    pdfio_stream_t *st,
    bool unicode,
    const char *format,
    ...
);

pdfioDictClear

Remove a key/value pair from a dictionary.

bool  pdfioDictClear (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictCopy

Copy a dictionary to a PDF file.

pdfio_dict_t * pdfioDictCopy (
    pdfio_file_t *pdf,
    pdfio_dict_t *dict
);

pdfioDictCreate

Create a dictionary to hold key/value pairs.

pdfio_dict_t * pdfioDictCreate (
    pdfio_file_t *pdf
);

pdfioDictGetArray

Get a key array value from a dictionary.

pdfio_array_t * pdfioDictGetArray (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetBinary

Get a key binary string value from a dictionary.

unsigned char * pdfioDictGetBinary (
    pdfio_dict_t *dict,
    const char *key,
    size_t *length
);

pdfioDictGetBoolean

Get a key boolean value from a dictionary.

bool  pdfioDictGetBoolean (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetDate

Get a date value from a dictionary.

time_t  pdfioDictGetDate (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetDict

Get a key dictionary value from a dictionary.

pdfio_dict_t * pdfioDictGetDict (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetKey

Get the key for the specified pair.

const char * pdfioDictGetKey (
    pdfio_dict_t *dict,
    size_t n
);

pdfioDictGetName

Get a key name value from a dictionary.

const char * pdfioDictGetName (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetNumPairs

Get the number of key/value pairs in a dictionary.

size_t  pdfioDictGetNumPairs (
    pdfio_dict_t *dict
);

pdfioDictGetNumber

Get a key number value from a dictionary.

double  pdfioDictGetNumber (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetObj

Get a key indirect object value from a dictionary.

pdfio_obj_t * pdfioDictGetObj (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetRect

Get a key rectangle value from a dictionary.

pdfio_rect_t * pdfioDictGetRect (
    pdfio_dict_t *dict,
    const char *key,
    pdfio_rect_t *rect
);

pdfioDictGetString

Get a key string value from a dictionary.

const char * pdfioDictGetString (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictGetType

Get a key value type from a dictionary.

pdfio_valtype_t  pdfioDictGetType (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictIterateKeys

Iterate the keys in a dictionary.

void pdfioDictIterateKeys (
    pdfio_dict_t *dict,
    pdfio_dict_cb_t cb,
    void *cb_data
);

This function iterates the keys in a dictionary, calling the supplied function "cb":

    bool
    my_dict_cb(pdfio_dict_t *dict, const char *key, void *cb_data)
    {
      /* "key" contains the dictionary key */
      /* Return true to continue or false to stop */
      return (true);
    }

The iteration continues until the callback returns false or all keys have been iterated.
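
As a minimal sketch of such a callback, the following counts the keys it visits and stops once a particular key is seen. The "key_search_t" structure and the "find_key_cb" name are hypothetical, and the forward declaration of pdfio_dict_t only stands in for <pdfio.h> so that the fragment compiles by itself:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for <pdfio.h> so this sketch compiles by itself (assumption). */
typedef struct _pdfio_dict_s pdfio_dict_t;

/* Hypothetical callback data: the key to look for and what was seen. */
typedef struct key_search_s
{
  const char *wanted;   /* Key name to search for */
  size_t     visited;   /* Number of keys visited so far */
  bool       found;     /* Was the wanted key seen? */
} key_search_t;

/* Count keys and stop iterating once the wanted key is found. */
bool
find_key_cb(pdfio_dict_t *dict, const char *key, void *cb_data)
{
  key_search_t *search = (key_search_t *)cb_data;

  (void)dict; /* Not using the dictionary in this callback */

  search->visited ++;

  if (!strcmp(key, search->wanted))
  {
    search->found = true;
    return (false); /* Stop iterating */
  }

  return (true); /* Continue with the next key */
}
```

With PDFio itself, the callback and a pointer to an initialized "key_search_t" would be passed as the "cb" and "cb_data" arguments of pdfioDictIterateKeys.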

pdfioDictSetArray

Set a key array in a dictionary.

bool  pdfioDictSetArray (
    pdfio_dict_t *dict,
    const char *key,
    pdfio_array_t *value
);

pdfioDictSetBinary

Set a key binary string in a dictionary.

bool  pdfioDictSetBinary (
    pdfio_dict_t *dict,
    const char *key,
    const unsigned char *value,
    size_t valuelen
);

pdfioDictSetBoolean

Set a key boolean in a dictionary.

bool  pdfioDictSetBoolean (
    pdfio_dict_t *dict,
    const char *key,
    bool value
);

pdfioDictSetDate

Set a date value in a dictionary.

bool  pdfioDictSetDate (
    pdfio_dict_t *dict,
    const char *key,
    time_t value
);

pdfioDictSetDict

Set a key dictionary in a dictionary.

bool  pdfioDictSetDict (
    pdfio_dict_t *dict,
    const char *key,
    pdfio_dict_t *value
);

pdfioDictSetName

Set a key name in a dictionary.

bool  pdfioDictSetName (
    pdfio_dict_t *dict,
    const char *key,
    const char *value
);

pdfioDictSetNull

Set a key null in a dictionary.

bool  pdfioDictSetNull (
    pdfio_dict_t *dict,
    const char *key
);

pdfioDictSetNumber

Set a key number in a dictionary.

bool  pdfioDictSetNumber (
    pdfio_dict_t *dict,
    const char *key,
    double value
);

pdfioDictSetObj

Set a key indirect object reference in a dictionary.

bool  pdfioDictSetObj (
    pdfio_dict_t *dict,
    const char *key,
    pdfio_obj_t *value
);

pdfioDictSetRect

Set a key rectangle in a dictionary.

bool  pdfioDictSetRect (
    pdfio_dict_t *dict,
    const char *key,
    pdfio_rect_t *value
);

pdfioDictSetString

Set a key literal string in a dictionary.

bool  pdfioDictSetString (
    pdfio_dict_t *dict,
    const char *key,
    const char *value
);

pdfioDictSetStringf

Set a key formatted string in a dictionary.

bool  pdfioDictSetStringf (
    pdfio_dict_t *dict,
    const char *key,
    const char *format,
    ...
);

pdfioFileAddOutputIntent

Add an OutputIntent to a file.

void pdfioFileAddOutputIntent (
    pdfio_file_t *pdf,
    const char *subtype,
    const char *condition,
    const char *cond_id,
    const char *reg_name,
    const char *info,
    pdfio_obj_t *profile
);

This function adds an OutputIntent dictionary to the PDF file catalog. The "subtype" argument specifies the intent subtype and is typically "GTS_PDFX" for PDF/X, "GTS_PDFA1" for PDF/A, or "ISO_PDFE1" for PDF/E. Passing NULL defaults the subtype to "GTS_PDFA1".

The "condition" argument specifies a short name for the output intent, while the "info" argument specifies a longer description for the output intent. Both can be NULL to omit this information.

The "cond_id" argument specifies a unique identifier such as a registration ("CGATS001") or color space name ("sRGB").  The "reg_name" argument provides a URL for the identifier.

The "profile" argument specifies an ICC profile object for the output condition.  If NULL, the PDF consumer will attempt to look up the correct profile using the "cond_id" value.

pdfioFileClose

Close a PDF file and free all memory used for it.

bool  pdfioFileClose (
    pdfio_file_t *pdf
);

pdfioFileCreate

Create a PDF file.

pdfio_file_t * pdfioFileCreate (
    const char *filename,
    const char *version,
    pdfio_rect_t *media_box,
    pdfio_rect_t *crop_box,
    pdfio_error_cb_t error_cb,
    void *error_cbdata
);

This function creates a new PDF file.  The "filename" argument specifies the name of the PDF file to create.

The "version" argument specifies the PDF version number for the file or NULL for the default ("2.0").  The following values are recognized:

  • "1.3", "1.4", "1.5", "1.6", "1.7", "2.0": Generic PDF files of the
     specified versions.
  • "PCLm-1.0": The PCLm (raster) subset of PDF supported by some printers.
  • "PDF/A-1a": PDF/A-1a:2005
  • "PDF/A-1b": PDF/A-1b:2005
  • "PDF/A-2a": PDF/A-2a:2011
  • "PDF/A-2b": PDF/A-2b:2011
  • "PDF/A-2u": PDF/A-2u:2011
  • "PDF/A-3a": PDF/A-3a:2012
  • "PDF/A-3b": PDF/A-3b:2012
  • "PDF/A-3u": PDF/A-3u:2012
  • "PDF/A-4": PDF/A-4:2020

The "media_box" and "crop_box" arguments specify the default MediaBox and CropBox for pages in the PDF file - if NULL then a default "Universal" size of 8.27x11in (the intersection of US Letter and ISO A4) is used.
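
The arithmetic behind the "Universal" size can be sketched as follows. This is only an illustration of how intersecting US Letter and ISO A4 yields roughly 8.27x11in; the "box_t" type and both function names are hypothetical and are not part of PDFio:

```c
/* Hypothetical rectangle type for this sketch, in points (1/72 inch). */
typedef struct box_s
{
  double x1, y1;  /* Lower-left corner */
  double x2, y2;  /* Upper-right corner */
} box_t;

/* Return the overlap of two boxes (assumes that they do overlap). */
box_t
box_intersect(box_t a, box_t b)
{
  box_t r;

  r.x1 = a.x1 > b.x1 ? a.x1 : b.x1;
  r.y1 = a.y1 > b.y1 ? a.y1 : b.y1;
  r.x2 = a.x2 < b.x2 ? a.x2 : b.x2;
  r.y2 = a.y2 < b.y2 ? a.y2 : b.y2;

  return (r);
}

/* Compute the "Universal" size from US Letter and ISO A4. */
box_t
universal_box(void)
{
  /* US Letter is 8.5x11in, ISO A4 is 210x297mm */
  box_t letter = { 0.0, 0.0, 8.5 * 72.0, 11.0 * 72.0 };
  box_t a4     = { 0.0, 0.0, 210.0 / 25.4 * 72.0, 297.0 / 25.4 * 72.0 };

  return (box_intersect(letter, a4));
}
```

The result takes its width from A4 (210mm, about 8.27in) and its height from US Letter (11in).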

The "error_cb" and "error_cbdata" arguments specify an error handler callback and its data pointer - if NULL then the default error handler is used that writes error and warning messages to stderr.  The error handler callback should return true to continue writing the PDF file or false to stop. For example:

    bool
    my_error_cb(pdfio_file_t *pdf, const char *message, void *data)
    {
      (void)data; /* Not using data pointer in this callback */
    
      fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
    
      /* Return true to continue on warning messages, false otherwise... */
      return (!strncmp(message, "WARNING:", 8));
    }

pdfioFileCreateArrayObj

Create a new object in a PDF file containing an array.

pdfio_obj_t * pdfioFileCreateArrayObj (
    pdfio_file_t *pdf,
    pdfio_array_t *array
);

This function creates a new object with an array value in a PDF file. You must call pdfioObjClose to write the object to the file.

pdfioFileCreateFontObjFromBase

Create one of the base 14 PDF fonts.

pdfio_obj_t * pdfioFileCreateFontObjFromBase (
    pdfio_file_t *pdf,
    const char *name
);

This function creates one of the base 14 PDF fonts. The "name" parameter specifies the font name:

  • "Courier"
  • "Courier-Bold"
  • "Courier-BoldItalic"
  • "Courier-Italic"
  • "Helvetica"
  • "Helvetica-Bold"
  • "Helvetica-BoldOblique"
  • "Helvetica-Oblique"
  • "Symbol"
  • "Times-Bold"
  • "Times-BoldItalic"
  • "Times-Italic"
  • "Times-Roman"
  • "ZapfDingbats"

Aside from "Symbol" and "ZapfDingbats", base fonts use the Windows CP1252 (ISO-8859-1 with additional characters such as the Euro symbol) subset of Unicode.

Note: This function cannot be used when producing PDF/A files.

pdfioFileCreateFontObjFromData

Add a font in memory to a PDF file.

pdfio_obj_t * pdfioFileCreateFontObjFromData (
    pdfio_file_t *pdf,
    const void *data,
    size_t datasize,
    bool unicode
);

This function embeds TrueType/OpenType font data into a PDF file.  The "unicode" parameter controls whether the font is encoded for two-byte characters (potentially full Unicode, but more typically a subset) or to only support the Windows CP1252 (ISO-8859-1 with additional characters such as the Euro symbol) subset of Unicode.

pdfioFileCreateFontObjFromFile

Add a font file to a PDF file.

pdfio_obj_t * pdfioFileCreateFontObjFromFile (
    pdfio_file_t *pdf,
    const char *filename,
    bool unicode
);

This function embeds a TrueType/OpenType font file into a PDF file.  The "unicode" parameter controls whether the font is encoded for two-byte characters (potentially full Unicode, but more typically a subset) or to only support the Windows CP1252 (ISO-8859-1 with additional characters such as the Euro symbol) subset of Unicode.

pdfioFileCreateICCObjFromData

Add ICC profile data to a PDF file.

pdfio_obj_t * pdfioFileCreateICCObjFromData (
    pdfio_file_t *pdf,
    const unsigned char *data,
    size_t datalen,
    size_t num_colors
);

pdfioFileCreateICCObjFromFile

Add an ICC profile file to a PDF file.

pdfio_obj_t * pdfioFileCreateICCObjFromFile (
    pdfio_file_t *pdf,
    const char *filename,
    size_t num_colors
);

pdfioFileCreateImageObjFromData

Add image object(s) to a PDF file from memory.

pdfio_obj_t * pdfioFileCreateImageObjFromData (
    pdfio_file_t *pdf,
    const unsigned char *data,
    size_t width,
    size_t height,
    size_t num_colors,
    pdfio_array_t *color_data,
    bool alpha,
    bool interpolate
);

This function creates image object(s) in a PDF file from a data buffer in memory.  The "data" parameter points to the image data as 8-bit color values. The "width" and "height" parameters specify the image dimensions.  The "num_colors" parameter specifies the number of color components (1 for grayscale, 3 for RGB, and 4 for CMYK) and the "alpha" parameter specifies whether each color tuple is followed by an alpha value.  The "color_data" parameter specifies an optional color space array for the image - if NULL, the image is encoded in the corresponding device color space.  The "interpolate" parameter specifies whether to interpolate when scaling the image on the page.

Note: When creating an image object with alpha, a second image object is created to hold the "soft mask" data for the primary image.  PDF/A-1 files do not support alpha-based transparency.
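
The buffer layout described above (8-bit values, "num_colors" components per pixel, optionally followed by an alpha value) can be sketched as follows. The function names are hypothetical and the gradient is just sample content; the fragment only prepares a buffer and does not call PDFio:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Bytes per pixel: one byte per color component plus optional alpha. */
size_t
image_bytes_per_pixel(size_t num_colors, bool alpha)
{
  return (num_colors + (alpha ? 1 : 0));
}

/* Allocate and fill an RGB (or RGB + alpha) gradient; caller frees it. */
unsigned char *
make_rgb_gradient(size_t width, size_t height, bool alpha)
{
  size_t        bpp = image_bytes_per_pixel(3, alpha);
  unsigned char *data, *ptr;
  size_t        x, y;

  if (width == 0 || height == 0)
    return (NULL);

  if ((data = malloc(width * height * bpp)) == NULL)
    return (NULL);

  for (y = 0, ptr = data; y < height; y ++)
  {
    for (x = 0; x < width; x ++)
    {
      *ptr++ = (unsigned char)(255 * x / width);   /* Red */
      *ptr++ = (unsigned char)(255 * y / height);  /* Green */
      *ptr++ = 0;                                  /* Blue */

      if (alpha)
        *ptr++ = 255;                              /* Opaque alpha */
    }
  }

  return (data);
}
```

A buffer built this way would be passed as "data" with "num_colors" set to 3 and "alpha" matching the value used to build it.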

pdfioFileCreateImageObjFromFile

Add an image object to a PDF file from a file.

pdfio_obj_t * pdfioFileCreateImageObjFromFile (
    pdfio_file_t *pdf,
    const char *filename,
    bool interpolate
);

This function creates an image object in a PDF file from a JPEG or PNG file. The "filename" parameter specifies the name of the JPEG or PNG file, while the "interpolate" parameter specifies whether to interpolate when scaling the image on the page.

Note: PNG files containing transparency cannot be used when producing PDF/A files.

pdfioFileCreateNameObj

Create a new object in a PDF file containing a name.

pdfio_obj_t * pdfioFileCreateNameObj (
    pdfio_file_t *pdf,
    const char *name
);

This function creates a new object with a name value in a PDF file. You must call pdfioObjClose to write the object to the file.

pdfioFileCreateNumberObj

Create a new object in a PDF file containing a number.

pdfio_obj_t * pdfioFileCreateNumberObj (
    pdfio_file_t *pdf,
    double number
);

This function creates a new object with a number value in a PDF file. You must call pdfioObjClose to write the object to the file.

pdfioFileCreateObj

Create a new object in a PDF file.

pdfio_obj_t * pdfioFileCreateObj (
    pdfio_file_t *pdf,
    pdfio_dict_t *dict
);

pdfioFileCreateOutput

Create a PDF file through an output callback.

pdfio_file_t * pdfioFileCreateOutput (
    pdfio_output_cb_t output_cb,
    void *output_cbdata,
    const char *version,
    pdfio_rect_t *media_box,
    pdfio_rect_t *crop_box,
    pdfio_error_cb_t error_cb,
    void *error_cbdata
);

This function creates a new PDF file that is streamed through an output callback.  The "output_cb" and "output_cbdata" arguments specify the output callback, which is called whenever data needs to be written, and its data pointer:

    ssize_t
    output_cb(void *output_cbdata, const void *buffer, size_t bytes)
    {
      // Write buffer to output and return the number of bytes written
    }
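
As one possible sketch of such a callback, the following appends everything it is given to a growable memory buffer. The "membuf_t" type and "membuf_output_cb" name are hypothetical, and returning -1 on failure is an assumption borrowed from write(2) rather than something this page specifies:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical growable buffer used as the callback data. */
typedef struct membuf_s
{
  unsigned char *data;   /* Bytes written so far */
  size_t        used;    /* Number of bytes used */
  size_t        alloc;   /* Number of bytes allocated */
} membuf_t;

/* Append "bytes" bytes to the buffer; return the count written or -1. */
ssize_t
membuf_output_cb(void *output_cbdata, const void *buffer, size_t bytes)
{
  membuf_t *mb = (membuf_t *)output_cbdata;

  if (mb->used + bytes > mb->alloc)
  {
    /* Grow the buffer by doubling until the new data fits */
    size_t        newalloc = mb->alloc ? mb->alloc : 1024;
    unsigned char *newdata;

    while (newalloc < mb->used + bytes)
      newalloc *= 2;

    if ((newdata = realloc(mb->data, newalloc)) == NULL)
      return (-1); /* Assumed error return, see above */

    mb->data  = newdata;
    mb->alloc = newalloc;
  }

  memcpy(mb->data + mb->used, buffer, bytes);
  mb->used += bytes;

  return ((ssize_t)bytes);
}
```

A zero-initialized "membuf_t" would be passed as "output_cbdata"; the caller is responsible for freeing its "data" member once the PDF file has been closed.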

The "version" argument specifies the PDF version number for the file or NULL for the default ("2.0").  The following values are recognized:

  • "1.3", "1.4", "1.5", "1.6", "1.7", "2.0": Generic PDF files of the
     specified versions.
  • "PDF/A-1a": PDF/A-1a:2005
  • "PDF/A-1b": PDF/A-1b:2005
  • "PDF/A-2a": PDF/A-2a:2011
  • "PDF/A-2b": PDF/A-2b:2011
  • "PDF/A-2u": PDF/A-2u:2011
  • "PDF/A-3a": PDF/A-3a:2012
  • "PDF/A-3b": PDF/A-3b:2012
  • "PDF/A-3u": PDF/A-3u:2012
  • "PDF/A-4": PDF/A-4:2020

Unlike pdfioFileCreate and pdfioFileCreateTemporary, it is generally not safe to pass the "PCLm-1.0" version string to this function.

The "media_box" and "crop_box" arguments specify the default MediaBox and CropBox for pages in the PDF file - if NULL then a default "Universal" size of 8.27x11in (the intersection of US Letter and ISO A4) is used.

The "error_cb" and "error_cbdata" arguments specify an error handler callback and its data pointer - if NULL then the default error handler is used that writes error and warning messages to stderr.  The error handler callback should return true to continue writing the PDF file or false to stop. For example:

    bool
    my_error_cb(pdfio_file_t *pdf, const char *message, void *data)
    {
      (void)data; /* Not using data pointer in this callback */
    
      fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
    
      /* Return true to continue on warning messages, false otherwise... */
      return (!strncmp(message, "WARNING:", 8));
    }

Note: Files created using this API are slightly larger than those created using the pdfioFileCreate function since stream lengths are stored as indirect object references.

pdfioFileCreatePage

Create a page in a PDF file.

pdfio_stream_t * pdfioFileCreatePage (
    pdfio_file_t *pdf,
    pdfio_dict_t *dict
);

pdfioFileCreateStringObj

Create a new object in a PDF file containing a string.

pdfio_obj_t * pdfioFileCreateStringObj (
    pdfio_file_t *pdf,
    const char *string
);

This function creates a new object with a string value in a PDF file. You must call pdfioObjClose to write the object to the file.

pdfioFileCreateTemporary

Create a temporary PDF file.

pdfio_file_t * pdfioFileCreateTemporary (
    char *buffer,
    size_t bufsize,
    const char *version,
    pdfio_rect_t *media_box,
    pdfio_rect_t *crop_box,
    pdfio_error_cb_t error_cb,
    void *error_cbdata
);

pdfioFileFindObj

Find an object using its object number.

pdfio_obj_t * pdfioFileFindObj (
    pdfio_file_t *pdf,
    size_t number
);

This differs from pdfioFileGetObj which takes an index into the list of objects while this function takes the object number.

pdfioFileGetAuthor

Get the author for a PDF file.

const char * pdfioFileGetAuthor (
    pdfio_file_t *pdf
);

pdfioFileGetCatalog

Get the document catalog dictionary.

pdfio_dict_t * pdfioFileGetCatalog (
    pdfio_file_t *pdf
);

pdfioFileGetCreationDate

Get the creation date for a PDF file.

time_t  pdfioFileGetCreationDate (
    pdfio_file_t *pdf
);

pdfioFileGetCreator

Get the creator string for a PDF file.

const char * pdfioFileGetCreator (
    pdfio_file_t *pdf
);

pdfioFileGetID

Get the PDF file's ID strings.

pdfio_array_t * pdfioFileGetID (
    pdfio_file_t *pdf
);

pdfioFileGetKeywords

Get the keywords for a PDF file.

const char * pdfioFileGetKeywords (
    pdfio_file_t *pdf
);

pdfioFileGetLanguage

Get the language metadata for a PDF file.

const char * pdfioFileGetLanguage (
    pdfio_file_t *pdf
);

This function gets the (primary/default) language metadata, if any, for a PDF file.  The returned string is an IETF BCP 47 language tag of the form "lang-REGION".  For example, the string "en-CA" specifies Canadian English and the string "fr-CA" specifies Canadian French.
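
A caller that needs the language and region separately could split the returned tag as sketched below. The "split_language_tag" function is hypothetical and handles only the simple "lang-REGION" form described above; full BCP 47 tags can carry further subtags (script, variant, and so on) that this sketch does not handle:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a simple "lang-REGION" tag into its two parts.  Returns false if
 * the tag is NULL or empty, or if a part does not fit in its buffer.
 */
bool
split_language_tag(const char *tag, char *lang, size_t langsize,
                   char *region, size_t regionsize)
{
  const char *dash;
  size_t     len;

  if (!tag || !lang || !region || langsize == 0 || regionsize == 0)
    return (false);

  dash = strchr(tag, '-');
  len  = dash ? (size_t)(dash - tag) : strlen(tag);

  if (len == 0 || len >= langsize)
    return (false);

  memcpy(lang, tag, len);
  lang[len] = '\0';

  if (dash)
  {
    if (strlen(dash + 1) >= regionsize)
      return (false);

    strcpy(region, dash + 1);
  }
  else
  {
    region[0] = '\0'; /* No region in the tag */
  }

  return (true);
}
```

The NULL check matters because a PDF file is not required to carry language metadata.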

pdfioFileGetModificationDate

Get the most recent modification date for a PDF file.

time_t  pdfioFileGetModificationDate (
    pdfio_file_t *pdf
);

pdfioFileGetName

Get a PDF's filename.

const char * pdfioFileGetName (
    pdfio_file_t *pdf
);

pdfioFileGetNumObjs

Get the number of objects in a PDF file.

size_t  pdfioFileGetNumObjs (
    pdfio_file_t *pdf
);

pdfioFileGetNumPages

Get the number of pages in a PDF file.

size_t  pdfioFileGetNumPages (
    pdfio_file_t *pdf
);

pdfioFileGetObj

Get an object from a PDF file.

pdfio_obj_t * pdfioFileGetObj (
    pdfio_file_t *pdf,
    size_t n
);

pdfioFileGetPage

Get a page object from a PDF file.

pdfio_obj_t * pdfioFileGetPage (
    pdfio_file_t *pdf,
    size_t n
);

pdfioFileGetPermissions

Get the access permissions of a PDF file.

pdfio_permission_t  pdfioFileGetPermissions (
    pdfio_file_t *pdf,
    pdfio_encryption_t *encryption
);

This function returns the access permissions of a PDF file and (optionally) the type of encryption that has been used.

pdfioFileGetProducer

Get the producer string for a PDF file.

const char * pdfioFileGetProducer (
    pdfio_file_t *pdf
);

pdfioFileGetSubject

Get the subject for a PDF file.

const char * pdfioFileGetSubject (
    pdfio_file_t *pdf
);

pdfioFileGetTitle

Get the title for a PDF file.

const char * pdfioFileGetTitle (
    pdfio_file_t *pdf
);

pdfioFileGetVersion

Get the PDF version number for a PDF file.

const char * pdfioFileGetVersion (
    pdfio_file_t *pdf
);

pdfioFileOpen

Open a PDF file for reading.

pdfio_file_t * pdfioFileOpen (
    const char *filename,
    pdfio_password_cb_t password_cb,
    void *password_cbdata,
    pdfio_error_cb_t error_cb,
    void *error_cbdata
);

This function opens an existing PDF file.  The "filename" argument specifies the name of the PDF file to open.

The "password_cb" and "password_cbdata" arguments specify a password callback and its data pointer for PDF files that use one of the standard Adobe "security" handlers.  The callback returns a password string or NULL to cancel the open.  If NULL is specified for the callback function and the PDF file requires a password, the open will always fail.

The "error_cb" and "error_cbdata" arguments specify an error handler callback and its data pointer - if NULL then the default error handler is used that writes error and warning messages to stderr.  The error handler callback should return true to continue reading the PDF file or false to stop. For example:

    bool
    my_error_cb(pdfio_file_t *pdf, const char *message, void *data)
    {
      (void)data; /* Not using data pointer in this callback */
    
      fprintf(stderr, "%s: %s\n", pdfioFileGetName(pdf), message);
    
      /* Return true to continue on warning messages, false otherwise... */
      return (!strncmp(message, "WARNING:", 8));
    }

Note: Error messages starting with "WARNING:" are actually warning messages - the callback should normally return true to allow PDFio to try to resolve the issue.  In addition, some errors are unrecoverable and ignore the return value of the error callback.
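
A variation on the error callback can use the data pointer to tally warnings and errors instead of printing them. This is a sketch only: the "error_counts_t" type and "counting_error_cb" name are hypothetical, and the forward declaration of pdfio_file_t stands in for <pdfio.h> so that the fragment compiles by itself:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for <pdfio.h> so this sketch compiles by itself (assumption). */
typedef struct _pdfio_file_s pdfio_file_t;

/* Hypothetical callback data that tallies what was reported. */
typedef struct error_counts_s
{
  size_t warnings;  /* Messages starting with "WARNING:" */
  size_t errors;    /* All other messages */
} error_counts_t;

/* Count the message, continuing on warnings and stopping on errors. */
bool
counting_error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  error_counts_t *counts = (error_counts_t *)data;

  (void)pdf; /* Not using the file pointer in this callback */

  if (!strncmp(message, "WARNING:", 8))
  {
    counts->warnings ++;
    return (true);  /* Let PDFio try to resolve the issue */
  }

  counts->errors ++;
  return (false);   /* Stop on anything else */
}
```

The callback and a pointer to a zero-initialized "error_counts_t" would be passed as the "error_cb" and "error_cbdata" arguments.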

pdfioFileSetAuthor

Set the author for a PDF file.

void pdfioFileSetAuthor (
    pdfio_file_t *pdf,
    const char *value
);

pdfioFileSetCreationDate

Set the creation date for a PDF file.

void pdfioFileSetCreationDate (
    pdfio_file_t *pdf,
    time_t value
);

pdfioFileSetCreator

Set the creator string for a PDF file.

void pdfioFileSetCreator (
    pdfio_file_t *pdf,
    const char *value
);

pdfioFileSetKeywords

Set the keywords string for a PDF file.

void pdfioFileSetKeywords (
    pdfio_file_t *pdf,
    const char *value
);

pdfioFileSetLanguage

Set the language metadata for a PDF file.

void pdfioFileSetLanguage (
    pdfio_file_t *pdf,
    const char *value
);

This function sets the (primary/default) language metadata for a PDF file. The "value" argument is an IETF BCP 47 language tag string of the form "lang-REGION".  For example, the string "en-CA" specifies Canadian English and the string "fr-CA" specifies Canadian French.

pdfioFileSetModificationDate

Set the modification date for a PDF file.

void pdfioFileSetModificationDate (
    pdfio_file_t *pdf,
    time_t value
);

pdfioFileSetPermissions

Set the PDF permissions, encryption mode, and passwords.

bool  pdfioFileSetPermissions (
    pdfio_file_t *pdf,
    pdfio_permission_t permissions,
    pdfio_encryption_t encryption,
    const char *owner_password,
    const char *user_password
);

This function sets the PDF usage permissions, encryption mode, and passwords.

Note: This function must be called before creating or copying any objects.  Due to fundamental limitations in the PDF format, PDF encryption offers little protection from disclosure.  Permissions are not enforced in any meaningful way.

pdfioFileSetSubject

Set the subject for a PDF file.

void pdfioFileSetSubject (
    pdfio_file_t *pdf,
    const char *value
);

pdfioFileSetTitle

Set the title for a PDF file.

void pdfioFileSetTitle (
    pdfio_file_t *pdf,
    const char *value
);

pdfioImageGetBytesPerLine

Get the number of bytes to read for each line.

size_t  pdfioImageGetBytesPerLine (
    pdfio_obj_t *obj
);

pdfioImageGetHeight

Get the height of an image object.

double  pdfioImageGetHeight (
    pdfio_obj_t *obj
);

pdfioImageGetWidth

Get the width of an image object.

double  pdfioImageGetWidth (
    pdfio_obj_t *obj
);

pdfioObjClose

Close an object, writing any data as needed to the PDF file.

bool  pdfioObjClose (
    pdfio_obj_t *obj
);

pdfioObjCopy

Copy an object to another PDF file.

pdfio_obj_t * pdfioObjCopy (
    pdfio_file_t *pdf,
    pdfio_obj_t *srcobj
);

pdfioObjCreateStream

Create an object (data) stream for writing.

pdfio_stream_t * pdfioObjCreateStream (
    pdfio_obj_t *obj,
    pdfio_filter_t filter
);

pdfioObjGetArray

Get the array associated with an object.

pdfio_array_t * pdfioObjGetArray (
    pdfio_obj_t *obj
);

pdfioObjGetDict

Get the dictionary associated with an object.

pdfio_dict_t * pdfioObjGetDict (
    pdfio_obj_t *obj
);

pdfioObjGetGeneration

Get the object's generation number.

unsigned short  pdfioObjGetGeneration (
    pdfio_obj_t *obj
);

pdfioObjGetLength

Get the length of the object's (data) stream.

size_t  pdfioObjGetLength (
    pdfio_obj_t *obj
);

pdfioObjGetName

Get the name value associated with an object.

const char * pdfioObjGetName (
    pdfio_obj_t *obj
);

pdfioObjGetNumber

Get the object's number.

size_t  pdfioObjGetNumber (
    pdfio_obj_t *obj
);

pdfioObjGetSubtype

Get an object's subtype.

const char * pdfioObjGetSubtype (
    pdfio_obj_t *obj
);

This function returns an object's PDF subtype name, if any.  Common subtype names include:

  • "CIDFontType0": A CID Type0 font
  • "CIDFontType2": A CID TrueType font
  • "Image": An image or image mask
  • "Form": A fillable form
  • "OpenType": An OpenType font
  • "Type0": A composite font
  • "Type1": A PostScript Type1 font
  • "Type3": A PDF Type3 font
  • "TrueType": A TrueType font

pdfioObjGetType

Get an object's type.

const char * pdfioObjGetType (
    pdfio_obj_t *obj
);

This function returns an object's PDF type name, if any. Common type names include:

  • "CMap": A character map for composite fonts
  • "Font": An embedded font (pdfioObjGetSubtype will tell you the
     font format)
  • "FontDescriptor": A font descriptor
  • "Page": A (visible) page
  • "Pages": A page tree node
  • "Template": An invisible template page
  • "XObject": An image, image mask, or form (pdfioObjGetSubtype will
     tell you which)

pdfioObjOpenStream

Open an object's (data) stream for reading.

pdfio_stream_t * pdfioObjOpenStream (
    pdfio_obj_t *obj,
    bool decode
);

pdfioPageCopy

Copy a page to a PDF file.

bool  pdfioPageCopy (
    pdfio_file_t *pdf,
    pdfio_obj_t *srcpage
);

pdfioPageDictAddColorSpace

Add a color space to the page dictionary.

bool  pdfioPageDictAddColorSpace (
    pdfio_dict_t *dict,
    const char *name,
    pdfio_array_t *data
);

This function adds a named color space to the page dictionary.

The names "DefaultCMYK", "DefaultGray", and "DefaultRGB" specify the default device color space used for the page.

The "data" array contains a calibrated, indexed, or ICC-based color space array that was created using the pdfioArrayCreateCalibratedColorFromMatrix, pdfioArrayCreateCalibratedColorFromPrimaries, pdfioArrayCreateICCBasedColor, or pdfioArrayCreateIndexedColor functions.

pdfioPageDictAddFont

Add a font object to the page dictionary.

bool  pdfioPageDictAddFont (
    pdfio_dict_t *dict,
    const char *name,
    pdfio_obj_t *obj
);

pdfioPageDictAddImage

Add an image object to the page dictionary.

bool  pdfioPageDictAddImage (
    pdfio_dict_t *dict,
    const char *name,
    pdfio_obj_t *obj
);

pdfioPageGetNumStreams

Get the number of content streams for a page object.

size_t  pdfioPageGetNumStreams (
    pdfio_obj_t *page
);

pdfioPageOpenStream

Open a content stream for a page.

pdfio_stream_t * pdfioPageOpenStream (
    pdfio_obj_t *page,
    size_t n,
    bool decode
);

pdfioStreamClose

Close a (data) stream in a PDF file.

bool  pdfioStreamClose (
    pdfio_stream_t *st
);

pdfioStreamConsume

Consume bytes from the stream.

bool  pdfioStreamConsume (
    pdfio_stream_t *st,
    size_t bytes
);

pdfioStreamGetToken

Read a single PDF token from a stream.

bool  pdfioStreamGetToken (
    pdfio_stream_t *st,
    char *buffer,
    size_t bufsize
);

This function reads a single PDF token from a stream, skipping all whitespace and comments.  Operator tokens, boolean values, and numbers are returned as-is in the provided string buffer.  String values start with the opening parenthesis ('(') but have all escaping resolved and the terminating parenthesis removed.  Hexadecimal string values start with the opening angle bracket ('<') and have all whitespace and the terminating angle bracket removed.
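
Given those conventions, a caller can tell token kinds apart by looking at the start of the buffer. The following is a minimal sketch; the "token_kind_t" enumeration and both function names are hypothetical, and treating "<<" as something other than a hexadecimal string is an assumption based on general PDF syntax, where "<<" opens a dictionary:

```c
#include <string.h>

/* Hypothetical token categories for this sketch. */
typedef enum token_kind_e
{
  TOKEN_STRING,      /* Starts with '(' */
  TOKEN_HEX_STRING,  /* Starts with a single '<' */
  TOKEN_BOOLEAN,     /* "true" or "false" */
  TOKEN_NUMBER,      /* Optional sign, digits, at most one '.' */
  TOKEN_OTHER        /* Operators, names, delimiters, ... */
} token_kind_t;

/* Return 1 if "s" looks like a PDF integer or real number. */
static int
is_pdf_number(const char *s)
{
  int digits = 0, dots = 0;

  if (*s == '+' || *s == '-')
    s ++;

  for (; *s; s ++)
  {
    if (*s >= '0' && *s <= '9')
      digits ++;
    else if (*s == '.')
      dots ++;
    else
      return (0);
  }

  return (digits > 0 && dots <= 1);
}

/* Classify a token as returned in the string buffer. */
token_kind_t
classify_token(const char *token)
{
  if (token[0] == '(')
    return (TOKEN_STRING);

  /* Assumption: "<<" is a dictionary delimiter, not a hex string */
  if (token[0] == '<' && token[1] != '<')
    return (TOKEN_HEX_STRING);

  if (!strcmp(token, "true") || !strcmp(token, "false"))
    return (TOKEN_BOOLEAN);

  if (is_pdf_number(token))
    return (TOKEN_NUMBER);

  return (TOKEN_OTHER);
}
```

For string tokens the text itself starts at the second character of the buffer, since only the terminating delimiter is removed.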

pdfioStreamPeek

Peek at data in a stream.

ssize_t  pdfioStreamPeek (
    pdfio_stream_t *st,
    void *buffer,
    size_t bytes
);

pdfioStreamPrintf

Write a formatted string to a stream.

bool  pdfioStreamPrintf (
    pdfio_stream_t *st,
    const char *format,
    ...
);

This function writes a formatted string to a stream.  In addition to the standard printf format characters, you can use "%H" to format an HTML/XML string value, "%N" to format a PDF name value ("/Name"), and "%S" to format a PDF string ("(String)") value.
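To show why a dedicated string format is useful, the sketch below writes a C string as a PDF literal string by hand. It is only an illustration of the escaping a literal string needs (backslash and parentheses), not PDFio's implementation of "%S"; the example_pdf_string name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Write "s" as a PDF literal string "(...)" into "out", escaping the
// backslash and both parentheses.  Returns false if "out" is too small.
// Hypothetical helper, not PDFio's own code.
bool
example_pdf_string(const char *s, char *out, size_t outsize)
{
  size_t n = 0;                         // Number of bytes written so far

  if (outsize < 3)                      // Need room for "()" and the nul
    return (false);

  out[n ++] = '(';

  for (; *s; s ++)
  {
    bool escape = strchr("\\()", *s) != NULL;

    // Keep room for this character, the closing ")" and the nul
    if (n + (escape ? 2 : 1) + 2 > outsize)
      return (false);

    if (escape)
      out[n ++] = '\\';

    out[n ++] = *s;
  }

  out[n ++] = ')';
  out[n]    = '\0';

  return (true);
}
```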

pdfioStreamPutChar

Write a single character to a stream.

bool  pdfioStreamPutChar (
    pdfio_stream_t *st,
    int ch
);

pdfioStreamPuts

Write a literal string to a stream.

bool  pdfioStreamPuts (
    pdfio_stream_t *st,
    const char *s
);

pdfioStreamRead

Read data from a stream.

ssize_t  pdfioStreamRead (
    pdfio_stream_t *st,
    void *buffer,
    size_t bytes
);

This function reads data from a stream.  When reading decoded image data from a stream, you must read whole scanlines.  The pdfioImageGetBytesPerLine function can be used to determine the proper read length.
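As a rough illustration of what "whole scanlines" means, the length of one unpadded scanline can be computed from the image geometry. The example_bytes_per_line helper below is hypothetical and only shows the arithmetic; in real code the value returned by pdfioImageGetBytesPerLine should be used instead.

```c
#include <stddef.h>

// Bytes in one unpadded scanline: width * components * bits per
// component, rounded up to a whole byte.  Illustrative only; prefer
// pdfioImageGetBytesPerLine when working with a real image object.
size_t
example_bytes_per_line(size_t width, size_t components, size_t bits_per_component)
{
  return ((width * components * bits_per_component + 7) / 8);
}
```

A read buffer of this size, passed as the "bytes" argument, yields one scanline per read.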

pdfioStreamWrite

Write data to a stream.

bool  pdfioStreamWrite (
    pdfio_stream_t *st,
    const void *buffer,
    size_t bytes
);

pdfioStringCreate

Create a durable literal string.

char * pdfioStringCreate (
    pdfio_file_t *pdf,
    const char *s
);

This function creates a literal string associated with the PDF file "pdf".  The "s" string points to a nul-terminated C string.

NULL is returned on error.  Otherwise the function returns a char * that remains valid until pdfioFileClose is called.

pdfioStringCreatef

Create a durable formatted string.

char * pdfioStringCreatef (
    pdfio_file_t *pdf,
    const char *format,
    ...
);

This function creates a formatted string associated with the PDF file "pdf".  The "format" string contains printf-style format characters.

NULL is returned on error.  Otherwise the function returns a char * that remains valid until pdfioFileClose is called.

Structures

pdfio_rect_s

PDF rectangle

struct pdfio_rect_s
{
  double x1;
  double x2;
  double y1;
  double y2;
};
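A short sketch of working with the rectangle members follows. It uses a local example_rect_t type with the same four members so that it compiles without the PDFio header, and it assumes (x1, y1) is the lower-left corner and (x2, y2) the upper-right corner. Designated initializers are used so the code does not depend on the order in which the members are declared.

```c
// Local stand-in with the same members as pdfio_rect_t, so the sketch
// builds without <pdfio.h>.  Corner meanings are an assumption.
typedef struct
{
  double x1, y1;  // One corner (assumed lower-left)
  double x2, y2;  // Opposite corner (assumed upper-right)
} example_rect_t;

// Width of the rectangle in points.
double
example_rect_width(const example_rect_t *r)
{
  return (r->x2 - r->x1);
}

// Height of the rectangle in points.
double
example_rect_height(const example_rect_t *r)
{
  return (r->y2 - r->y1);
}

// US Letter page: 8.5 x 11 inches at 72 points per inch.
example_rect_t
example_letter(void)
{
  example_rect_t r = { .x1 = 0.0, .y1 = 0.0, .x2 = 612.0, .y2 = 792.0 };

  return (r);
}
```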

Types

pdfio_array_t

Array of PDF values

typedef struct _pdfio_array_s pdfio_array_t;

pdfio_cs_t

Standard color spaces

typedef enum pdfio_cs_e pdfio_cs_t;

pdfio_dict_cb_t

Dictionary iterator callback

typedef bool (*pdfio_dict_cb_t)(pdfio_dict_t *dict, const char *key, void *cb_data);
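A callback of this type might look like the sketch below, which searches for one key and counts how many keys were visited. The example_search_t structure and example_find_key_cb function are hypothetical, and the sketch assumes that returning false asks the iterator to stop while returning true continues. The opaque pdfio_dict_t declaration is repeated only so the sketch builds without the PDFio header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Repeated from this page so the sketch builds without <pdfio.h>;
// remove these two lines when including the real header.
typedef struct _pdfio_dict_s pdfio_dict_t;
typedef bool (*pdfio_dict_cb_t)(pdfio_dict_t *dict, const char *key, void *cb_data);

// Hypothetical callback data: the key to look for and the results.
typedef struct
{
  const char *wanted;  // Key being searched for
  size_t     seen;     // Number of keys visited
  bool       found;    // Was the key found?
} example_search_t;

// Visit one key; stop iterating (assumed: return false) once found.
bool
example_find_key_cb(pdfio_dict_t *dict, const char *key, void *cb_data)
{
  example_search_t *search = (example_search_t *)cb_data;

  (void)dict;  // The dictionary itself is not needed here

  search->seen ++;

  if (!strcmp(key, search->wanted))
  {
    search->found = true;
    return (false);
  }

  return (true);
}
```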

pdfio_dict_t

Key/value dictionary

typedef struct _pdfio_dict_s pdfio_dict_t;

pdfio_encryption_t

PDF encryption modes

typedef enum pdfio_encryption_e pdfio_encryption_t;

pdfio_error_cb_t

Error callback

typedef bool (*pdfio_error_cb_t)(pdfio_file_t *pdf, const char *message, void *data);
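As an illustration, an error callback can record messages instead of printing them. In this sketch the example_errors_t structure and example_error_cb function are hypothetical, and returning false is assumed to tell PDFio to stop rather than continue after the error. The opaque pdfio_file_t declaration is repeated only so the sketch builds without the PDFio header.

```c
#include <stdbool.h>
#include <stdio.h>

// Repeated from this page so the sketch builds without <pdfio.h>;
// remove this line when including the real header.
typedef struct _pdfio_file_s pdfio_file_t;

// Hypothetical callback data: the most recent message and a counter.
typedef struct
{
  char     last[256];  // Copy of the last error message
  unsigned count;      // Number of errors reported
} example_errors_t;

// Record the message and ask PDFio to stop (assumed meaning of false).
bool
example_error_cb(pdfio_file_t *pdf, const char *message, void *data)
{
  example_errors_t *errors = (example_errors_t *)data;

  (void)pdf;  // The file is not needed to record the message

  snprintf(errors->last, sizeof(errors->last), "%s", message);
  errors->count ++;

  return (false);
}
```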

pdfio_file_t

PDF file

typedef struct _pdfio_file_s pdfio_file_t;

pdfio_filter_t

Compression/decompression filters for streams

typedef enum pdfio_filter_e pdfio_filter_t;

pdfio_linecap_t

Line capping modes

typedef enum pdfio_linecap_e pdfio_linecap_t;

pdfio_linejoin_t

Line joining modes

typedef enum pdfio_linejoin_e pdfio_linejoin_t;

pdfio_matrix_t

Transform matrix

typedef double pdfio_matrix_t[3][2];
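The sketch below applies a matrix of this shape to a point. It assumes the rows follow the usual PDF operand order, that is { { a, b }, { c, d }, { e, f } } for the six numbers of a "cm" operator; the example_matrix_t and example_transform names are hypothetical and local to the sketch.

```c
// Same shape as pdfio_matrix_t; declared locally so the sketch builds
// without <pdfio.h>.
typedef double example_matrix_t[3][2];

// Transform (x, y) by "m", assuming the PDF row layout
// { { a, b }, { c, d }, { e, f } }:
//   x' = a * x + c * y + e
//   y' = b * x + d * y + f
void
example_transform(example_matrix_t m, double x, double y, double *tx, double *ty)
{
  *tx = m[0][0] * x + m[1][0] * y + m[2][0];
  *ty = m[0][1] * x + m[1][1] * y + m[2][1];
}
```

Under this layout the identity matrix is { { 1, 0 }, { 0, 1 }, { 0, 0 } } and a pure translation by (e, f) only changes the last row.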

pdfio_obj_t

Numbered object in PDF file

typedef struct _pdfio_obj_s pdfio_obj_t;

pdfio_output_cb_t

Output callback for pdfioFileCreateOutput

typedef ssize_t (*pdfio_output_cb_t)(void *ctx, const void *data, size_t datalen);
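One possible output callback collects the PDF data in memory. The sketch below is an assumption-laden illustration: the example_membuf_t structure and example_output_cb function are hypothetical, and the return convention (number of bytes accepted, or -1 on error) is assumed rather than taken from this page.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  // ssize_t on POSIX systems

// Hypothetical growable memory buffer used as the callback context.
typedef struct
{
  unsigned char *data;   // Collected bytes
  size_t        len;     // Bytes used
  size_t        alloc;   // Bytes allocated
} example_membuf_t;

// Append "datalen" bytes to the buffer.  Returns the number of bytes
// accepted or -1 on allocation failure (assumed convention).
ssize_t
example_output_cb(void *ctx, const void *data, size_t datalen)
{
  example_membuf_t *buf = (example_membuf_t *)ctx;

  if (datalen == 0)
    return (0);

  if (buf->len + datalen > buf->alloc)
  {
    // Grow geometrically until the new data fits
    size_t        newalloc = buf->alloc ? buf->alloc * 2 : 4096;
    unsigned char *temp;

    while (newalloc < buf->len + datalen)
      newalloc *= 2;

    if ((temp = realloc(buf->data, newalloc)) == NULL)
      return (-1);

    buf->data  = temp;
    buf->alloc = newalloc;
  }

  memcpy(buf->data + buf->len, data, datalen);
  buf->len += datalen;

  return ((ssize_t)datalen);
}
```

The caller owns the buffer and should free "data" once the file has been closed.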

pdfio_password_cb_t

Password callback for pdfioFileOpen

typedef const char *(*pdfio_password_cb_t)(void *data, const char *filename);
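A password callback could, for example, look the password up by file name. In this sketch the example_secret_t table and example_password_cb function are hypothetical, and returning NULL is assumed to mean that no password is available.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical lookup table entry: one password per file name.
typedef struct
{
  const char *filename;  // File name, NULL marks the end of the table
  const char *password;  // Password for that file
} example_secret_t;

// Matches the pdfio_password_cb_t signature.  "data" is assumed to
// point to an array of example_secret_t ending with a NULL filename.
const char *
example_password_cb(void *data, const char *filename)
{
  const example_secret_t *secret;

  for (secret = (const example_secret_t *)data; secret->filename; secret ++)
  {
    if (!strcmp(secret->filename, filename))
      return (secret->password);
  }

  return (NULL);  // No password known (assumed to mean "give up")
}
```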

pdfio_permission_t

PDF permission bitfield

typedef int pdfio_permission_t;

pdfio_rect_t

PDF rectangle

typedef struct pdfio_rect_s pdfio_rect_t;

pdfio_stream_t

Object data stream in PDF file

typedef struct _pdfio_stream_s pdfio_stream_t;

pdfio_textrendering_t

Text rendering modes

typedef enum pdfio_textrendering_e pdfio_textrendering_t;

pdfio_valtype_t

PDF value types

typedef enum pdfio_valtype_e pdfio_valtype_t;

Author

Michael R Sweet

Info

2025-10-05 pdf read/write library