distcc man page

distcc — distributed C/C++/ObjC compiler with distcc-pump extensions

Synopsis

distcc <compiler> [COMPILER OPTIONS]

distcc [COMPILER OPTIONS]

<compiler> [COMPILER OPTIONS]

distcc [DISTCC OPTIONS]

Description

distcc distributes compilation of C code across several machines on a network.  distcc should always generate the same results as a local compile. It is simple to install and use, and it is often much faster than a local compile.

This version incorporates plain distcc as well as an enhancement called pump mode or distcc-pump.

For each job, distcc in plain mode sends the complete preprocessed source code and compiler arguments across the network from the client to a compilation server.  In pump mode, distcc sends the source code and recursively included header files (excluding those from the default system header directories), so that both preprocessing and compilation can take place on the compilation servers. This speeds up the delivery of compilations by up to an order of magnitude over plain distcc.

Compilation is driven by a client machine, which is typically the developer's workstation or laptop.  The distcc client runs on this machine, as does make, the preprocessor (if distcc's pump mode is not used), the linker, and other stages of the build process.  Any number of volunteer machines act as compilation servers and help the client to build the program, by running the distccd(1) daemon, C compiler and assembler as required.

distcc can run over either TCP sockets (on port 3632 by default) or a tunnel command such as ssh(1).  For TCP connections, the volunteers must run the distccd(1) daemon, either directly or from inetd.  For SSH connections, distccd must be installed but should not be listening for connections.

TCP connections should only be used on secure networks because there is no user authentication or protection of source or object code.  SSH connections are typically 25% slower because of processor overhead for encryption, although this can vary greatly depending on CPUs, network and the program being built.
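
As a sketch of how the two transports can be mixed in one host list: distcc's host syntax marks an SSH host with a leading "@" (optionally preceded by a user name), while a bare hostname means a TCP connection.  The hostnames and the host_transport helper below are illustrative assumptions, not part of distcc.

```shell
# Hypothetical helper (not part of distcc): report the transport implied
# by a host entry, assuming the "@" convention for SSH hosts.
host_transport() {
    case "$1" in
        *@*) echo ssh ;;   # "@host" or "user@host" goes through ssh
        *)   echo tcp ;;   # bare hostname uses a TCP socket (port 3632 by default)
    esac
}

# Two TCP volunteers and one volunteer reached through ssh (placeholder names).
DISTCC_HOSTS='red green @blue'
for h in $DISTCC_HOSTS; do
    echo "$h $(host_transport "$h")"
done
```

Only the export of DISTCC_HOSTS matters to distcc itself; the loop is there to make the distinction visible.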

distcc is intended to be used with GNU Make's -j option, which runs several compiler processes concurrently.  distcc spreads the jobs across both local and remote CPUs.  Because distcc is able to distribute most of the work across the network, a higher concurrency level can be used than for local builds.  As a rule of thumb, the -j value should be set to about twice the total number of available server CPUs, subject to client limitations.  This setting lets other tasks make progress while some are blocked waiting for disk or network I/O.  Note that distcc can also work with other build control tools, such as SCons, where similar concurrency settings must be adjusted.

The -j setting, especially for large values of -j, must take into account the CPU load on the client.  Additional measures may be needed to curtail the client load.  For example, concurrent linking should be severely curtailed using auxiliary locks.  The effect of other build activity, such as Java compilation when building mixed code, should be considered.  The --localslots_cpp parameter is set to 16 by default.  This limits the number of concurrent processes that do preprocessing in plain distcc (non-pump) mode.  Therefore, -j values larger than 16 may be used without overloading a single-CPU client with preprocessing.  Such large values may speed up parts of the build that do not involve C compilations, but they may not improve distcc efficiency in plain mode.

In contrast, with pump mode and, say, 40 servers, a setting of -j80 or larger may be appropriate even for single-CPU clients.
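
The rule of thumb above can be written down as a small calculation.  This is only a sketch: recommended_jobs is a hypothetical helper, and the result is a starting point that still has to be reduced if the client becomes overloaded.

```shell
# Hypothetical helper (not part of distcc): suggest a -j value as about
# twice the total number of server CPUs, per the rule of thumb.
recommended_jobs() {
    server_cpus=$1
    echo $((server_cpus * 2))
}

# Example: 10 server CPUs in total.
recommended_jobs 10
```

The value could then be passed to the build, for example as make -j"$(recommended_jobs 10)" CC=distcc.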

It is strongly recommended that you install the same compiler version on all machines participating in a build.  Incompatible compilers may cause mysterious compile or link failures.

Quickstart

1. For each machine, download distcc, unpack, and install.

2. On each of the servers, run distccd --daemon with --allow options to restrict access.

3. Put the names of the servers in your environment:
   $ export DISTCC_HOSTS='localhost red green blue'

4. Build!
   $ make -j8 CC=distcc

Quickstart for Distcc-Pump Mode

Proceed as above, but in Step 3, specify that the remote hosts are to carry the burden of preprocessing and that the files sent over the network should be compressed:

$ export DISTCC_HOSTS='--randomize localhost red,cpp,lzo green,cpp,lzo blue,cpp,lzo'

The --randomize option enforces uniform usage of the compile servers.  While you will get some benefit from distcc's pump mode with only a few servers, the benefit increases with more server CPUs (up to the hundreds!).  Wrap your build inside the pump command, here assuming 10 servers:

$ pump make -j20 CC=distcc

Quickstart for Distcc-Gssapi Mode

Proceed as in the Quickstart, but in Step 3, specify that the remote hosts are to mutually authenticate with the client:

$ export DISTCC_HOSTS='--randomize localhost red,auth green,auth blue,auth'

If distccd runs under a specific principal name then execute the following command prior to step 4:

$ export DISTCC_PRINCIPAL=<name>

How Plain (Non-Pump) Distcc Works

distcc only ever runs the compiler and assembler remotely.  With plain distcc, the preprocessor must always run locally because it needs to access various header files on the local machine which may not be present, or may not be the same, on the volunteer.  The linker similarly needs to examine libraries and object files, and so must run locally.

The compiler and assembler take only a single input file (the preprocessed source) and produce a single output (the object file). distcc ships these two files across the network and can therefore run the compiler/assembler remotely.

Fortunately, for most programs, running the preprocessor is relatively cheap and the linker is called relatively infrequently, so most of the work can be distributed.

distcc examines its command line to determine which of these phases are being invoked, and whether the job can be distributed.
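
A much-simplified sketch of this kind of decision follows.  It assumes the usual compiler flags -c and -S (compile or assemble only) and -E (preprocess only); classify_job is a hypothetical helper, and distcc's real logic handles many more cases than shown here.

```shell
# Hypothetical sketch (not distcc's actual code): decide from the command
# line whether a job looks distributable.
classify_job() {
    phase=link                # no -c, -S or -E: assume a link step
    for arg in "$@"; do
        case "$arg" in
            -E)    phase=preprocess; break ;;   # preprocess only
            -c|-S) phase=compile ;;             # compile/assemble only
        esac
    done
    case "$phase" in
        compile) echo remote ;;   # candidate for a compilation server
        *)       echo local ;;    # preprocessing or linking stays on the client
    esac
}

classify_job gcc -c hello.c -o hello.o    # prints "remote"
classify_job gcc hello.o -o hello         # prints "local"
classify_job gcc -E hello.c               # prints "local"
```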

How Distcc-Pump Mode Works

In pump mode, distcc runs the preprocessor remotely too.  To do so, the preprocessor must have access to all the files that it would have accessed if it had been running locally.  In pump mode, therefore, distcc gathers all of the recursively included headers, except the ones that are default system headers, and sends them along with the source file to the compilation server.

In distcc-pump mode, the server unpacks the set of all source files in a temporary directory, which contains a directory tree that mirrors the part of the file system that is relevant to preprocessing, including symbolic links.
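
The idea of mirroring client paths under a temporary root can be illustrated as follows.  This is a sketch of the concept only, not the server's implementation: mirror_path is a hypothetical helper, the paths are placeholders, and empty files stand in for the received contents.

```shell
# Hypothetical sketch: recreate a client-side absolute path beneath a
# temporary root directory so that relative layout is preserved.
mirror_path() {
    root=$1
    path=$2
    mkdir -p "$root$(dirname "$path")"   # recreate the directory chain
    : > "$root$path"                     # empty placeholder for the file contents
    echo "$root$path"
}

root=$(mktemp -d)
mirror_path "$root" /home/alice/project/src/hello.c
mirror_path "$root" /home/alice/project/include/hello.h
```

Because both files keep their positions relative to each other, an include such as "../include/hello.h" would resolve beneath the temporary root just as it does on the client.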

The compiler is then run from the p