$ ch-run [OPTION...] NEWROOT CMD [ARG...]
Run command CMD in a fully unprivileged Charliecloud container using the flattened and unpacked image directory located at NEWROOT.
- -b, --bind=SRC[:DST]
Bind-mount SRC at guest DST. If DST is not specified, it defaults to the same path as on the host; i.e., --bind=SRC is equivalent to --bind=SRC:SRC. Can be repeated.
If --write is given and DST does not exist, it will be created as an empty directory. However, DST must be entirely within the image itself; DST cannot enter a previous bind mount. For example, --bind /foo:/tmp/foo will fail because /tmp is shared with the host via bind-mount (unless --private-tmp is given).
Most images have ten directories, /mnt/[0-9], already available as mount points.
Symlinks in DST are followed, and absolute links can have surprising behavior. Bind-mounting happens after namespace setup but before pivoting into the container image, so absolute links use the host root. For example, suppose the image has a symlink /foo -> /mnt. Then, --bind=/bar:/foo will bind-mount on the host’s /mnt, which is inaccessible on the host because namespaces are already set up and also inaccessible in the container because of the subsequent pivot into the image. Currently, this problem is only detected when DST needs to be created: ch-run will refuse to follow absolute symlinks in this case, to avoid directory creation surprises.
- -c, --cd=DIR
Initial working directory in container.
- --ch-ssh
Bind ch-ssh(1) into container at /usr/bin/ch-ssh.
- --env-no-expand
Don’t expand variables when using --set-env.
- -g, --gid=GID
Run as group GID within container.
- -j, --join
Use the same container (namespaces) as peer ch-run invocations.
- --join-pid=PID
Join the namespaces of an existing process.
- --join-ct=N
Number of ch-run peers (implies --join; default: see below).
- --join-tag=TAG
Label for ch-run peer group (implies --join; default: see below).
- --no-home
By default, your host home directory (i.e., $HOME) is bind-mounted at guest /home/$USER. This is accomplished by mounting a new tmpfs at /home, which hides any image content under that path. If this is specified, neither of these things happens and the image’s /home is exposed unaltered.
- --no-passwd
By default, temporary /etc/passwd and /etc/group files are created according to the UID and GID maps for the container and bind-mounted into it. If this is specified, no such temporary files are created and the image’s files are exposed.
- -t, --private-tmp
By default, /tmp is shared with the host. If this is specified, a new tmpfs is mounted on the container’s /tmp instead.
- --set-env=FILE, --set-env=VAR=VALUE
Set environment variable(s), either as specified in host path FILE or by setting variable VAR to VALUE.
- -u, --uid=UID
Run as user UID within container.
- --unset-env=GLOB
Unset environment variables whose names match GLOB.
- -v, --verbose
Be more verbose (can be repeated).
- -w, --write
Mount image read-write (by default, the image is mounted read-only).
- -?, --help
Print help and exit.
- --usage
Print a short usage message and exit.
- -V, --version
Print version and exit.
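The SRC[:DST] defaulting rule for --bind can be illustrated with a short Python sketch. It is not ch-run’s code: it assumes the argument is split at the first colon (so SRC cannot contain one) and that an empty DST, as in SRC:, is treated like an omitted one. parse_bind is a hypothetical name.

```python
from typing import Tuple


def parse_bind(arg: str) -> Tuple[str, str]:
    """Split a --bind argument into (SRC, DST); hypothetical helper."""
    # Assumption: split at the first colon, so SRC cannot contain ":".
    src, _, dst = arg.partition(":")
    if not src:
        raise ValueError("--bind: SRC must not be empty")
    # DST defaults to SRC, i.e., --bind=SRC behaves like --bind=SRC:SRC.
    # Assumption: an empty DST ("SRC:") is treated the same as an omitted one.
    return src, dst or src


if __name__ == "__main__":
    print(parse_bind("/scratch"))         # ('/scratch', '/scratch')
    print(parse_bind("/scratch:/mnt/0"))  # ('/scratch', '/mnt/0')
```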
Note: Because ch-run is fully unprivileged, it is not possible to change UIDs and GIDs within the container (the relevant system calls fail). In particular, setuid, setgid, and setcap executables do not work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to disable these executables within the container. This does not reduce functionality but is a “belt and suspenders” precaution to reduce the attack surface should bugs in these system calls or elsewhere arise.
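For illustration, the same prctl(2) call can be made from Python via ctypes on Linux. This is a sketch, not ch-run’s code; the constants 38 and 39 are the values of PR_SET_NO_NEW_PRIVS and PR_GET_NO_NEW_PRIVS in <linux/prctl.h>, and the helper names are hypothetical. Once set, the flag cannot be cleared and is inherited across fork and execve, which is why the demo sets it only in a child process.

```python
import ctypes
import os

# Values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39

_libc = ctypes.CDLL(None, use_errno=True)  # global symbols, including libc, on Linux


def _prctl(option: int, arg2: int = 0) -> int:
    # The unused arguments must be zero for both NO_NEW_PRIVS operations.
    rc = _libc.prctl(option, ctypes.c_ulong(arg2), ctypes.c_ulong(0),
                     ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return rc


def set_no_new_privs() -> None:
    """Irreversibly forbid privilege gain via setuid/setgid/file capabilities."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs() -> bool:
    return _prctl(PR_GET_NO_NEW_PRIVS) == 1


if __name__ == "__main__":
    pid = os.fork()
    if pid == 0:  # child: set the flag and report the result via exit status
        try:
            set_no_new_privs()
            code = 0 if get_no_new_privs() else 1
        except BaseException:
            code = 2
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    print("child no_new_privs:", os.WEXITSTATUS(status) == 0)
```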
Host Files and Directories Available in Container Via Bind Mounts
In addition to any directories specified by the user with --bind, ch-run has standard host files and directories that are bind-mounted in as well.
The following host files and directories are bind-mounted at the same location in the container. These give access to the host’s devices and various kernel facilities. (Recall that Charliecloud provides minimal isolation and containerized processes are mostly normal unprivileged processes.) They cannot be disabled and are required; i.e., they must exist both on the host and within the image.
- /dev (whole directory)
- /proc
- /sys
Optional: bind-mounted only if the path exists both on the host and within the image; if it does not, it is skipped without error or warning.
- /etc/hosts and /etc/resolv.conf. Because Charliecloud containers share the host network namespace, they need the same hostname resolution configuration.
- /etc/machine-id. Provides a unique ID for the OS installation; matching the host works for most situations. Needed to support D-Bus, some software licensing situations, and likely other use cases. See also issue #1050.
- /var/lib/hugetlbfs at guest /var/opt/cray/hugetlbfs, and /var/opt/cray/alps/spool. These support Cray MPI.
- $PREFIX/bin/ch-ssh at guest /usr/bin/ch-ssh. SSH wrapper that automatically containerizes after connecting.
Additional bind mounts are done by default but can be disabled; see the options above.
- $HOME at /home/$USER (and image /home is hidden). Makes user data and init files available.
- /tmp. Provides a temporary directory that persists between container runs and is shared with non-containerized application components.
- temporary files at /etc/passwd and /etc/group. Usernames and group names need to be customized for each container run.
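The required/optional rule for these default bind mounts can be modeled in a few lines of Python. This is a sketch under stated assumptions, not ch-run’s implementation: plan_binds and OPTIONAL_BINDS are hypothetical names, the host and image roots are parameters so the logic can be tried on scratch directories, and the user-disableable mounts ($HOME, /tmp, /etc/passwd, /etc/group) are not modeled.

```python
import os
import tempfile
from typing import Iterable, List, Tuple

# (host path, guest path, required?) entries; these are the optional ones listed above.
OPTIONAL_BINDS = [
    ("/etc/hosts", "/etc/hosts", False),
    ("/etc/resolv.conf", "/etc/resolv.conf", False),
    ("/etc/machine-id", "/etc/machine-id", False),
    ("/var/lib/hugetlbfs", "/var/opt/cray/hugetlbfs", False),
    ("/var/opt/cray/alps/spool", "/var/opt/cray/alps/spool", False),
]


def plan_binds(host_root: str, image_root: str,
               binds: Iterable[Tuple[str, str, bool]]) -> List[Tuple[str, str]]:
    """Return the (host, guest) pairs that would be bind-mounted."""
    plan = []
    for host_path, guest_path, required in binds:
        src = os.path.join(host_root, host_path.lstrip("/"))
        dst = os.path.join(image_root, guest_path.lstrip("/"))
        if os.path.exists(src) and os.path.exists(dst):
            plan.append((host_path, guest_path))
        elif required:
            # Required paths must exist both on the host and within the image.
            raise FileNotFoundError(f"required bind missing: {host_path} -> {guest_path}")
        # Otherwise the entry is optional and missing somewhere: skip it silently.
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as host, tempfile.TemporaryDirectory() as image:
        for root in (host, image):
            os.makedirs(os.path.join(root, "etc"))
            open(os.path.join(root, "etc", "hosts"), "w").close()
        open(os.path.join(host, "etc", "resolv.conf"), "w").close()  # host only
        print(plan_binds(host, image, OPTIONAL_BINDS))  # [('/etc/hosts', '/etc/hosts')]
```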
Multiple Processes in the Same Container with --join
By default, different ch-run invocations use different user and mount namespaces (i.e., different containers). While this has no impact on sharing most resources between invocations, there are a few important exceptions. These include:
- ptrace(2), used by debuggers and related tools. One can attach a debugger to processes in descendant namespaces, but not in sibling namespaces. The practical effect is that, without --join, you can’t run a command with ch-run and then attach to it with a debugger also run with ch-run.
- Cross-memory attach (CMA) is used by cooperating processes to communicate by simply reading and writing one another’s memory. This is also not permitted between sibling namespaces. This affects various MPI implementations that use CMA to pass messages between ranks on the same node, because it’s faster than traditional shared memory.
--join is designed to address this by placing related ch-run commands (the “peer group”) in the same container. This is done by one of the peers creating the namespaces with unshare(2) and the others joining with setns(2).
To do so, we need to know the number of peers and a name for the group. These are specified by additional arguments that can (hopefully) be left at default values in most cases:
- --join-ct sets the number of peers. The default is the value of the first of the following environment variables that is defined: OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE, SLURM_CPUS_ON_NODE.
- --join-tag sets the tag that names the peer group. The default is environment variable SLURM_STEP_ID, if defined; otherwise, the PID of ch-run’s parent. Tags can be re-used for peer groups that start at different times, i.e., once all peer ch-run processes have replaced themselves with the user command, the tag can be re-used.
Caveats:
- One cannot currently add peers after the fact, for example, to start a debugger partway through a run. (This is only required for code with bugs and is thus an unusual use case.)
- ch-run instances race. The winner of this race sets up the namespaces, and the other peers use the winner to find the namespaces to join. Therefore, if the user command of the winner exits, any remaining peers will not be able to join the namespaces, even if they are still active. There is currently no general way to specify which ch-run should be the winner.
- If --join-ct is too high, the winning ch-run’s user command exits before all peers join, or ch-run itself crashes, IPC resources such as semaphores and shared memory segments will be leaked. These appear as files in /dev/shm/ and can be removed with rm(1).
- Many of the arguments given to the race losers, such as the image path and --bind, will be ignored in favor of what was given to the winner.
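The default values of --join-ct and --join-tag described above can be sketched in Python as follows. This is an illustration, not ch-run’s code: it assumes “defined” means present in the environment (even if empty), and it returns the --join-ct default as the raw string without modeling how ch-run converts it to a number. The function names are hypothetical, and os.getppid() here returns the parent of the Python process, standing in for ch-run’s parent.

```python
import os
from typing import Mapping, Optional

# Checked in this order; the first variable that is defined wins.
JOIN_CT_VARS = (
    "OMPI_COMM_WORLD_LOCAL_SIZE",
    "SLURM_STEP_TASKS_PER_NODE",
    "SLURM_CPUS_ON_NODE",
)


def default_join_ct(env: Mapping[str, str]) -> Optional[str]:
    """Raw default for --join-ct, or None if no candidate variable is defined."""
    for name in JOIN_CT_VARS:
        if name in env:  # assumption: "defined" means present, even if empty
            return env[name]
    return None


def default_join_tag(env: Mapping[str, str]) -> str:
    """$SLURM_STEP_ID if defined, else the parent PID as a string."""
    if "SLURM_STEP_ID" in env:
        return env["SLURM_STEP_ID"]
    return str(os.getppid())


if __name__ == "__main__":
    print(default_join_ct(os.environ), default_join_tag(os.environ))
```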
Environment Variables
ch-run leaves environment variables unchanged, i.e., the host environment is passed through unaltered, except for:
- limited tweaks to avoid significant guest breakage;
- user-set variables via --set-env;
- user-unset variables via --unset-env; and
- setting $CH_RUNNING.
This section describes these features.
The default tweaks happen first, followed by --set-env and --unset-env in the order specified on the command line. The latter two can be repeated arbitrarily many times, e.g., to add or remove multiple sets of variables, or to add only some of the variables in a file.
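To illustrate why the order matters, here is a simplified Python model in which each option is applied left to right to a copy of the environment. It is a sketch, not ch-run’s code: apply_env_options is a hypothetical name, --set-env is reduced to a single NAME=VALUE split at the first equals sign (no quote stripping, expansion, or files), and --unset-env uses Python’s fnmatch.fnmatchcase as a stand-in for fnmatch(3).

```python
import fnmatch
from typing import Dict, List, Mapping, Tuple


def apply_env_options(env: Mapping[str, str],
                      opts: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply ("set", "NAME=VALUE") and ("unset", "GLOB") options in order."""
    out = dict(env)  # the default tweaks would already be applied here
    for kind, arg in opts:
        if kind == "set":
            name, _, value = arg.partition("=")
            out[name] = value  # a repeated variable: the last value wins
        elif kind == "unset":
            out = {k: v for k, v in out.items() if not fnmatch.fnmatchcase(k, arg)}
        else:
            raise ValueError(f"unknown option kind: {kind!r}")
    return out


if __name__ == "__main__":
    base = {"A": "1", "B": "2"}
    print(apply_env_options(base, [("unset", "*"), ("set", "C=3")]))  # {'C': '3'}
    print(apply_env_options(base, [("set", "C=3"), ("unset", "*")]))  # {}
```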
By default, ch-run makes the following environment variable changes:
- $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure out that it’s in an unprivileged container and what namespaces are active without this hint, the checks can be messy, and there is no way to tell that it’s a Charliecloud container specifically. This variable makes such a test simple and well-defined. (Note: This variable is unaffected by --unset-env.)
- $HOME: If the path to your home directory is not /home/$USER on the host, then an inherited $HOME will be incorrect inside the guest. This confuses some software, such as Spack. Thus, we change $HOME to /home/$USER, unless --no-home is specified, in which case it is left unchanged.
- $PATH: Newer Linux distributions replace some root-level directories, such as /bin, with symlinks to their counterparts in /usr. Some of these distributions (e.g., Fedora 24) have also dropped /bin from the default $PATH. This is a problem when the guest OS does not have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we add /bin to $PATH if it’s not already present. See also: The Case for the /usr Merge.
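The three default changes above can be summarized as a small Python function. This is a sketch, not ch-run’s implementation: default_tweaks is a hypothetical name, the user name is passed in rather than read from $USER, and two details the text does not specify are assumptions here: /bin is appended to the end of $PATH, and an unset $PATH is left unset.

```python
from typing import Dict, Mapping


def default_tweaks(env: Mapping[str, str], user: str,
                   no_home: bool = False) -> Dict[str, str]:
    """Return a copy of env with the default ch-run changes applied (sketch)."""
    out = dict(env)
    out["CH_RUNNING"] = "Weird Al Yankovic"
    if not no_home:
        out["HOME"] = "/home/" + user  # with --no-home, $HOME is left unchanged
    path = out.get("PATH")
    # Assumptions: /bin is appended, and an unset $PATH is left alone.
    if path is not None and "/bin" not in path.split(":"):
        out["PATH"] = path + ":/bin" if path else "/bin"
    return out


if __name__ == "__main__":
    print(default_tweaks({"HOME": "/users/alice", "PATH": "/usr/bin"}, "alice"))
```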
Setting variables with --set-env
The purpose of --set-env is to set environment variables in addition to (or instead of) those inherited from the host shell.
If the argument contains an equals character, then it is interpreted as a variable name and value; otherwise, it is a host path to a file with one variable name/value per line (guest paths can be specified by prepending the image path). Values given replace any already set (i.e., if a variable is repeated, the last value wins). Environment variables in the value are expanded unless --env-no-expand is given, though see below for syntax differences from the shell.
For example, to prepend /opt/bin to the current shell’s path (the single quotes protect $PATH from expansion by the shell, though here the result would be the same if the shell expanded it):
$ ch-run --set-env='PATH=/opt/bin:$PATH' ...
To add variables set by Dockerfile ENV instructions to the current environment:
$ ch-run --set-env=$IMG/ch/environment ...
To prepend /opt/bin to the path set by the Dockerfile (here we really can’t let the shell expand $PATH):
$ ch-run --set-env=$IMG/ch/environment --set-env='PATH=/opt/bin:$PATH' ...
The syntax of the argument is a key-value pair separated by the first equals character (=, ASCII 61), with optional single straight quotes (', ASCII 39) around the value, though be aware that quotes are also interpreted by the shell. Newlines (ASCII 10) are not permitted in either key or value. The value may be empty, but not the key.
Environment variables in the value are expanded unless --env-no-expand is given. When expansion is enabled, the value is treated as a sequence of possibly-empty items separated by colons (:, ASCII 58). If an item begins with a dollar sign ($, ASCII 36), then the rest of the item is the name of an environment variable. If this variable is set to a non-empty value, that value is substituted for the item; otherwise (i.e., the variable is unset or set to the empty string), the item is deleted, along with its delimiting colon. The purpose of omitting empty expansions is to avoid surprising behavior such as an empty element in $PATH meaning the current directory. If the value contains no such items, it is used unchanged.
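The expansion rule can be sketched in Python as below, assuming that expansion is a single pass (a substituted value is not itself re-expanded) and that empty items which are not variable references are kept. expand_value is a hypothetical name; this illustrates the documented behavior and is not ch-run’s code.

```python
from typing import Mapping


def expand_value(value: str, env: Mapping[str, str]) -> str:
    """Expand $NAME items in a colon-separated value (single pass)."""
    kept = []
    for item in value.split(":"):
        if item.startswith("$"):
            sub = env.get(item[1:], "")
            if sub:
                kept.append(sub)  # set and non-empty: substitute
            # unset or empty: drop the item, and with it one delimiting colon
        else:
            kept.append(item)  # literal items, including empty ones, are kept
    return ":".join(kept)


if __name__ == "__main__":
    env = {"BAR": "bar", "PATH": "/usr/bin:/bin"}
    print(expand_value("/opt/bin:$PATH", env))  # /opt/bin:/usr/bin:/bin
    print(expand_value("baz:$UNSET:qux", env))  # baz:qux
    print(expand_value("$BAR baz:qux", env))    # qux
```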
If a file is given instead, it is a sequence of such arguments, one per line. Empty lines are ignored. No comments are interpreted. (This syntax is designed to accept the output of printenv and be easily produced by other simple mechanisms.)
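The argument and file syntax described above can be sketched in Python as follows. It is not ch-run’s parser: parse_set_env_arg and parse_set_env_file are hypothetical names, the sketch assumes the single quotes are stripped only when they are both the first and last characters of the value, and it leaves variable expansion to a separate step.

```python
from typing import List, Tuple


def parse_set_env_arg(line: str) -> Tuple[str, str]:
    """Split one --set-env argument (or file line) into (name, value)."""
    if "\n" in line:
        raise ValueError("newlines are not permitted")
    name, sep, value = line.partition("=")  # split at the first "="
    if not sep:
        raise ValueError(f"no separator: {line!r}")
    if not name:
        raise ValueError(f"key cannot be empty: {line!r}")
    # Assumption: strip one pair of single straight quotes enclosing the value.
    if len(value) >= 2 and value[0] == value[-1] == "'":
        value = value[1:-1]
    return name, value


def parse_set_env_file(text: str) -> List[Tuple[str, str]]:
    """One argument per line; empty lines are skipped; no comment syntax."""
    return [parse_set_env_arg(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    print(parse_set_env_file("FLAGS='-march=foo -mtune=bar'\n\nFOO=''''\n"))
    # [('FLAGS', '-march=foo -mtune=bar'), ('FOO', "''")]
```

Because later entries replace earlier ones, passing the result to dict() gives the “last value wins” behavior for repeated variables.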
Examples of valid arguments, assuming that environment variable $BAR is set to bar and $UNSET is unset (or set to the empty string):
| Argument | Name | Value |
|---|---|---|
| FLAGS=-march=foo -mtune=bar | FLAGS | -march=foo -mtune=bar |
| FLAGS='-march=foo -mtune=bar' | FLAGS | -march=foo -mtune=bar |
| FOO= | FOO | empty string (not unset) |
| FOO=$UNSET | FOO | empty string (not unset or $UNSET) |
| FOO=baz:$UNSET:qux | FOO | baz:qux (not baz::qux) |
| FOO='' | FOO | empty string (not unset) |
| FOO='''' | FOO | '' (two single quotes) |
Example invalid lines:
| Argument | Problem |
|---|---|
| FOO bar | no separator |
| =bar | key cannot be empty |
Example valid lines that are probably not what you want (␣ marks a space character):
| Argument | Name | Value | Problem |
|---|---|---|---|
| FOO="bar" | FOO | "bar" | double quotes aren’t stripped |
| FOO=bar # baz | FOO | bar # baz | comments not supported |
| FOO=bar\tbaz | FOO | bar\tbaz | backslashes are not special |
| ␣FOO=bar | ␣FOO | bar | leading space in key |
| FOO=␣bar | FOO | ␣bar | leading space in value |
| $FOO=bar | $FOO | bar | variables not expanded in key |
| FOO=$BAR baz:qux | FOO | qux | variable BAR baz not set |
Removing variables with --unset-env
The purpose of --unset-env=GLOB is to remove unwanted environment variables. The argument GLOB is a glob pattern (dialect fnmatch(3) with no flags); all variables with matching names are removed from the environment.
Because the shell also interprets glob patterns, if any wildcard characters are in GLOB, it is important to put it in single quotes to avoid surprises.
GLOB must be a non-empty string.
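A minimal Python sketch of the matching rule follows. It uses fnmatch.fnmatchcase, which is case-sensitive and, like fnmatch(3) with no flags, lets * match any characters; it is only an approximation, though (for example, backslash escaping differs). unset_env is a hypothetical name, not part of ch-run.

```python
import fnmatch
from typing import Dict, Mapping


def unset_env(env: Mapping[str, str], glob: str) -> Dict[str, str]:
    """Return a copy of env without the variables whose names match glob."""
    if not glob:
        raise ValueError("GLOB must be a non-empty string")
    # fnmatchcase: case-sensitive; an approximation of fnmatch(3) with no flags.
    return {name: value for name, value in env.items()
            if not fnmatch.fnmatchcase(name, glob)}


if __name__ == "__main__":
    env = {"SLURM_JOB_ID": "42", "SLURM_NNODES": "1", "FOO": "bar"}
    print(unset_env(env, "SLURM*"))  # {'FOO': 'bar'}
```

On a real command line the glob still needs single quotes, as noted above, so that the shell does not expand it first.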
Example 1: Remove the single environment variable FOO:
$ export FOO=bar
$ env | fgrep FOO
FOO=bar
$ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
$
Example 2: Hide from a container the fact that it’s running in a Slurm allocation, by removing all variables beginning with SLURM. You might want to do this to test an MPI program with one rank and no launcher:
$ salloc -N1
$ env | egrep '^SLURM' | wc
44 44 1092
$ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
[... long error message ...]
$ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
0: MPI version: Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
0: init ok cn001.localdomain, 1 ranks, userns 4026532530
0: send/receive ok
0: finalize ok
Example 3: Clear the environment completely (remove all variables):
$ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
$
Note that some programs, such as shells, set some environment variables even if started with no init files:
$ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian9 -- bash --noprofile --norc -c env
SHLVL=1
PWD=/
_=/usr/bin/env
$
Examples
Run the command echo hello inside a Charliecloud container using the unpacked image at /data/foo:
$ ch-run /data/foo -- echo hello
hello
Run an MPI job that can use CMA to communicate:
$ srun ch-run --join /data/foo -- bar
If Charliecloud was obtained from your Linux distribution, use your distribution’s bug reporting procedures.
Otherwise, report bugs to: <https://github.com/hpc/charliecloud/issues>
Full documentation at: <https://hpc.github.io/charliecloud>
2014–2021, Triad National Security, LLC