mass-prebuild - Man Page

the mass-prebuild set of tools

Description

The mass pre-builder (mpb(1)) is a set of tools that helps the user run mass rebuilds around a limited set of packages, in order to assess the stability of a given update.

The idea is rather simple. Given a package or a set of packages, namely "main packages", the mass pre-builder will calculate the list of their direct reverse dependencies: packages that explicitly list one of the main packages in their "BuildRequires" field. The tooling first builds the main packages using the distribution’s facilities, which should include a set of test cases that validate general functionality. Assuming these packages are built successfully, they are then used as base packages in order to build the reverse dependencies and execute their own test cases.

That gives a first set of results. There may be successful builds (hopefully the majority), but also failures that may or may not be due to the changes introduced in the main packages. In order to reduce the uncertainty and give a limited list of packages to analyze, as soon as a failure is detected, the mass pre-builder will create another mass build, in parallel to the original one, but without the changes that were introduced into the main packages: a pristine build. This pristine build will therefore only include a sublist of the reverse dependencies: the ones that failed on the original run.

Once all the package builds are done, there will therefore be 3 major categories:

The successful ones
The ones that failed only with the modified packages
The ones that failed with both the modifications and the pristine version

Out of these, the first category can likely be ignored, since the packages don’t seem to have been affected by the changes.

The second category needs much more attention. These are the ones that cry: "Hey, there seems to be a big issue with your changes!" Each failure needs to be analyzed in order to figure out whether the problem is due to the changes that have been introduced, or to a mistake by the end user of the package (e.g. use of a deprecated feature that got removed).

The last category is a bit trickier. Since the build failed with the pristine packages, there may be hidden failures among them that originated from the new changes.
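The three categories can be sketched as a small classification function. This is only an illustration of the reasoning above, not MPB's actual code: the function name, the dictionary layout and the category labels are all assumptions made for the example.

```python
def classify(modified_ok, pristine_ok):
    """Sort packages into the three categories described above.

    modified_ok: dict mapping a package name to True if it built against
                 the modified main packages (hypothetical layout).
    pristine_ok: dict mapping a package name to True if the pristine build
                 succeeded; only packages that failed the first run are
                 expected to appear here.
    """
    categories = {"success": [], "failed_modified_only": [], "failed_both": []}
    for pkg, ok in sorted(modified_ok.items()):
        if ok:
            categories["success"].append(pkg)
        elif pristine_ok.get(pkg, False):
            # Builds fine without the changes: the changes are the suspect.
            categories["failed_modified_only"].append(pkg)
        else:
            # Fails either way (or no pristine result is known yet).
            categories["failed_both"].append(pkg)
    return categories
```

In this sketch a package without a pristine result is conservatively placed in the last category, since nothing can be concluded about it yet.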

Examples of workflows

The package pre-release use case (medium sized)

This is the primary motivation for mpb and may therefore give some insights on decisions taken for default behavior.

This use case is basically the one for autoconf: an official release will come out soon, and considering the impact of this component, it makes sense to pre-test it and provide feedback to the community. The background is that autoconf 2.70 led to failures that were not spotted during tests executed by the community. This resulted in an autoconf 2.71 release soon after 2.70, to fix the major ones. The mass pre-builder was created to check autoconf 2.72 candidates regularly and avoid the same kind of problems.

In this example, an autoconf pre-release tarball is created, which has a specific name generated through git describe. This name is going to be re-used as the project name for MPB. If failures are detected there may be more tarballs to be created while using git bisect.

Each SRPM created that way gets its own folder, while the data collected by MPB is to be centralized.

The dependencies need to be calculated automatically, and they should be built at their last known version. Whenever possible, the build should be ordered so that if C depends on B, while both depend on Autoconf, then C waits for B to be built before starting. This should help detect C build failures due to a B misconfiguration (which itself could be due to the new Autoconf version).
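The ordering constraint described above is a topological sort of the dependency graph. The following sketch uses Python's standard graphlib module to group packages into batches that could be built in parallel; it illustrates the idea only and is not how MPB is implemented. The function name and the input layout are assumptions.

```python
from graphlib import TopologicalSorter


def build_batches(build_requires):
    """Group packages into successive batches that can be built in parallel.

    build_requires maps each package to the set of packages it needs at
    build time, restricted to packages taking part in the mass build
    (hypothetical layout).
    """
    sorter = TopologicalSorter(build_requires)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose deps are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# C depends on B, and both depend on autoconf.
print(build_batches({"B": {"autoconf"}, "C": {"autoconf", "B"}}))
# prints [['autoconf'], ['B'], ['C']]
```

A circular dependency makes such an ordering impossible, which is one reason automatic ordering cannot always be applied.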

From experience, I know that there are transient failures during builds in COPR, so builds should be retried once or twice.

In principle, we should run the build for all architectures, but x86_64 is enough to get a good idea of the status. The latest version of each package needs to be built, so we go for rawhide.

Thus, the configuration looks as follows:

name: autoconf-2.72a.17.0f330
packages:
  autoconf:
    src_type: file
    src: /home/anexample/work/fedora/autoconf/autoconf-2.72a.17.0f330-1.fc38.src.rpm
retry: 2
data: /home/anexample/work/mpb/autoconf-2.72

The data collected by MPB will be stored under /home/anexample/work/mpb/autoconf-2.72/autoconf-2.72a.17.0f330/. If failures are detected, assuming the problem comes from autoconf, a new build will be generated, with a dedicated name. This new build may either be a full build, if there is confidence in the quality of the fix, or a limited one, using a configuration file generated by mpb-failedconf.

Note

Here it shouldn’t be forgotten that the whole idea is to validate the main package. Any new version should therefore go through a full rebuild at one point.

If a failure is due to a bug in a reverse dependency, there are 2 cases:

There are only a handful of failures: they can be restarted manually, modifying the commit ID to be used for the build.
There are too many failures to restart manually: mpb-failedconf is used to generate a base config which only contains the failures from the previous build.

In the second case, the generated configuration may need to be modified, e.g. by replacing all the committish: * lines with committish: "@last_build", to ensure that newer versions of the packages will be built.
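The replacement of the committish lines can be done by hand or scripted. Below is a minimal sketch that works on the configuration as plain text; the function name is hypothetical, and it assumes that each committish entry sits on its own line, as in the configuration examples of this page.

```python
import re

# Match a whole "committish: <value>" line, keeping the indentation and key.
COMMITTISH_LINE = re.compile(r"^([ \t]*committish:[ \t]*).*$", re.MULTILINE)


def use_last_build(config_text):
    """Return config_text with every committish value set to "@last_build"."""
    return COMMITTISH_LINE.sub(r'\g<1>"@last_build"', config_text)
```

Working on text rather than on parsed YAML keeps comments and key order of the generated file untouched, at the cost of not validating the structure.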

The package release use case (medium size, complex dependencies)

This use case is similar to the previous one, so in principle there may not be any difference. Yet, let’s modify the scenario slightly.

As a packager, a fork of the original distgit is used to prepare the new release. If downstream patches need to be made, they will be added into distgit and applied on the fly.

Since the dependency graph is complex, MPB may not be able to calculate it, and brute force is to be used for the builds. The project naming is irrelevant.

packages:
  ruby:
    src_type: git
    src: https://src.fedoraproject.org/fork/anexample/rpms/ruby
retry: dynamic
data: /home/anexample/work/mpb/ruby-3.5

Since no name for the project is provided, MPB will automatically attribute one. By default, the naming will be mpb.N where N is the build ID.

The data collected will therefore be stored under /home/anexample/work/mpb/ruby-3.5/mpb.5/ (assuming build ID 5).
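The way the storage location follows from the data and name fields can be summarized in a few lines. This is a sketch of the rule as described on this page, with a hypothetical function name; it is not taken from MPB's sources.

```python
from pathlib import PurePosixPath


def project_dir(data, name=None, build_id=None):
    """Return the directory holding the data of one MPB project.

    Mirrors the rule described above: <data>/<name>, where the name falls
    back to "mpb.N" (N being the build ID) when none is configured.
    """
    if name is None:
        if build_id is None:
            raise ValueError("either a name or a build ID is needed")
        name = f"mpb.{build_id}"
    return PurePosixPath(data) / name


print(project_dir("/home/anexample/work/mpb/ruby-3.5", build_id=5))
# prints /home/anexample/work/mpb/ruby-3.5/mpb.5
```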

Note the package fields. The default value for src_type is distgit, which refers to the official package repository for a given distribution. When using non-official distribution git repositories (like in this case, a fork), it is recommended to set this value to git. The default value for committish is the branch corresponding to the Fedora release being built for, in our case rawhide. For each new MPB build, the latest version of our fork of the ruby package will therefore be used.

The retry field is set to dynamic: brute force is applied on the reverse dependency builds. A simplified explanation of this is that as long as there are successful builds, the failed packages will be rebuilt (that isn’t exactly true, but the idea is there).
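The simplified explanation can be sketched as a loop that keeps rebuilding the failed packages for as long as a round makes progress. As the text says, this is not exactly what MPB does; the function and the try_build callback are hypothetical and only illustrate the idea.

```python
def dynamic_retry(packages, try_build):
    """Rebuild failed packages for as long as a round yields new successes.

    try_build(pkg, built) is a caller-supplied callable (an assumption of
    this sketch) returning True when pkg builds, given the set of packages
    that were built in earlier rounds.
    """
    built = set()
    pending = list(packages)
    rounds = 0
    while pending:
        rounds += 1
        succeeded = [pkg for pkg in pending if try_build(pkg, frozenset(built))]
        if not succeeded:
            break  # a full round without any success: stop retrying
        built.update(succeeded)
        pending = [pkg for pkg in pending if pkg not in built]
    return built, pending, rounds


# C needs B, B needs A, and D needs something that never becomes available.
needs = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"missing"}}
built, failed, rounds = dynamic_retry(needs, lambda pkg, done: needs[pkg] <= done)
print(sorted(built), failed, rounds)
# prints ['A', 'B', 'C'] ['D'] 4
```

The example shows why brute force works without a dependency graph: each round unlocks the packages whose dependencies were built in the previous one, and the loop stops once nothing new succeeds.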

The rest of the workflow is similar to the previous one (playing with mpb-failedconf whenever necessary).

The multi-package use case

It may happen that a package owner is responsible for multiple packages that depend on each other and therefore need to be built in a specific order.

This can be achieved through the following configuration snippet:

packages:
  componentA:
    build_order: 0
  componentB:
    build_order: 1
  componentC:
    build_order: 2

That way, MPB will make sure that componentA is built first, followed by componentB and then componentC.
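The effect of build_order can be pictured as sorting the packages by that value and building them stage by stage. The sketch below is illustrative only: the function name is hypothetical, and both the default order of 0 and the grouping of packages sharing the same value into one stage are assumptions, not documented MPB behavior.

```python
from itertools import groupby


def build_stages(packages):
    """Group package names into stages sorted by their build_order value."""

    def order(name):
        # Assumption: a package without build_order is treated as order 0.
        return packages[name].get("build_order", 0)

    names = sorted(packages, key=lambda name: (order(name), name))
    return [list(group) for _, group in groupby(names, key=order)]


config = {
    "componentC": {"build_order": 2},
    "componentA": {"build_order": 0},
    "componentB": {"build_order": 1},
}
print(build_stages(config))
# prints [['componentA'], ['componentB'], ['componentC']]
```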

The package pre-release (big size, overly complex dependencies)

This use case is typically the one that could be expected from a compiler, where thousands of packages will need to be rebuilt.

arch: all
packages:
  gcc:
    deps_only: True
automatic_build_ordering: False
data: /home/anexample/work/mpb/gcc-14
copr:
  additional_repos:
  - https://anexample.fedorapeople.org/fedora-gcc14-${arch}/

Post-processing reverse dependencies

For complex scenarios, MPB allows programmatic modification of the reverse dependency list before the build starts. This is achieved through the revdeps:postprocess configuration key, which specifies a path to an executable script.

This feature is powerful but comes with significant security implications.

Warning: Security Warning

The revdeps:postprocess feature executes arbitrary code from the user-provided script. The script runs with the same permissions as the mpb command itself.

Trust: Only use scripts that you have written or that come from a highly trusted source.
Code Review: Always review and fully understand any script before configuring it.
Principle of Least Privilege: Run mpb with the lowest possible privileges (never as root).
Environment Access: The script has access to the same environment as mpb, which may contain sensitive information.

The script receives a single command-line argument: the path to a temporary YAML file containing the reverse dependency dictionary. It must print the modified dictionary in YAML format to standard output.

Here is an example postprocess.py script:

#!/usr/bin/env python3
import sys
import yaml

if len(sys.argv) < 2:
    sys.exit("Input file path not provided.")

input_file = sys.argv[1]

with open(input_file, 'r', encoding='utf-8') as f:
    rev_deps = yaml.safe_load(f)

# --- User modifications here ---

# Example 1: Force a package to be built first
if 'important-package' in rev_deps:
    for pkg in rev_deps.values():
        pkg.setdefault('priority', 0)
        pkg['priority'] += 1
    rev_deps['important-package']['priority'] = 0

# Example 2: Remove a package from the list
rev_deps.pop('unstable-package', None)

# --- End of modifications ---

print(yaml.dump(rev_deps))

This script can be enabled in the configuration file like this:

revdeps:
  postprocess: /path/to/your/postprocess.py
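Before pointing MPB at such a script, it can be useful to exercise the modification logic on its own. The sketch below factors the two example modifications into plain functions that can be tested without MPB or a YAML file. The function names are hypothetical, and the dictionary layout (package name mapped to a dictionary with an optional priority key) is assumed from the example script above.

```python
def prioritize_first(rev_deps, name):
    """Give `name` priority 0 and push every other package one step back."""
    if name not in rev_deps:
        return rev_deps
    for pkg in rev_deps.values():
        pkg["priority"] = pkg.get("priority", 0) + 1
    rev_deps[name]["priority"] = 0
    return rev_deps


def drop_package(rev_deps, name):
    """Remove `name` from the reverse dependency list if it is present."""
    rev_deps.pop(name, None)
    return rev_deps


sample = {
    "important-package": {"priority": 3},
    "other": {},
    "unstable-package": {"priority": 1},
}
drop_package(prioritize_first(sample, "important-package"), "unstable-package")
print(sample)
# prints {'important-package': {'priority': 0}, 'other': {'priority': 1}}
```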

Let’s go through the new options used in the gcc configuration above.

The arch field is set to all, which is an MPB special option that selects all the architectures supported by COPR for rawhide.

The deps_only field tells MPB that it isn’t necessary to rebuild gcc, and that the build should be assumed to be a success. The reason is that the build takes about 30h in COPR, so other means were used for this package, like Koji or a dedicated build machine. The resulting packages were put in a custom repository, which is given to COPR through the additional_repos field.
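The ${arch} placeholder in additional_repos suggests one repository URL per architecture. How and where the substitution actually happens (in MPB or in COPR) is not stated here, so the following is only a sketch of the presumed effect, using Python's string.Template; the function name and the architecture list are hypothetical.

```python
from string import Template


def expand_repos(repo_templates, arches):
    """Return, for each architecture, the repository URLs with ${arch} filled in."""
    return {
        arch: [Template(url).substitute(arch=arch) for url in repo_templates]
        for arch in arches
    }


repos = expand_repos(
    ["https://anexample.fedorapeople.org/fedora-gcc14-${arch}/"],
    ["x86_64", "aarch64"],  # illustrative list, not what "all" resolves to
)
print(repos["aarch64"])
# prints ['https://anexample.fedorapeople.org/fedora-gcc14-aarch64/']
```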

Since there are about 9k components that depend on gcc, it makes no sense to try to calculate the priorities for the builds: automatic_build_ordering is set to False, and they will all be executed in the same COPR batch (with lower priority). For the same reason, it may make sense to have a smoke test for this project, where a limited set of packages is built:

arch: all
name: gcc-14_smoketest
packages:
  gcc:
    deps_only: True
revdeps:
  automake:
    committish: "@last_build"
  libtool:
    committish: "@last_build"
data: /home/anexample/work/mpb/gcc-14
copr:
  additional_repos:
  - https://anexample.fedorapeople.org/fedora-gcc14-${arch}/