Mellanox OFED for the Overcloud

Building Mellanox OFED for OpenStack Overcloud Images

A typical HPC system environment is continuously managed: a maintained configuration, updated over time as new software packages become available. The cloud compute model takes a different approach: infrastructure (such as the OS and run-time environment of an HPC system) is treated like code, and that infrastructure is managed through "recompilation" rather than being updated in place.

This can have far-reaching consequences for the workflow with which systems are managed. One clear benefit is that any system can be rebuilt according to a formula (likely to be a collection of scripts, Ansible playbooks or Puppet manifests), and that formula can be managed and developed under source control. If a formula is sufficiently precise, it can provide an increased level of repeatability: at some future point we can check out the formula and use it again to rebuild servers to a similar configuration.

One quirky side-effect of this approach is that a cloud-model image contains no 'baggage': no old, superseded packages lying unused after an upgrade. In particular, cloud-model images do not contain the kernel package that originally shipped with that distribution, since updated several times over. This turns out to be a nuisance when installing Mellanox OFED, which assumes this original kernel is present.

When building a cloud-model system image we are typically building something quite different from the environment of the build host. The build host is likely to be a well-stocked sysadmin's toolbox. The image should be pared down and minimally sufficient for the (virtualised) task at hand. We need to prevent the configuration of the build host from polluting the cloud image.

How Mellanox OFED is Built

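The new kernel (the version the cloud image will run, which may differ from the kernel running on the build host) should be installed on the build system. Assuming a kernel RPM package is being used, both the kernel and the kernel-devel RPMs for that version should be installed.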
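As a rough sketch of this step, the snippet below derives the two package names from a target kernel version and checks whether the matching kernel-devel tree is present. The helper names kernel_packages and have_kernel_devel are hypothetical, and the check assumes the RHEL/CentOS convention of kernel-devel installing under /usr/src/kernels/<version>. The install command is only printed, not run.

```shell
#!/bin/bash
# Target kernel: the version the cloud image will boot.
KVER=3.10.0-327.22.2.el7.x86_64

# Print the RPM names for a kernel version (hypothetical helper).
# KVER already has the <version>-<release>.<arch> form that yum accepts.
kernel_packages() {
    local kver="$1"
    printf '%s\n' "kernel-${kver}" "kernel-devel-${kver}"
}

# Succeed if the kernel-devel tree for that version exists.
# Assumes the RHEL/CentOS location /usr/src/kernels/<version>.
# The optional second argument is an alternative filesystem root.
have_kernel_devel() {
    local kver="$1" root="${2:-/}"
    [ -d "${root%/}/usr/src/kernels/${kver}" ]
}

# Dry run: show the install command for review instead of executing it.
echo "yum install -y" $(kernel_packages "$KVER")

if have_kernel_devel "$KVER"; then
    echo "kernel-devel for $KVER is present"
else
    echo "kernel-devel for $KVER is missing"
fi
```

Keeping the version in a single variable means the same value can be passed on to the OFED build script, so the installed packages and the build target cannot drift apart.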

Mellanox OFED is downloaded as a tarball, or an equivalent ISO image. Within the image is a yum repo of RPMs and a set of scripts for automating building, installing and uninstalling.
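The two download formats are opened differently. The sketch below picks a command based on the file extension and prints it for review. The function name open_ofed_cmd and any file names passed to it are illustrative, not part of Mellanox OFED.

```shell
#!/bin/bash
# Print the command needed to open a Mellanox OFED download.
# Hypothetical helper: it only echoes the command, it does not run it.
#   $1 = path to the downloaded tarball or ISO
#   $2 = directory to extract into, or to use as a mount point
open_ofed_cmd() {
    local archive="$1" dest="$2"
    case "$archive" in
        *.tgz|*.tar.gz)
            # A tarball unpacks into its own subdirectory beneath $dest.
            echo "tar -xzf $archive -C $dest"
            ;;
        *.iso)
            # An ISO is loop-mounted read-only; this needs root privileges.
            echo "mount -o ro,loop $archive $dest"
            ;;
        *)
            echo "unsupported archive: $archive" >&2
            return 1
            ;;
    esac
}
```

Either way, the scripts and the yum repo of RPMs end up visible under the destination directory.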

After unpacking the tarball (or mounting the ISO), we use the build automation script, mlnx_add_kernel_support.sh:

# Build against the kernel the image will run, not necessarily the running one
KVER=3.10.0-327.22.2.el7.x86_64
./mlnx_add_kernel_support.sh \
     --mlnx_ofed "$PWD" --kernel "$KVER" --yes --verbose --make-iso

How to Use the Output

If the build succeeds, an updated version of the Mellanox OFED tarball (or ISO image) is generated in /tmp. This output can be used to install in exactly the same way as the Mellanox OFED build first downloaded.
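Since /tmp may hold output from earlier builds, it can help to pick out the newest one. The sketch below assumes the generated file keeps the MLNX_OFED_LINUX- name prefix of the original download. Both that prefix and the helper name newest_ofed_output are assumptions, so check them against what the build actually reports.

```shell
#!/bin/bash
# Print the most recently modified Mellanox OFED artefact in a directory.
# Hypothetical helper; assumes output names start with MLNX_OFED_LINUX-.
# Prints nothing if no matching file exists.
newest_ofed_output() {
    local dir="${1:-/tmp}"
    # -t sorts newest first; -d lists matching directories themselves
    # rather than their contents.
    ls -1td "$dir"/MLNX_OFED_LINUX-* 2>/dev/null | head -n 1
}

# Example use (left commented out so nothing is mounted or installed here):
#   out=$(newest_ofed_output /tmp)
#   echo "Newest OFED build output: ${out:-none found}"
```

Once located, the file is unpacked or mounted and installed from in the same way as the original download.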
