Welcome to the first 2026 edition of our (sort of) quarterly newsletter, Navigating Upstream.
StackHPC turned 10 this January, so in this edition we will be taking a look back over the last decade at how the company came to be what it is today.
After a brilliant 2025, we’re keen to carry that momentum into the new year. The team is set to grow significantly, and in the last month we have kicked off some of our biggest projects yet.
Also featured are a raft of updates to our SLURM appliance, a round-up of infrastructure changes, and the usual CVE notices and recaps of recent events.
– The StackHPC team
Celebrating 10 years of StackHPC
This January marked 10 years since the incorporation of StackHPC as a limited company. Read on for the story of the company’s growth from its inception in a pub in Bristol, to winning the OpenInfra Superuser award.
Built on strong foundations
StackHPC can trace its roots all the way back through Bristol’s nearly 50-year computing history. The founding of Inmos in 1978 sparked a technological revolution in the south-west of England, drawing over £200 million in government investment and leading to significant developments in parallel supercomputing (Wired).
Meiko, founded by former Inmos employees, launched an Inmos transputer-based supercomputer named the Computing Surface in 1986, and would employ both StackHPC co-founders John Taylor and Stig Telfer in the early 1990s.
Meiko’s technical team transferred to Quadrics in the late 1990s to keep developing Meiko networking technology. A spin-off called Gnodal formed ten years later, which folded in 2013 with its assets and team, including Stig, transferred to the Cray Bristol Research Laboratory. It was here that OpenStack’s potential for HPC infrastructure was first explored, and the Slingshot network fabric was developed.
StackHPC’s HPC cloud consultancy began at Cambridge University. John and Stig attended the OpenStack summit in Tokyo, 2015, with Stig presenting: The Case for a Scientific OpenStack.
In Tokyo, John and Stig met Tim Bell from CERN, and the idea for an OpenStack Scientific Special Interest Group (SIG), a forum to share ideas across the scientific open infrastructure community, was born.
Where we are now
Since then, StackHPC has continued to grow, with projects across Europe and the UK, the United States, Asia, Oceania and Africa, and now employs almost 50 people. While a lot has changed, much remains the same. We are still headquartered in Bristol, now in our own office, and we have colleagues in Poland and France. We continue to champion open source principles in all our work, exemplified by our position as the second-highest contributor to the latest major OpenStack releases, Epoxy and Gazpacho (Stackalytics: by reviews, resolved bugs and person-day hours).
A recent UK government publication noted StackHPC’s ‘highly innovative UK AI software stack’ contribution to the University of Cambridge’s DAWN supercomputer, as the government prepares to invest a further £36 million in the AI Research Resource (AIRR) at Cambridge for a new supercomputer project. DAWN has already supported over 350 projects, including accelerating research into personalised cancer vaccines. We are excited to continue our involvement in this and many other important projects throughout 2026.
The State of the SLURM Appliance
Achieving peak performance from AI infrastructure is crucial for maximising returns on significant hardware and software investments, especially given the competitive nature of modern-day AI workloads.
We discussed our SLURM Appliance in the first issue of this newsletter in August 2025, and since then 130 PRs have been merged. Aside from a continual stream of package updates, most of these have focused on improving stability or usability.
The appliance can now be deployed on isolated networks with no outbound internet access, where previously such access had always been a requirement. This required various structural changes in the Ansible and increased the size of the images we build (and make freely available).
However, it has also improved the reliability of all deployments (both client systems and in our CI) due to the reduced reliance on external resources. Similarly, we have continued to expand the use of our Release Train repository mirror for image builds, making these more reproducible and less reliant on other external infrastructure.
We've also made a number of improvements to the OpenTofu configurations which define the cluster infrastructure, most notably adding validation for input variables and making it simpler to add additional groups of nodes for either compute partitions or other services. Configuring remote state is needed for most deployments and is not easy to get right, so we've added templates and documentation to make best practice simple.
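As a sketch of what such input validation can look like (the variable name and constraint here are illustrative, not the appliance's actual schema), an OpenTofu variable block might be:

```hcl
# Hypothetical example of OpenTofu input variable validation;
# "compute_count" is an illustrative name, not the appliance's real variable.
variable "compute_count" {
  type        = number
  description = "Number of nodes in the compute partition"

  validation {
    condition     = var.compute_count >= 1
    error_message = "compute_count must be at least 1."
  }
}
```

Validation like this fails fast at plan time with a clear message, rather than producing a broken cluster later.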
As mentioned, package upgrades are a continual source of changes, the largest of which was moving our Rocky Linux 9 image to 9.6 once a suitable version of NVIDIA DOCA was available. We still support Rocky Linux 8 but are not yet planning to support Rocky Linux 10, as there is not yet significant demand and we want to keep our CI matrix reasonable. We also upgraded Ansible (to v2.16), and the Ansible host preparation script can handle all the changes this entails.
Work on a baremetal appliance with NVIDIA DGX compute nodes has also led to support for software RAID root disks and NVIDIA's Fabric Manager. We also improved our GPU support, with GRES options added to the Open OnDemand job launchers and automatic configuration of EESSI, which now supports NVIDIA GPUs. Clusters with EESSI enabled now also automatically deploy and configure a proxy, reducing the manual configuration required.
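For readers less familiar with GRES (Generic RESource) scheduling, a standard Slurm batch script requesting a GPU looks something like the following; the partition name and GPU count are illustrative, not the appliance's specific launcher configuration:

```shell
#!/bin/bash
# Illustrative Slurm job script using a GRES request for one GPU.
#SBATCH --partition=gpu       # hypothetical GPU partition name
#SBATCH --gres=gpu:1          # request one generic GPU resource
#SBATCH --time=00:10:00

nvidia-smi                    # confirm the allocated GPU is visible to the job
```

The Open OnDemand job launcher options mentioned above expose this kind of GRES request through the web interface instead.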
Speaking of our Open OnDemand configuration, we’ve added support for Dex, which massively expands how our deployments can authenticate users. As always, we wrap the underlying functionality to provide sensible, functional defaults for the appliance while allowing the required site customisations.
Lastly, the latest releases have brought fixes for two recent CVEs that affect SLURM. The first affects the authentication service for user access, MUNGE, and allows a potential attacker to impersonate any user, including root. The second concerns the potential to cause a Denial-of-Service or allow remote code execution via a vulnerability in OpenSSL. Find more details on both of these issues in CVE watch below.
Future plans include drawing on our experience operating busy client clusters to update the default SLURM configuration, and making it easier to build and configure site-specific images.
The View from the Release Train - Infrastructure updates
A brief summary of important upgrades, features and fixes in our OpenStack infrastructure products in the last quarter.
Kayobe
- Various fixes, including missing NTP configuration for infrastructure VMs and an issue where the Bifrost host variable file failed to generate when the default IPv4 gateway was not defined.
- Read the full release notes
CVE Watch
The beginning of 2026 has already delivered its fair share of security vulnerabilities potentially affecting our customers’ deployments. We won’t be making predictions about this slowing down; if anything, current AI tooling has shown itself quite adept at discovering security bugs, so we expect a steady flow of vulnerabilities this year.
CVE-2026-25506: MUNGE (SLURM) Buffer overflow in message unpacking allows key leakage and credential forgery
This vulnerability stems from an issue in MUNGE, the service used to authenticate users – including system service users – within the cluster and to the SLURM daemons. MUNGE can be understood as the basis of trust in the StackHPC SLURM appliance.
This vulnerability could allow an attacker to impersonate any other user, including root, to the SLURM controller and SLURM daemons. This, in turn, would let the attacker gain access as that user on compute nodes via a job, and administer SLURM using scontrol commands.
The attacker only needs login access to a cluster node, or the ability to submit a SLURM job, to attack the local munge daemon, which makes this a serious flaw that should be patched quickly, especially in busy clusters.
As packages for the fixes were not yet available from the vendors of the host operating systems we deploy to (Ubuntu and Rocky Linux), we packaged the fix ourselves for our customers and have been deploying it.
The second issue, in OpenSSL, is both quite specific in its requirements and potentially quite bad in terms of impact.
The attack exploits a bug in parsing AuthEnvelopedData structures that use Authenticated Encryption with Associated Data (AEAD) ciphers such as AES-GCM. An attacker supplying a crafted Cryptographic Message Syntax (CMS) message with an oversized Initialization Vector (IV) can trigger a buffer overflow, which may lead to a crash and a Denial of Service (DoS) or, more worryingly, potentially allow remote code execution.
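To illustrate the general bug class (this is a simplified Python sketch, not OpenSSL's actual code, and the buffer size is an assumption for the example), the flaw boils down to copying an attacker-controlled IV without first validating its length:

```python
# Simplified illustration of the oversized-IV bug class. In C, the unsafe
# version overruns a fixed-size buffer; Python merely grows the bytearray,
# which is exactly why the explicit length check matters in C code.

MAX_IV_LEN = 16  # illustrative fixed buffer size (AES-GCM IVs are usually 12 bytes)

def unsafe_set_iv(buf: bytearray, iv: bytes) -> None:
    # BUG (in C terms): no length check before copying attacker-controlled data
    buf[:len(iv)] = iv

def safe_set_iv(buf: bytearray, iv: bytes) -> None:
    # FIX: validate the attacker-controlled length before copying
    if len(iv) > MAX_IV_LEN:
        raise ValueError("IV too long")
    buf[:len(iv)] = iv

buf = bytearray(MAX_IV_LEN)
safe_set_iv(buf, b"\x00" * 12)       # a normal 12-byte GCM IV is accepted
try:
    safe_set_iv(buf, b"\x00" * 64)   # an oversized IV is rejected
except ValueError:
    print("oversized IV rejected")
```

The fix applied upstream follows the same principle: reject IV lengths outside the expected bounds before any copy takes place.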
Again, worth patching quickly. We’ve been deploying the patched versions to our customers’ systems.
From the Blog
Slinking Time
Published 8 February 2026, by Raine Wales
Raine explores how FluxCD enables the packaging of Slinky as an Azimuth app, lowering the barrier to entry for SLURM by deploying it within a Kubernetes cluster.
Read the full post.
Azimuth All Alone: Standalone Mode
Published 17 December 2025, by Raine Wales
Raine presents a solution for deploying Azimuth on a Kubernetes cluster, allowing organisations and users without an existing OpenStack cloud to make use of the platform.
Read the full post.
United by OpenStack: Empowering Women in STEM Beyond Borders
Published 12 November 2025, by Massimiliano Favaro-Bedford
Max summarises a great experience supporting students from the University of Jos, Nigeria, through a 7 week internship covering OpenStack and Magnum development.
Read the full post.
Keep an eye on our LinkedIn, or the blog page of our website for the latest posts.
Parting words
Thank you for taking the time to read the second edition of Navigating Upstream!
We always welcome any feedback and suggestions.
If you’d prefer not to receive future editions, you can opt out at any time using the link below or with a simple reply. Otherwise, we look forward to keeping in touch.
– The StackHPC Team
Reach out to us via Bluesky, LinkedIn or directly via our contact page.