We settled on a half-day session in the cross-project phase of the Project Teams Gathering (PTG). This turned out to be a great choice: our session, formed from a development-centric core of regional SIG members, was greatly enhanced by a number of leaders from the Nova and Ironic projects, who contributed hugely to advancing the discussion.
The proceedings of the discussion were tracked in an Etherpad.
Here are some highlights from my own point of view.
The concept of the service is to introduce a second class of instances, managed outside of a user's standard quota, which can take opportunistic advantage of temporary resource availability. The flip side is that when the cloud infrastructure becomes fully utilised, these instances will be terminated at short notice in order to service prioritised requests for resources.
The Reaper process is designed to intercept a NoValidHost scheduling failure, search for a preemptible victim, shut it down, harvest its resources, and retry the original request (which should now hopefully succeed).
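The victim-selection step can be sketched in outline. This is a minimal illustration, not the Reaper's actual implementation: the function name, the instance dictionaries, and the best-fit policy are all assumptions for the purpose of the example, and the real service integrates with Nova's scheduler and notification machinery rather than operating on plain data structures.

```python
def select_preemptible_victim(request, running_instances):
    """Pick a preemptible instance whose resources would satisfy a
    failed scheduling request. Returns None if nothing can be reaped.

    `request` and each instance are plain dicts here purely for
    illustration, e.g. {"vcpus": 2, "ram_mb": 4096}.
    """
    # Only instances explicitly flagged as preemptible are candidates.
    candidates = [i for i in running_instances if i.get("preemptible")]
    # Keep only victims large enough to satisfy the request on their own.
    suitable = [v for v in candidates
                if v["vcpus"] >= request["vcpus"]
                and v["ram_mb"] >= request["ram_mb"]]
    if not suitable:
        return None  # nothing to reap; the request fails as before
    # Prefer the smallest sufficient victim to minimise wasted work.
    return min(suitable, key=lambda v: (v["vcpus"], v["ram_mb"]))
```

In the full flow, the caller would then terminate the victim and resubmit the original request once its resources have been released.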
CERN's approach is to trigger the Reaper only when the cloud is completely full. Some use cases may favour reaping when utilisation exceeds a given threshold. For example:
- In a cloud with bare metal compute instances, reaping preemptible bare metal resources could take significantly longer, particularly if Ironic node cleaning is enabled.
- In a cloud with high turnover of instances, the additional time required to perform the reaping action could result in many requests contending for reaped resources. This could add delay to instance creation, and the race for claiming reaped resources is not guaranteed to be fair.
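The difference between the two policies reduces to a single trigger condition. The sketch below is a hypothetical illustration of that condition only (function name and parameters are assumptions, and real utilisation accounting would span multiple resource classes, not just vCPUs):

```python
def should_reap(used_vcpus, total_vcpus, threshold=1.0):
    """Decide whether the Reaper should run.

    threshold=1.0 reproduces the reap-only-when-full policy; a lower
    value (e.g. 0.9) starts reaping proactively once utilisation
    crosses the threshold, hiding slow reaping actions (such as bare
    metal node cleaning) from users waiting on new instances.
    """
    if total_vcpus == 0:
        return False
    return used_vcpus / total_vcpus >= threshold
```

A threshold below 1.0 trades some wasted capacity for lower latency on new requests, which matters most in the bare metal and high-turnover cases above.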
The SKA project has taken a strong interest in this work, which is seen as key to raising utilisation on a finite cloud resource. John Garbutt from StackHPC has been providing technical input and upstream assistance on the SKA project's behalf. The long-term hope is to introduce the minimal extensions to Nova required to enable services like the Reaper to function, and deliver preemptible instances as effectively as possible.
The discussion and design work continue - here is the spec.
Nested Projects and Quotas
At the inaugural session for the Scientific Working Group at the 2016 OpenStack summit in Austin, one pain point articulated by Tim Bell of CERN was OpenStack's issues with managing quotas across hierarchical projects. He provided a detailed use case in a subsequent blog post. Not much progress has been made since then - until now.
John Garbutt updated the SIG on a cross-project PTG session on identity. The discussion there centred on refactoring quota management into Keystone and a new Oslo library. The advantage of this is that it brings quota management to the service that understands the hierarchy of nested projects. Managing quota for users in nested projects has long been a pain point for large-scale users of OpenStack in research computing.
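The core of the problem can be seen in a toy model. This sketch is an assumption-laden illustration of why hierarchy-aware checks belong with the service that owns the project tree: the classes and functions here are invented for the example, and the actual design work is happening in the Keystone and Oslo communities.

```python
class Project:
    """A toy project with a quota limit applying to its whole subtree."""
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit      # limit on this project plus descendants
        self.parent = parent
        self.usage = 0          # resources consumed directly

def subtree_usage(project, children_of):
    """Total usage of a project and all of its descendants."""
    total = project.usage
    for child in children_of.get(project, []):
        total += subtree_usage(child, children_of)
    return total

def can_consume(project, amount, children_of):
    """A claim is valid only if it fits within every ancestor's limit."""
    node = project
    while node is not None:
        if subtree_usage(node, children_of) + amount > node.limit:
            return False
        node = node.parent
    return True
```

Note that validating a claim requires walking the entire ancestor chain, which is exactly the knowledge Keystone holds and individual services like Nova and Cinder do not.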
Cinder Volume Multi-Attach
OpenStack Queens introduces multi-attach Cinder volumes, potentially enabling many nodes to share a common volume. This introduces some potential for new ways of scaling research computing infrastructure:
- Enabling a common cluster of nodes to boot from the same read-only volume, leading to minimal infrastructure state per instance and strong immutability.
- Enabling the same capability for large-scale bare metal infrastructure.
Nova's documentation already includes details on using multi-attach.
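As a rough sketch of the workflow, attaching one volume to two servers involves creating a volume type that permits multi-attach, then attaching as usual. The resource names (`multiattach`, `shared-data`, `server-a`, `server-b`) are placeholders for this example; consult the Nova and Cinder documentation for your release before relying on the exact property key.

```shell
# Create a volume type that allows multi-attach
openstack volume type create multiattach
openstack volume type set --property multiattach="<is> True" multiattach

# Create a shared volume of that type and attach it to two servers
openstack volume create --size 10 --type multiattach shared-data
openstack server add volume server-a shared-data
openstack server add volume server-b shared-data
```

For the shared-boot use case described above, the volume would also need to be attached read-only, since concurrent writers require a cluster-aware filesystem.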
The initial implementation of Cinder volume multi-attach is a huge step forward but unlikely to meet all our requirements. We'll test this capability out in due course and see how close we get.
Ironic Advanced Deployments
Ironic's new flexibility with regard to deployment has introduced new and compelling possibilities for the ways in which it can be used for infrastructure management.
To ensure the advanced deployment requirements articulated in the meeting have an enduring impact, SIG members were asked to capture use cases in RFEs for Ironic in Storyboard. This has already started happening:
- Boot to RAMdisk: booting Ironic compute nodes directly into a RAMdisk image could create a very rapid and scalable deployment-free infrastructure provisioning process. Realistically this can only support small software images and special-purpose deployments, but it could unlock the potential for using Ironic in extreme-scale HPC environments.
- Deploy with kexec: Some SIG members find the extra reboot in Ironic's deployment cycle unbearably tiresome. That's not because they have a chronically short attention span - rather, there are systems out there with enough RAM and other devices to initialise that a power cycle can take of the order of hours instead of minutes. The possibility of using kexec to avoid a power cycle makes a huge usability difference in these circumstances.
I had the opportunity to talk about the Scientific SIG and the aims of the group in an interview for SuperUser.