For the first time, the November TOP500 list (published to coincide with Supercomputing 2020) includes fully OpenStack-based Software-Defined Supercomputers:
Drawing on experience from projects including the SKA Telescope Science Data Processor Performance Prototyping Platform and Verne Global's hpcDIRECT, StackHPC helped bootstrap these OpenStack deployments and continues to support them. They are deployed and operated using OpenStack Kayobe and OpenStack Kolla-Ansible.
A key part of the solution is the ability to deploy an OpenHPC 2.0 Slurm cluster on servers managed by OpenStack Ironic. The Dell C6420 servers are imaged with CentOS 8, and our OpenHPC Ansible role is used both to configure the system and to build images. Updated images are rolled out through a custom Slurm reboot script, so they can be deployed without disrupting users' running work.
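As a rough illustration of how Slurm can roll out a new image gradually, the sketch below builds an `scontrol reboot` command. This is not StackHPC's script. It assumes the custom reboot script is configured as Slurm's `RebootProgram` in `slurm.conf`, and the node names and reason string are hypothetical. `ASAP` drains each node so it accepts no new jobs and reboots once its current jobs finish. `nextstate=RESUME` then returns the node to service after it boots.

```python
import shlex
from typing import Iterable, List


def rolling_reboot_command(nodes: Iterable[str], reason: str) -> List[str]:
    """Build an argv asking Slurm to reboot each node once it is idle.

    Assumption: slurm.conf sets RebootProgram to a (hypothetical) script that
    redeploys the node with the updated image before it boots.
    """
    node_list = ",".join(nodes)
    if not node_list:
        raise ValueError("at least one node is required")
    return [
        "scontrol", "reboot",
        "ASAP",              # drain now, reboot when running jobs finish
        "nextstate=RESUME",  # return the node to service after it boots
        f"reason={reason}",  # shown in sinfo while the node is draining
        node_list,           # comma-separated Slurm node list
    ]


# Hypothetical compute node names, for illustration only.
cmd = rolling_reboot_command(["cn001", "cn002"], "reimage")
print(shlex.join(cmd))
```

Passing the argv to `subprocess.run(cmd, check=True)` on a Slurm controller would submit the request. Because running jobs are allowed to finish, the image change reaches each node as the workload on it naturally drains.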
With OpenStack in control, you can quickly rebalance which workloads are deployed: users can move capacity between bare-metal, virtual-machine and container-based workloads. In particular, OpenStack Magnum provides on-demand creation of Kubernetes clusters, an approach popularised by CERN.
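To make the idea of moving capacity concrete, here is a minimal sketch that builds the OpenStack CLI commands for creating a Magnum Kubernetes cluster and later resizing it. It assumes the python-magnumclient plugin for the `openstack` CLI (which provides `openstack coe cluster create` and `openstack coe cluster resize`) is installed. The cluster name, template name and node counts are hypothetical.

```python
from typing import List


def cluster_create_argv(name: str, template: str, node_count: int,
                        master_count: int = 1) -> List[str]:
    """argv for creating a Kubernetes cluster from an existing Magnum template."""
    if node_count < 1 or master_count < 1:
        raise ValueError("node_count and master_count must be at least 1")
    return [
        "openstack", "coe", "cluster", "create",
        "--cluster-template", template,       # template defines image, flavors, network
        "--master-count", str(master_count),  # control-plane nodes
        "--node-count", str(node_count),      # worker nodes
        name,
    ]


def cluster_resize_argv(name: str, node_count: int) -> List[str]:
    """argv for changing the worker count, e.g. to hand capacity back to Slurm."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return ["openstack", "coe", "cluster", "resize", name, str(node_count)]


# Hypothetical names: grow a cluster for a burst of work, then shrink it again.
create = cluster_create_argv("analysis-k8s", "k8s-template", node_count=4)
shrink = cluster_resize_argv("analysis-k8s", 1)
```

Each argv can be run with `subprocess.run(..., check=True)` by a user with suitable OpenStack credentials loaded. Shrinking the cluster releases its worker nodes for other workloads.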
Beyond user workloads, the platform also integrates with the servers' iDRAC and Redfish management interfaces to control server configuration, remediate faults and collect system-wide metrics. This was critical to optimising the data centre environment, and it underpinned the high efficiency recorded in the TOP500 list.
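As an example of the kind of metric available over Redfish, the sketch below extracts power draw from the JSON returned by a server's Redfish `Power` resource and sums it across a set of nodes. It assumes payloads fetched from a path such as `/redfish/v1/Chassis/System.Embedded.1/Power` (the chassis ID Dell iDRAC uses). It also assumes that the first `PowerControl` entry reports whole-server consumption. The sample readings are made up and the HTTP fetching step is omitted.

```python
import json
from typing import Any, Iterable, Mapping, Optional


def consumed_watts(power_resource: Mapping[str, Any]) -> Optional[float]:
    """Return PowerConsumedWatts from the first PowerControl entry, if present.

    Assumption: on these servers PowerControl[0] covers the whole chassis, so
    summing every entry is avoided to prevent double counting.
    """
    entries = power_resource.get("PowerControl") or []
    if not entries:
        return None
    watts = entries[0].get("PowerConsumedWatts")
    return None if watts is None else float(watts)


def fleet_watts(payloads: Iterable[str]) -> float:
    """Total power across nodes, skipping any that report no reading."""
    total = 0.0
    for raw in payloads:
        reading = consumed_watts(json.loads(raw))
        if reading is not None:
            total += reading
    return total


# Made-up readings from three nodes; the third has no PowerControl data.
samples = [
    '{"PowerControl": [{"MemberId": "0", "PowerConsumedWatts": 412}]}',
    '{"PowerControl": [{"MemberId": "0", "PowerConsumedWatts": 398.5}]}',
    '{"PowerControl": []}',
]
total = fleet_watts(samples)
```

Aggregating readings like these over time gives the system-wide power picture used when tuning the data centre environment.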
For more details, please watch our recent presentation from the OpenInfra Summit: