How can we make running a workload in the cloud easier? In a previous article we presented the lay of the land for HPC workload management in an OpenStack environment. A substantial part of the work done to date focuses on automating the creation of a software-defined workload management environment - SLURM-as-a-Service. Projects that enrich the environment available to workload management services once they are up and running in the cloud appear to be less common.
One example that came along last week was the upstream merge of a new spec for multi-tenant log retrieval in Monasca. This proposal was made and seen through by StackHPC's Steve Simpson.
Monasca and Multi-Tenant Monitoring
Monasca monitors OpenStack, but it goes further than that.
From its inception, Monasca has been designed to support multi-tenant telemetry, which sets it apart. Any tenant host, service or workload can submit telemetry data to a Monasca API endpoint, and have it collected and salted away. Later, the user can log in to a dashboard (Grafana in many cases) and interactively explore the telemetry data collected about the operation of their instances.
Can your tenants do that?
The intention is that complex services like telemetry and monitoring are provided as a service, without requiring the users to create and deploy their own.
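As a rough illustration of what submitting telemetry as a tenant can look like, here is a minimal sketch that builds a metric and the HTTP request that would post it to the Monasca metrics resource. The endpoint URL, the token, the metric name and the dimensions are all hypothetical placeholders, and the helper functions `build_metric` and `build_request` are names invented for this example. It assumes a Keystone token scoped to the tenant's project, passed in the `X-Auth-Token` header. The request is only constructed, not sent.

```python
import json
import time
import urllib.request

# Hypothetical values for illustration only; substitute your own deployment's.
MONASCA_URL = "http://monasca.example.com:8070/v2.0"  # assumed API endpoint
KEYSTONE_TOKEN = "replace-with-a-keystone-token"      # assumed project-scoped token


def build_metric(name, value, dimensions, timestamp_ms=None):
    """Return a metric dict with name, dimensions, timestamp (ms) and value."""
    if timestamp_ms is None:
        # Timestamps are expressed in milliseconds since the epoch.
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,
        "dimensions": dimensions,
        "timestamp": timestamp_ms,
        "value": float(value),
    }


def build_request(metric, url=MONASCA_URL, token=KEYSTONE_TOKEN):
    """Build (but do not send) a POST request for the metrics resource."""
    return urllib.request.Request(
        url + "/metrics",
        data=json.dumps(metric).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical metric describing a tenant's own workload manager.
    metric = build_metric(
        "job.queue_depth", 12, {"hostname": "slurm-ctl-0", "service": "slurm"}
    )
    request = build_request(metric)
    print(request.get_method(), request.full_url)
    print(json.dumps(metric, indent=2))
    # To actually submit the metric: urllib.request.urlopen(request)
```

In practice most tenants would let the Monasca agent or the Python client library do this work; the sketch is only meant to show that the data is attributed to whichever project the token belongs to.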
Adding Logging to the Mix
Time-series telemetry is certainly useful, but is only one part of a comprehensive solution. We also want to gather data on events that occur, and logs of activity from the services and operating systems that underpin our research computing platforms.
The Monasca project (led by the team from Fujitsu) has been working on logging support for a little while. They first presented their work at the Tokyo summit.
Logging for system and OpenStack services has been up and running in Monasca for a few releases.
What has been missing, until now, is a way of providing multi-tenant access to log retrieval.
Reducing the Time to Science
It's clear our users have work to do, and our OpenStack projects exist to support that.
Using Monasca, we can already present log data inline with telemetry data for system administration use cases. For example, log and telemetry data collected from monitoring RabbitMQ services can be drawn from Monasca and presented together on a Grafana dashboard.
Once the new multi-tenant logging API is implemented, we'll be providing our users with the same services for telemetry and logging of their own infrastructure, platforms and workloads.