How can we make a workload easier to run in the cloud? In a previous article
we presented the lay of the land for HPC
workload management in an OpenStack environment. A substantial
part of the work done to date focuses on automating the creation
of a software-defined workload management environment:
SLURM-as-a-Service. Projects that enrich the
environment available to workload management services once they are
up and running in the cloud appear to be less common.
One example that came along last week was the merge upstream of a
new spec for multi-tenant log retrieval in Monasca. This proposal was made
and seen through by StackHPC's Steve Simpson.
Monasca and Multi-Tenant Monitoring
Monasca monitors OpenStack, but it goes further than that.
From its inception, Monasca has been designed to support
multi-tenant telemetry. Any tenant host, service or
workload can submit telemetry data to a Monasca API endpoint, and
have it collected and salted away. Later, the user can log in to
a dashboard (Grafana in many cases), and interactively explore the
telemetry data that they collected about the operation of their
own hosts, services and workloads.
Can your tenants do that?
The intention is that complex services like telemetry and monitoring
are provided as a service, without requiring the users to create and
deploy their own.
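As a rough illustration of what submitting tenant telemetry can look like, here is a minimal Python sketch that builds a single metric and wraps it in a POST request to the Monasca API's /v2.0/metrics resource, authenticated with a Keystone token. The endpoint URL, the token value and the metric name slurm.jobs_pending are hypothetical placeholders, and in practice the Monasca agent or python-monascaclient would normally do this work.

```python
import json
import time
import urllib.request


def build_metric(name, value, dimensions, timestamp_ms=None):
    """Build one metric in the JSON shape expected by POST /v2.0/metrics."""
    if timestamp_ms is None:
        # Monasca metric timestamps are expressed in milliseconds since the epoch.
        timestamp_ms = int(time.time() * 1000)
    return {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }


def build_request(endpoint, token, metric):
    """Prepare (but do not send) an authenticated POST carrying the metric."""
    url = endpoint.rstrip("/") + "/v2.0/metrics"
    return urllib.request.Request(
        url,
        data=json.dumps(metric).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical metric name, dimensions, endpoint and token, for illustration only.
    metric = build_metric("slurm.jobs_pending", 12, {"cluster": "demo"})
    request = build_request("http://monasca.example.com:8070", "TOKEN", metric)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually submit the metric: urllib.request.urlopen(request)
```

The request is built separately from sending it so that the payload can be inspected without a live Monasca endpoint. The token determines the tenant (project) that the metric is stored against, which is what keeps each tenant's telemetry separate.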
Adding Logging to the Mix
Time-series telemetry is certainly useful, but is only one part of a
comprehensive solution. We also want to gather data on events that
occur, and logs of activity from the services and operating systems
that underpin our research computing platforms.
The Monasca project (led by the team from Fujitsu) have been working
on logging support for a little while. They first presented their
work at the Tokyo summit.
Logging for system and OpenStack services has been up and running
in Monasca for a few releases.
What has been missing (until now) has been a way of providing multi-tenant
access to log retrieval.
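For context on the submission side that already exists, the following Python sketch assembles a batch of log messages in the shape we understand the Monasca log API's v3.0 bulk format (POST /v3.0/logs) to accept: shared dimensions at the top level, plus a list of messages with their own dimensions. The field layout should be checked against the log API version actually deployed, and the hostnames, component names and messages are hypothetical. It does not cover retrieval, which is the part the new spec addresses.

```python
import json


def build_log_batch(messages, common_dimensions, per_log_dimensions=None):
    """Assemble a bulk log payload: shared dimensions plus one entry per message.

    Assumed layout (verify against the deployed log API):
    {"dimensions": {...}, "logs": [{"message": "...", "dimensions": {...}}]}
    """
    per_log_dimensions = per_log_dimensions or {}
    return {
        "dimensions": dict(common_dimensions),
        "logs": [
            {"message": message, "dimensions": dict(per_log_dimensions)}
            for message in messages
        ],
    }


if __name__ == "__main__":
    # Hypothetical host, component and log lines, for illustration only.
    batch = build_log_batch(
        ["job 42 started", "job 42 completed"],
        {"hostname": "compute-0"},
        {"component": "slurmd"},
    )
    print(json.dumps(batch, indent=2))
```

As with metrics, the payload would be sent with a Keystone token in the X-Auth-Token header, so logs are attributed to the submitting tenant.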
Reducing the Time to Science
It's clear our users have work to do, and our OpenStack projects
exist to support that.
Using Monasca, we can already present log data inline with telemetry
data for system administration use cases. For example, log
and telemetry data collected from monitoring RabbitMQ services
can be drawn from Monasca and presented together on a Grafana dashboard.
Once the new multi-tenant logging API is implemented, we'll be providing our
users with the same services for telemetry and logging of their own infrastructure,
platforms and workloads.