Heads Up: Ansible Galaxy Breaks the World

One of the great advantages of our technology choices has been that, by standardising on Ansible, we have been able to use a single, simple tool to drive everything we do.

Ansible is not really a programming language, and modularity cannot be ensured without some programmer discipline. Ansible Galaxy has been a great aid in providing modularity and component reuse. Our OpenStack deployment toolbag has been growing steadily, and we've been thrilled to see others make use of our components too. Share and enjoy!

Unfortunately, we are writing this post because of an event today that, apparently without notice, broke all our builds, along with the work of every client who uses our technology.

It's Working Great, What Could Possibly Go Wrong?

We started noticing oddities earlier today when updating some of our roles on Galaxy. The first was that the implicit naming convention for the git repos backing our roles, such as our new BeeGFS role, was no longer being honoured: the role's name on Galaxy changed from beegfs to ansible-role-beegfs. As a result, playbooks that required the role could no longer find it.

We fixed this by adding a role_name metadata tag, which sets the name explicitly, to each of our 32 roles. Renaming the repos instead was not an option: they are long established, many are cloned and some are forked, so we can't simply rename them on a whim.

When we pushed the change that sets this metadata tag, every one of our roles with a hyphenated name was silently converted to use underscores instead. This may seem innocuous, but the consequence is that, once again, every playbook that referenced these roles (which is every playbook we write) could no longer retrieve the roles it required from Ansible Galaxy.

The root cause appears to be the combined effect of two changes. Ansible has removed the implicit naming convention for the git repos that back Galaxy roles. Around the same time, they introduced a newer, stricter naming convention for Galaxy roles that prevents names containing hyphens. The backwards-compatibility plans for these two changes are mutually exclusive, and unfortunately most of our roles fall into both categories.
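The interaction between the two changes can be modelled in a few lines of Python. This is a hypothetical sketch built only from the behaviour we observed, not Galaxy's actual implementation. It assumes that the old convention stripped an ansible-role- prefix from the repo name, that without role_name Galaxy now uses the repo name verbatim, and that an explicit role_name has its hyphens converted to underscores. The ansible-role-example-role repo is a made-up example of a hyphenated role.

```python
from typing import Optional

# Hypothetical model of the Galaxy naming behaviour we observed.
# Assumption: the legacy implicit convention stripped this prefix from the repo name.
LEGACY_PREFIX = "ansible-role-"


def legacy_name(repo_name: str) -> str:
    """Legacy implicit convention (assumed): strip the prefix, keep hyphens."""
    if repo_name.startswith(LEGACY_PREFIX):
        return repo_name[len(LEGACY_PREFIX):]
    return repo_name


def new_name(repo_name: str, role_name: Optional[str] = None) -> str:
    """New behaviour (assumed from what we observed, not Galaxy's code)."""
    if role_name is None:
        # Implicit convention no longer honoured: the repo name is used as-is.
        return repo_name
    # Explicit role_name: hyphens are silently converted to underscores.
    return role_name.replace("-", "_")


if __name__ == "__main__":
    # The second repo is a hypothetical hyphenated role, for illustration only.
    for repo, meta in [("ansible-role-beegfs", "beegfs"),
                       ("ansible-role-example-role", "example-role")]:
        print(repo, legacy_name(repo), new_name(repo), new_name(repo, meta))
```

For beegfs, setting role_name restores the old name. For a hyphenated role, neither path does: the legacy name is example-role, while the new Galaxy yields either ansible-role-example-role or example_role, so playbooks written against the old name break either way.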

We are not out of the woods yet: it appears that the role_name tag we now need in order to set the correct names for our roles may also be about to be deprecated. That could leave us needing to rename all the git repos for our roles.

What about Kayobe?

OpenStack Kayobe is a project that makes extensive use of Galaxy for reuse and modularity. At the time of writing, Kayobe's CI is also broken by this change, and an extensive search-and-replace patchset will be required, pending the outcome of our requests for an upstream resolution.

What Do Our Clients Need to Do?

In summary, there seem to be a number of tedious but simple changes that must be applied everywhere:

  • From now on, all our roles use underscores instead of hyphens in their names. This appears to be an unavoidable change to ensure forward compatibility with future versions of Galaxy. We'd like to see a server-side fix in Galaxy that recognises either hyphens or underscores, enabling a smooth transition.
  • The requirements and role invocations of every playbook that references these roles will need to be updated, replacing - with _ in the role names. We will commit those changes to our repos, but all clients will need to pull in the new changes. This should happen automatically when repos are cloned.
  • We might not be done with these build-breaking changes yet, although hopefully there will be a way forward that doesn't break things for users.
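The search-and-replace in the second point can be scripted. The following is a hedged Python sketch, not the tooling we actually use. It assumes roles are referenced as namespace.role-name in requirements files and playbooks, rewrites only the part after the dot, and takes the namespace as a parameter; the script name and the examplens namespace used below are placeholders.

```python
import re
import sys
from pathlib import Path


def migrate_role_refs(text: str, namespace: str) -> str:
    """Rewrite 'namespace.role-name' references to 'namespace.role_name'.

    The lookbehind stops a longer namespace (e.g. 'myexamplens') from matching.
    Only the role part after the dot is changed; the namespace is left alone.
    """
    pattern = re.compile(rf"(?<![\w.-]){re.escape(namespace)}\.([A-Za-z0-9_-]+)")
    return pattern.sub(
        lambda m: f"{namespace}.{m.group(1).replace('-', '_')}", text
    )


if __name__ == "__main__":
    # Hypothetical usage: python migrate_roles.py NAMESPACE FILE [FILE ...]
    # Files are rewritten in place, so run this on a clean git checkout.
    if len(sys.argv) >= 3:
        ns = sys.argv[1]
        for path in map(Path, sys.argv[2:]):
            path.write_text(migrate_role_refs(path.read_text(), ns))
```

For example, a requirements line "- src: examplens.os-images" becomes "- src: examplens.os_images". A bare regex cannot tell a role reference from other text that happens to match, and it will miss roles referenced without their namespace, so review the resulting diff before committing.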

Let's hope this kind of event doesn't happen too often in future...

Mushroom cloud!