Introducing Infrahub -- a New Infrastructure Source of Truth with Version Control

May 28, 2024
5 min read
Damien Garros, co-founder and CEO

I am excited to announce that OpsMill has released the open beta of Infrahub, the next-generation source of truth for networks and infrastructure, which you can access on GitHub. In this post, I’ll share our view on the Source of Truth, why we designed Infrahub the way we did, and some of what it can do for infrastructure automation teams.

What is a Source of Truth and why should you care?

Without clean and accessible data, building a powerful automation platform is very difficult. In infrastructure automation, the Source of Truth (SoT) is where we organize and store the intended state of the infrastructure. The SoT is the central component of an automation platform because all other components integrate with and depend on it.

I spent several years working on automating networks and infrastructure across multiple companies, and talking with many others who have done the same. Based on that collective experience, I strongly believe that the ideal Source of Truth must have two main capabilities:

  1. Flexible Storage Engine: It must adjust to the ever-evolving needs of the business and store all information representing the expected state of the infrastructure. This includes technical information as well as service, design, and business-level information.
  2. Change Control & Peer Review: Change management is essential for infrastructure management. For a good part of the industry, the standard for change control & peer review is Git, more specifically GitHub or GitLab where Git is combined with a Pull Request and a CI Pipeline.

Existing solutions don’t deliver a mature SoT

No solution on the market properly addresses both of the above requirements today. As a result, every organization that has successfully deployed automation has completely or partially rebuilt its data management platform on its own, at significant cost and effort.

We have seen three typical paths that organizations have taken to implement a Source of Truth, each with benefits and drawbacks:

  • GitOps / Infrastructure as Code: Everything is built on and around Git, for better or worse. These solutions are very popular because they provide maximum flexibility with version control, but at the cost of a complex system to manage. Because Git requires everything to be stored in text files, this approach lacks important capabilities that infrastructure teams need to succeed with automation: for example, a structured data model that provides schema-driven data integrity, APIs, a query engine, and a user interface for those who are less familiar with coding.
  • In-house database / application: Building everything in house ensures that everything is in line with your environment, but it comes with the high cost of building and maintaining the software yourself. Building version control into these systems remains a challenge because most databases and frameworks do not support it natively. These home-grown systems are also often designed around well-known change processes, and they don’t cope well with ad-hoc changes that are harder to pre-define in software.
  • Purpose-Built Point Tools: It’s possible to get a relatively fast start by using pre-built point tools. However, that initial velocity fades. The lack of extensibility of these solutions creates an immense drag on the productivity of the automation system, because an increasing amount of effort must be poured into building and managing around these limitations. One prime example is that most available point tools don’t natively support version control.

Ultimately, the problem that arises with all these approaches is that it becomes nearly impossible to sustain and scale automation due to the complexity and fragility of the overall system. 

Infrahub evolves the Source of Truth and goes beyond it

With Infrahub, we’re proud to offer a major evolutionary step forward in infrastructure automation. It includes the next generation of Source of Truth, but it also goes far beyond being a SoT to address the above challenges in a new way. The following is an overview of the key components and capabilities of Infrahub.

A Unified Storage Engine with Versioning

At its core, Infrahub is built on a unified data store with version control built in, for data AND files.

We developed a new storage engine based on a multi-temporal graph that supports the main features of version control (branch, diff, and merge) on top of a database. This new engine provides the best of both worlds, including a flexible schema and a strong query engine. Our storage engine is also able to store all the files required to manage your infrastructure: templates, code, scripts, and playbooks.
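To make the branch, diff, and merge idea concrete, here is a toy sketch in Python. It is not Infrahub's engine or API: the `ToyVersionedStore` class, its methods, and the device names are all made up for illustration, and the sketch copies whole snapshots instead of using a temporal graph. It also ignores conflicts, so a merge simply overwrites the target.

```python
from copy import deepcopy


class ToyVersionedStore:
    """Toy key-value store with branches (hypothetical, not Infrahub's engine)."""

    def __init__(self):
        self.branches = {"main": {}}

    def branch(self, name, source="main"):
        # A new branch starts as a full snapshot of its source branch.
        self.branches[name] = deepcopy(self.branches[source])

    def set(self, branch, key, value):
        self.branches[branch][key] = value

    def diff(self, branch, target="main"):
        # Map each differing key to (value in target, value in branch).
        src, dst = self.branches[branch], self.branches[target]
        return {
            key: (dst.get(key), src.get(key))
            for key in sorted(src.keys() | dst.keys())
            if src.get(key) != dst.get(key)
        }

    def merge(self, branch, target="main"):
        # Apply every differing key from the branch onto the target.
        # A key absent from the branch is treated as deleted.
        for key, (_, new) in self.diff(branch, target).items():
            if new is None:
                self.branches[target].pop(key, None)
            else:
                self.branches[target][key] = new


store = ToyVersionedStore()
store.set("main", "device:atl1-edge1", {"role": "edge"})
store.branch("add-spine")
store.set("add-spine", "device:atl1-spine1", {"role": "spine"})

changes = store.diff("add-spine")  # what the branch would change in main
store.merge("add-spine")           # main now contains the new device
```

The point of the sketch is the workflow: changes are made in isolation, inspected as a diff, and only then applied to the main branch.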

This storage engine has been built from the ground up with infrastructure management in mind and it integrates multiple features that aren’t usually available in a database, such as hierarchical objects, profiles, and metadata.

One of the things that makes Infrahub so powerful is its schema abstraction. The schema is defined in user space rather than fixed deep in the core of the application. This gives Infrahub users complete flexibility to change and adapt the schema to their needs, without restarting or rebuilding anything. The schema itself is integrated with version control, and it’s possible to have a different schema per branch. This level of schema abstraction offers the perfect mix between flexibility and integrity.
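As a rough illustration of what a user-space schema that differs per branch can mean, the Python sketch below keeps the schema as plain data and validates objects against it at runtime. The schema layout, the `validate` helper, and the branch names are assumptions made for this example and do not reflect Infrahub's actual schema format.

```python
def validate(kind, obj, schema):
    """Return a list of problems found in obj for the given kind."""
    problems = []
    for attr, expected_type in schema[kind].items():
        if attr not in obj:
            problems.append(f"{kind}.{attr} is missing")
        elif not isinstance(obj[attr], expected_type):
            problems.append(f"{kind}.{attr} should be {expected_type.__name__}")
    return problems


# Hypothetical schemas held as plain data, one copy per branch.
schemas = {"main": {"Device": {"name": str, "role": str}}}
# The branch adds an attribute without touching main or restarting anything.
schemas["add-asn"] = {"Device": {**schemas["main"]["Device"], "asn": int}}

device = {"name": "atl1-edge1", "role": "edge"}
on_main = validate("Device", device, schemas["main"])
on_branch = validate("Device", device, schemas["add-asn"])
```

The same object is valid on `main` and incomplete on the branch, which is the kind of feedback a schema change can surface before it is merged.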

Abstracted, persistent, immutable, and testable artifact rendering

In order to decouple the Source of Truth from the rest of the automation stack, and make the overall system sustainable and manageable, Infrahub features a “transformation” engine to expose infrastructure data as rendered artifacts in any format. 

The most common use case is config generation. However, the framework is designed to support any type of data and any type of artifacts. 

With Infrahub you can expose any data in any format. Rendered data is stored persistently and immutably, is API-accessible, and provides a unit-testable scope to validate that changes to the schema or instantiated data will result in properly usable artifacts.
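The idea of a rendered, checksummed, and unit-testable artifact can be sketched in a few lines of Python. This is an illustration only: the template, the `render_artifact` function, and the interface data are hypothetical, and the standard library's `string.Template` stands in for whatever rendering engine a real transformation would use.

```python
import hashlib
from string import Template

# Hypothetical template for one interface stanza of a device config.
INTERFACE_TEMPLATE = Template("interface $name\n description $description\n")


def render_artifact(template, items):
    """Render each item with the template and return (content, checksum)."""
    content = "".join(template.substitute(item) for item in items)
    # The checksum identifies this exact rendering, so any change is detectable.
    checksum = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return content, checksum


interfaces = [
    {"name": "Ethernet1", "description": "uplink to spine1"},
    {"name": "Ethernet2", "description": "uplink to spine2"},
]

content, checksum = render_artifact(INTERFACE_TEMPLATE, interfaces)
```

Because the rendering is a pure function of template and data, a test can assert on the exact output, and a changed checksum signals that a schema or data change altered the artifact.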

By providing this abstraction layer, Infrahub makes the coupling with other components easier to change and validate, which makes the entire automation stack easier to maintain.

Native peer review & CI pipeline

Rather than require infrastructure teams to build a CI pipeline around our platform, we built one into our platform 🎉.

One of the core features of Infrahub is the Proposed Change, which allows multiple people to collaborate on a set of changes and validate those changes with built-in and user-provided CI validation logic. Our goal is to provide a user-friendly way for network and infrastructure teams to review and validate changes using a deep integration with the content of the database. The end result is a tight integration between the CI pipeline and the changes in a branch that optimizes the pipeline such that it will only execute the validation required by the change.
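One way to picture a pipeline that runs only the validation a change requires is a registry in which each check declares the kinds of objects it depends on. The Python sketch below is a simplified assumption, not Infrahub's implementation: the check names, the object kinds, and the `select_checks` function are invented for the example.

```python
# Hypothetical registry: each check declares the object kinds it cares about.
CHECKS = {
    "check_unique_ip": {"IpAddress"},
    "check_bgp_peering": {"BgpSession", "Device"},
    "check_rack_capacity": {"Rack", "Device"},
}


def select_checks(changed_kinds, checks=CHECKS):
    """Return the names of checks whose kinds overlap the changed kinds."""
    changed = set(changed_kinds)
    return sorted(name for name, kinds in checks.items() if kinds & changed)


# A branch that only touched Device objects skips the IP uniqueness check.
selected = select_checks({"Device"})
```

Here a change limited to `Device` objects selects `check_bgp_peering` and `check_rack_capacity`, and a branch with no changes selects nothing.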

Flexible ways to interface with the data

Last but not least, Infrahub provides multiple ways to interact with the data, which is important to make an automation platform as consumable as possible for the whole team. Out of the box, Infrahub offers a user interface with Role-Based Access Controls (RBAC), as well as a complete GraphQL-based API, both of which automatically adjust to the schema as provided by the user, and as it changes over time.

Integrating with automation ecosystems

We're excited to see how Infrahub will integrate into known automation ecosystems and we’ve already started multiple projects to help with these integrations:

  • The Python SDK for Infrahub provides an easy interface to interact with Infrahub from Python. It’s designed to also integrate deeply with the schema and to simplify the interaction with GraphQL.
  • Both the Ansible Collection for Infrahub and the Nornir plugin include a dynamic inventory to easily leverage the devices you have in Infrahub as your inventory. Both Ansible and Nornir provide you with easy access to rendered artifacts that you can leverage in your automation workflows.
  • The Infrahub-Sync library helps move data between systems and integrates with NetBox and Nautobot.

We’re just getting started

Opening up the beta program to everyone is a big milestone for us and for Infrahub, but we know there is still a lot of work ahead of us.

Infrahub has been deeply shaped by the community thus far, and we look forward to continuing to work with the community to shape the future of infrastructure management and to build the best possible infrastructure automation platform together. Join our Discord to connect with the Infrahub community. Our team is always available to chat and answer questions.
