A single source of truth - Building a successful enterprise data catalogue

With more organisations seeking to create centralised ‘supermarket’-style catalogues of curated datasets for business users, Jason Michaelides explores how to navigate the three biggest issues you’ll face as you go on this journey.

As organisations grow, it’s common for them to deploy an increasing number of specialist applications to provide best-in-class capabilities in different areas. Of course, none of these systems operates in isolation. Information from their respective data stores is needed by other parts of the business, to support a variety of processes and goals, such as creating a single view of the customer, product or region, or to enable new customer-facing services.

To meet each of these needs, organisations typically implement point-to-point integrations between the source data system and the consuming system, designed for the specific use case. While this approach is manageable on a limited scale, as more of these inter-system links are added, it leads to a complex web of integrations that’s increasingly fragile and challenging to maintain. 

Data quality issues also increase, such as inconsistencies between data that’s meant to be the same, leading to multiple ‘versions of the truth’. Another big challenge is around the oversight, security and governance of the data, particularly in highly regulated environments.

A better approach: The enterprise data catalogue

Most organisations reach a point where they need to streamline this complex setup. One solution is to build a centralised data catalogue. This forms a middle layer between the various source systems and the applications and users consuming the data. It becomes the single source of truth for key enterprise information, where any authorised user can find and access the data they’re permitted to use.
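
To make the difference concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a worst case in which every consuming system needs data from every source system; real estates are rarely that dense, and the function names are purely illustrative.

```python
def point_to_point_links(sources: int, consumers: int) -> int:
    """Worst case: every consumer integrates directly with every source."""
    return sources * consumers


def catalogue_links(sources: int, consumers: int) -> int:
    """Each system maintains a single integration with the central catalogue."""
    return sources + consumers


# Illustrative figures only: 10 source systems feeding 20 consumers.
print(point_to_point_links(10, 20))  # 200 integrations to maintain
print(catalogue_links(10, 20))       # 30 integrations to maintain
```

Under that assumption the number of integrations grows with the product of sources and consumers in the point-to-point model, but only with their sum once a catalogue sits in the middle.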

But creating an enterprise data catalogue in a large organisation is difficult, with technical, regulatory and cultural challenges to overcome. Below, we look at how to address the three main obstacles to success.

1 Gaining access to the source data

Traditionally, the owner of each upstream source system is the guardian of that data, responsible for ensuring it’s handled in line with company policy and regulatory requirements. Moving to a data catalogue model means handing over some of this control, which many stakeholders will understandably be cautious or even fearful of, particularly in highly regulated businesses.

The team creating the data catalogue will need to overcome this pushback, by demonstrating understanding and competence around modern data governance (more on this shortly), and by showing the source system owners how the new approach will benefit them day-to-day.

For example, most source system owners will currently liaise with multiple stakeholders, manage numerous integrations and deal with frequent new data access requests. The data catalogue removes most of this overhead: the source system owner only needs to engage with the catalogue team, on an ongoing and collaborative basis, and maintain a single integration with this centralised system. This frees up source system owners to spend more time adding business value around their application’s core purpose.

2 Curating and governing the catalogue, while enabling straightforward access and encouraging curiosity

For the data catalogue to be a useful business resource, the team in charge needs to curate the raw data into a suite of datasets that are useful, discoverable, trusted and usable. This will include thoughtfully bringing together data from multiple systems, and presenting it in a way that encourages curiosity, by enabling people from around the business to explore data and use it in their applications or workflows.

Getting this curation right is incredibly important if you’re to successfully move away from individual people or teams doing this themselves in spreadsheets. Promoting a culture of collaboration amongst data-domain experts and business SMEs will ensure the most effective curation of the catalogue, the smoothest processes for access and use, and – most importantly – help unlock the greatest value from the available data.

As important as the curation, of course, are governance and data security. The catalogue owners will need to take on significant responsibility for all data in the catalogue. They’ll need to understand what data is available in each source system, the applicable policy and regulatory frameworks, and what uses these permit. Based on this knowledge, the catalogue team must then implement and enforce the necessary restrictions, by classifying all data and configuring user permissions appropriately. 
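
As a minimal sketch of what classifying data and configuring permissions could look like, the Python below assigns each dataset a sensitivity level and each user a clearance level, and only lists datasets the user is cleared to see. The four levels, the dataset names and the simple "clearance must be at least the classification" rule are all assumptions made for illustration; real policies usually also depend on purpose, region and regulatory regime.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels, lowest to highest."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: Classification


@dataclass(frozen=True)
class User:
    name: str
    clearance: Classification


def can_access(user: User, dataset: Dataset) -> bool:
    """Allow access only if the user's clearance covers the dataset's level."""
    return user.clearance >= dataset.classification


def visible_datasets(user: User, catalogue: list) -> list:
    """Return the names of the datasets this user is permitted to discover."""
    return [d.name for d in catalogue if can_access(user, d)]


# Illustrative catalogue contents; these names are invented for the example.
catalogue = [
    Dataset("store_locations", Classification.PUBLIC),
    Dataset("sales_by_region", Classification.INTERNAL),
    Dataset("customer_contact_details", Classification.RESTRICTED),
]
analyst = User("analyst", Classification.INTERNAL)

print(visible_datasets(analyst, catalogue))
```

Keeping the classification on the dataset and the clearance on the user means a policy change is made in one place, in the catalogue, instead of in every downstream integration.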

3 Change management: Keeping pace with evolving business requirements

People’s data needs will evolve rapidly. There will be requests for new curated datasets, or variations of existing ones. New data will become available, which will need to be integrated into the data catalogue. Your catalogue team needs to be able to deliver on these emerging requirements quickly, if you’re to avoid a proliferation of spreadsheets. This demands an operating model that facilitates agile change. 

Firstly, you’ll need business analysis capability to assess the priority of each new request. Secondly, you’ll need close collaboration between the different source system owners, to make sure decisions are taken in the collective interest of the business, and to ensure there’s agreement around which system provides the ‘golden’ source of truth for data stored in multiple systems.

Thirdly, the data catalogue team will need tools that enable responsive change. This will most likely include a mix of commercial, open-source and bespoke products. Low- and no-code solutions are often a good choice, given that they typically facilitate quicker and lower-cost changes. Fourthly, these must be supported by a fast, continuous delivery methodology. And finally, you need the skills to deliver the change requests as they come forward, either in-house or through a network of trusted partners.
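
One lightweight way to record the agreement on ‘golden’ sources mentioned above is a simple registry that maps each shared attribute to the system agreed to be authoritative for it. The Python below is only a sketch: the attribute names, system names and the `resolve` helper are hypothetical.

```python
# Hypothetical registry: for each attribute held in several systems,
# the source system the owners have agreed is the 'golden' source.
GOLDEN_SOURCE = {
    "customer_email": "crm",
    "billing_address": "billing",
}


def resolve(attribute: str, values_by_system: dict) -> str:
    """Pick the value from the agreed golden source for this attribute."""
    if attribute not in GOLDEN_SOURCE:
        raise KeyError(f"No golden source agreed for {attribute!r}")
    return values_by_system[GOLDEN_SOURCE[attribute]]


# Two systems disagree; the registry decides which value the catalogue publishes.
email = resolve(
    "customer_email",
    {"crm": "new@example.com", "billing": "old@example.com"},
)
print(email)  # new@example.com
```

Raising an error for an attribute with no agreed source is a deliberate choice in this sketch: it surfaces the missing decision instead of silently picking a system.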

A rewarding journey

Building an enterprise data catalogue is a challenging but extremely rewarding undertaking. We hope this overview gives you a sense of the main areas to consider and the skills you’ll need as you develop and maintain yours. Get this right, and you’ll create an asset that delivers long-term benefits to people right across your business, from the source system owners and compliance officers to those responsible for your internal and customer-facing products and services.
