From ponds to lake: Using generic automated data integration to create a single source of truth for company data

10:40—11:00

Alan Turing Stage

Business Processes

When working with data in an enterprise context, we are often confronted with heterogeneity: people from multiple countries, with different cultural and educational backgrounds, use data from a variety of sources with the help of many different tools and processes. The same data can be described in several ways, depending not least on the purpose it is used for.

This leads to a number of challenges: How can we break up data silos within the company? How can we make existing data treasures available to the people who would love to work with them? With each business unit having its own naming standards and asset hierarchies, how can we integrate their data into one consolidated source of truth?

On our joint journey to build a central data platform for the whole company, our mixed team with members from Wintershall Dea and inovex has encountered all of these challenges. In our presentation, we show how we built bridges between domains and thus made data from heterogeneous sources accessible and usable across the whole enterprise. Our automated process does not require any business knowledge from data engineers, allowing them to focus on data ingestion: they build pipelines and take care of all technical aspects. Meanwhile, the generic integration layer that we created enables data managers themselves to incorporate their specific naming standards and asset hierarchies into a global hierarchy. Thus consolidated, the data in the data lake can then be used in a variety of contexts.

We take you along on our data lake’s journey from a simple data collection to the single source of truth, highlighting the challenges we faced and the solutions we implemented, and taking a peek at what may lie ahead.
