What is Data Lineage?
AgileData's mission is to reduce the complexity of managing data.
In the modern data world there are many capability categories, each with its own specialised terms, technologies and three-letter acronyms.
We want managing data to be simply magical, so we share articles that explain these terms as simply as we know how.
In this article we describe what Data Lineage is.
What is data lineage?
Simply put, data lineage shows how your data has traveled from start to finish.
Data goes through a journey, and this is typically shown in the form of a data lineage map. The map lets you see where the data came from, every process the data has been put through, and where the data ended up.
We often see the use of maps to describe a journey in our everyday lives.
The idea of lineage has been around for centuries, used to describe a group of people who are related to each other as the direct descendants of a particular ancestor.
We combine both of these ideas to show where data came from and where it went.
Data lineage can be used to trace the history of data, providing valuable insights into how it was generated and where it came from. This information can be used to improve data quality and ensure that data is being used correctly.
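To make the idea of tracing data back to where it came from concrete, here is a minimal sketch that treats a lineage map as a simple graph. The dataset names and the `trace_upstream` function are hypothetical, made up for illustration; this is not how the AgileData product implements lineage.

```python
from collections import deque

# Hypothetical lineage map: each dataset lists the datasets it was built from.
LINEAGE = {
    "sales_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
    "orders_raw": [],
    "customers_raw": [],
}


def trace_upstream(dataset, lineage):
    """Return every dataset that feeds into `dataset`, nearest sources first."""
    seen = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


print(trace_upstream("sales_report", LINEAGE))
# ['orders_clean', 'customers_clean', 'orders_raw', 'customers_raw']
```

Walking the map backwards from a report gives the full list of sources it depends on, which is the "where did this come from" question lineage is meant to answer.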
Why is data lineage useful?
Data Lineage is useful for understanding how data is transformed, tracing errors back to their source and auditing data for compliance purposes. Data lineage can be used to improve the accuracy of data by identifying and correcting errors. It can also be used to improve the efficiency of data processing by reducing the need for manual intervention.
Increased trust in data
From a business point of view, before you make a strategic decision you want to be able to rely on the validity and accuracy of the information you have in front of you. Being able to see the provenance of data in the form of data lineage increases trust in the data by providing the complete context of the data collection and transformation process and making it visible to everybody.
Improving data governance
Data lineage can help improve data governance by providing visibility into how data is being used and shared. This information can be used to ensure that data is being used correctly and that appropriate controls are in place.
Understanding the impact of change
Data Lineage can also be useful for impact analysis: understanding how a change in one part of the system will affect other parts of the system.
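Impact analysis can be sketched as following the lineage in the forward direction: start at the thing that changes and collect everything built from it. The edge list and the `impacted_by` function below are hypothetical names chosen for illustration, under the assumption that lineage is recorded as source-to-target pairs.

```python
# Hypothetical lineage edges: (source, target) means target is built from source.
EDGES = [
    ("orders_raw", "orders_clean"),
    ("customers_raw", "customers_clean"),
    ("orders_clean", "sales_report"),
    ("customers_clean", "sales_report"),
    ("sales_report", "exec_dashboard"),
]


def impacted_by(changed, edges):
    """Return the set of datasets that sit downstream of `changed`."""
    # Index the edges so each source maps to its direct targets.
    downstream = {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)

    impacted = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for child in downstream.get(node, []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


print(sorted(impacted_by("orders_raw", EDGES)))
# ['exec_dashboard', 'orders_clean', 'sales_report']
```

In this sketch a change to `orders_raw` touches three downstream datasets, while `customers_clean` is unaffected because nothing it depends on has changed.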
Data lineage can be used to detect errors in data processing. By tracking how data moves through the data supply chain, it is possible to identify where errors occur and take corrective action. Visible data lineage means you can identify potential anomalies as you have the ability to visually track the flow of the data from beginning to end.
Understanding the big picture
Data lineage also benefits newcomers to the organisation who haven’t worked with this data before. Instead of looking through all the code and trying to visualise the flow themselves, they already have a visual representation of the data supply chain, in all its complexity, to view.
Who would use data lineage?
Data Lineage can be used by anyone who needs to understand the flow of data through the data supply chain. This includes analysts and business users. Anyone who needs to trust the data they are working with will benefit from having a clear understanding of where it came from, what was done to it and where it went.
Data Stewards are responsible for assisting with the governing of data within an organisation. They need to be able to track the origins of data and understand how it has been transformed over time in order to ensure its quality and accuracy.
Analysts often need to understand the origins of data in order to trust its accuracy. By understanding the lineage of data, they can trace back any errors or discrepancies to their source. Additionally, analysts may use data lineage to discover new relationships between data sets.
Data Architects design the structure and flow of data within an organisation. In order to do this effectively, they need to be able to understand the relationships between different data sets. Data Lineage can help them see these relationships and plan accordingly.
Data Developers & Engineers
Engineers often need to understand the lineage of data in order to properly transform it. For example, when developing ELT processes, Engineers need to know where the data is coming from and how it has been transformed in order to make sure they use it in the correct manner.
Podcast: Data Lineage, mapping your way to magic
Listen to an AgileData podcast where Shane and Nigel discuss Data Lineage
AgileData provides a magical data lineage capability within the AgileData product.
For us data lineage is like salt and pepper: it's table stakes, and it should just be there when you need it.
Keep making data simply magical
Simply Magical Data Lineage
See the flow of your data, when you need to.