What is a Data Catalog?
AgileData.io's mission is to reduce the complexity of managing data.
In the modern data world there are many capability categories, each with its own specialised terms, technologies and three-letter acronyms.
We want managing data to be simply magical, so we share articles that explain these terms as simply as we know how.
In this article we describe what a Data Catalog is.
What is a data catalog?
Data Catalogs are an organised inventory of the data assets or information products within your organisation. They provide information about what data you store, where that data came from, what it means, what it can be used for, what it is being used for, which information products it has been part of, and who has accessed it.
In essence, a data catalog makes your data easier to find, easier to understand and easier to use.
The most common analogy is to think of a data catalog as a local library, but on a worldwide scale: through one catalog you have access to the equivalent of the whole world's libraries.
What are the common features of a Data Catalog and what problems does it solve?
Browse or Search
One common feature of a Data Catalog is the ability to browse or search for data. Going back to the library analogy, just as you can search for a specific book or range of books in a library, you can search for data in a data catalog. It allows you, the user, to search and filter on attributes such as name, description, source or date modified, so you can find the relevant data as quickly as possible. This saves you the time and effort of either having to request the data or having to discover it yourself within your data lake, data warehouse, reporting platform and so on.
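To make the search-and-filter idea concrete, here is a minimal Python sketch of an in-memory catalog. The `CatalogEntry` fields, the `search` function and the example datasets are all hypothetical illustrations, not the AgileData API; a real catalog would typically index this metadata in a search engine rather than loop over a list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CatalogEntry:
    # Hypothetical metadata fields mirroring the filters described above.
    name: str
    description: str
    source: str
    date_modified: date


def search(entries, text="", source=None, modified_since=None):
    """Return entries whose name or description contains `text`,
    optionally filtered by source system and last-modified date."""
    text = text.lower()
    results = []
    for entry in entries:
        if text and text not in entry.name.lower() and text not in entry.description.lower():
            continue
        if source is not None and entry.source != source:
            continue
        if modified_since is not None and entry.date_modified < modified_since:
            continue
        results.append(entry)
    return results


# Hypothetical example catalog.
catalog = [
    CatalogEntry("customers", "One row per customer", "CRM", date(2023, 5, 1)),
    CatalogEntry("orders", "Customer orders and order lines", "Shop", date(2023, 6, 12)),
    CatalogEntry("web_sessions", "Website visits", "Analytics", date(2022, 11, 3)),
]

# Prints "customers" then "orders".
for hit in search(catalog, text="customer", modified_since=date(2023, 1, 1)):
    print(hit.name)
```

Each filter is optional, so the same function covers a free-text search, a browse by source system, or a combination of both.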
From a data worker's perspective, one area a data catalog can help with is troubleshooting: it allows them to locate any dirty data and ensure it gets rectified. If you are a data analyst or data consumer, a data catalog can instil a sense of trust in the data you are using, as you are able to understand its source, its context and what has been done to it. You can see where it came from, where it went and what happened in between. You are also able to provide feedback when you think there is an issue with the data.
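The "where it came from, where it went" idea is usually called data lineage. As a rough sketch, assuming lineage is stored as a simple mapping from each dataset to the datasets it was built from (the dataset names and the `upstream`/`downstream` helpers below are hypothetical), both questions become graph walks:

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was built from.
lineage = {
    "sales_dashboard": ["monthly_sales"],
    "monthly_sales": ["orders_clean", "customers_clean"],
    "orders_clean": ["raw_orders"],
    "customers_clean": ["raw_customers"],
    "raw_orders": [],
    "raw_customers": [],
}


def upstream(dataset, graph):
    """Breadth-first walk returning every dataset that feeds `dataset`."""
    seen = []
    queue = deque(graph.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


def downstream(dataset, graph):
    """Every dataset that directly or indirectly uses `dataset`."""
    return [d for d in graph if dataset in upstream(d, graph)]


print(upstream("sales_dashboard", lineage))  # where did the dashboard's data come from?
print(downstream("raw_orders", lineage))     # what is affected if raw_orders is dirty?
```

Walking upstream supports trust ("where did this number come from?"), while walking downstream supports troubleshooting ("which outputs are affected by this dirty source?"). The `downstream` helper recomputes `upstream` for every dataset, which is fine for a sketch but would be replaced by a reverse index in a larger catalog.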
Data classification and access
The next thing to consider is compliance and security. Being able to view and understand data within the data catalog helps you stay within data governance rules and makes your data easily auditable. With a data catalog you should be able to easily find any data within your organisation. Often it is possible to control who can access what within the catalog, adding a layer of security. Other features, such as retention dates, can be applied to sensitive data.
A good data catalog will include a business glossary: an exhaustive list of the terms used across the company, from top to bottom, which provides clarity between departments.
A Data Catalog can automate some of the manual tasks involved in collecting, classifying and cataloging data. It can harvest technical information about data from the systems the data naturally resides in. It can classify data based on the type of data it holds, for example automagically identifying organisation names, people's names or credit card numbers. It can also recommend data you might be interested in, based on the data used by people in your organisation who are in a similar role to you.
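As an illustration of automatic classification, here is a minimal rule-based sketch that tags a column as containing credit card numbers. It uses a regular expression for the shape of the value plus the Luhn checksum that payment card numbers carry. This is one simple technique chosen for illustration, not necessarily how AgileData or any particular catalog does it; identifying organisation or people's names would need different methods (such as named-entity recognition) that are not shown here. The function names and the 80% threshold are hypothetical.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"^\d(?:[ -]?\d){12,18}$")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def looks_like_card_number(value: str) -> bool:
    return bool(CARD_PATTERN.match(value.strip())) and luhn_valid(value)


def classify_column(values, threshold=0.8):
    """Tag a column as 'credit_card_number' if most non-empty values pass the check.
    The threshold is a hypothetical tuning parameter."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return None
    hits = sum(looks_like_card_number(v) for v in non_empty)
    return "credit_card_number" if hits / len(non_empty) >= threshold else None


# Well-known test card numbers, plus values that should not match.
print(classify_column(["4111 1111 1111 1111", "5500-0000-0000-0004", ""]))
print(classify_column(["Acme Ltd", "1234 5678 9012 3456"]))
```

Combining a pattern match with a checksum reduces false positives: a 16-digit order reference will usually fail the Luhn check, whereas genuine card numbers pass it.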
Keep making data simply magical
AgileData.io provides a magical data catalog capability within the AgileData product.
It's like a bottle of fine wine: it gets better every day.
Simply Magical Data Catalog
Find the data you need, when you need it.