AgileData Feature #03 – Catalog Browse and Search
TL;DR
What
Data Personas can use the Catalog to quickly search, find and access the Data Assets (Tiles) they need to complete a data task, such as supporting the development of Information Products or using the data for analysis. Topic Tags and Dynamic Filtering allow Data Assets to be easily found using the Organisation’s business terminology. With the Menu Anywhere capability they can drill out to any other screen in the AgileData App, retaining the context of the Tile, removing the need to go to that screen and manually search for the Data Asset again. Any time a data user performs a data task in the AgileData App, or an automated data task is executed on the AgileData Platform, the Catalog is automagically updated. The Catalog data is treated as a first class citizen, not as data exhaust. The Catalog is part of a wider ensemble of features that let them search or browse to locate essential Data Assets quickly, view comprehensive details for each set of data, and track the flow of data, much like parcel tracking.
Feature Requirement
The system must include a comprehensive data catalog that allows users to organise, search, and manage metadata for all data assets within the platform.
Requirement Rationale
A data catalog enhances data discoverability, improves governance by providing clear documentation of data assets, and supports efficient data management by enabling users to easily locate and understand the available datasets.
How
Why
This is the one feature I use the most. I use it every day when I am doing data work, because it is typically where I start that data work from.
It’s also one of the first features we built when we started developing the AgileData App.
We designed the Catalog based on the patterns for browsing and searching within Netflix. We treat a table or view of data as if it were a movie, and we present each one as a Tile.
We had three choices when we designed the Catalog feature:
- Use the typical Data Catalog pattern adopted by most data catalog tools: a row per Data Asset with lots of detail about that asset in each row, what I think of as a library pattern. For me this was always oriented towards hard-core data people.
- Use a Google search pattern: a blank screen with a search box, then a row per Data Asset with a subset of detail as the result of that search.
- Use the Netflix browse or search pattern, which is what we went with.
One of the problems with search-only patterns for Data Catalogs is that they assume you know what you are looking for.
We had a hypothesis that users would want to browse the Data Assets that were available.
I am a great fan of looking at consumer products and patterns and seeing if applying them to the data domain reduces complexity (or increases it). Netflix seemed to have the best pattern for browsing large volumes of movies to find what might interest you, so we implemented this as the primary UX in the Catalog to browse across large volumes of data sets.
This browse hypothesis was proven wrong, not because Information Consumers don’t want to browse, but because they just want to access the answer. They want to find and access the Information Product that quickly answers their Business Question and lets them get on with the Action they need to take once armed with that answer. They don’t want to browse for sets of data that might or might not answer their Business Question and then have to do a whole bunch of data work themselves to get that answer.
We added the Marketplace feature to solve this problem. The Marketplace feature reuses all the browse and search capabilities of the Catalog, another great example of our Define Once, Reuse Often (DORO) principle.
After using the Catalog for quite a while, we found immense value in the browse pattern ourselves. One of the interesting things about being a Fractional Data Team is that you switch Customer context on a regular basis. The ability to land in the Catalog for a Customer’s Tenancy and quickly browse the data Tiles that are available helps with reorienting oneself quickly.
If I know what I am looking for I often find browsing is still the quickest way to get to that data.
But not always.
So that’s why we added search.
We started off with the typical search capability: a text box at the top of the Catalog screen, type some text, hit enter and wait until the search results, in our case the Tiles, are returned.
It worked but it didn’t seem very magical.
One of the useful micro interaction patterns common in web-based applications these days is the idea of type-ahead. You start typing text and the machine starts to make suggestions for completing those words for you.
We applied an iteration of this pattern to provide Dynamic Filtering in the AgileData App. As you start typing in the search box we are automagically filtering the Tiles in the Catalog screen based on your search criteria. Small interaction pattern, but one that is magical.
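To make the interaction concrete, here is a minimal sketch of filtering on every keystroke. It is an illustration only, not the AgileData implementation: the `Tile` shape, the `dynamic_filter` function and the choice to match on name, Data Design role and Topic Tags are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """A Data Asset shown in the Catalog (hypothetical shape, for illustration)."""
    name: str
    design_role: str
    tags: set[str] = field(default_factory=set)


def dynamic_filter(tiles: list[Tile], typed: str) -> list[Tile]:
    """Return the Tiles that match the text typed so far (case-insensitive)."""
    needle = typed.strip().lower()
    if not needle:
        return list(tiles)  # an empty search box shows every Tile

    def matches(tile: Tile) -> bool:
        # Assumption: name, Data Design role and Topic Tags are all searchable.
        searchable = [tile.name, tile.design_role, *tile.tags]
        return any(needle in text.lower() for text in searchable)

    return [tile for tile in tiles if matches(tile)]


tiles = [
    Tile("customer", "Concept", {"Sales"}),
    Tile("customer_orders", "Event", {"Sales"}),
    Tile("shopify_orders", "History"),
]

# Simulate the user typing: the visible Tiles are recomputed on each keystroke,
# with no enter key and no separate results page.
for typed in ["c", "cu", "cus"]:
    print(repr(typed), "->", [t.name for t in dynamic_filter(tiles, typed)])
```

The point of the sketch is that the filter is cheap enough to rerun on every change to the search box, which is what makes the screen feel like it is responding as you type.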
One of the problems we have experienced with previous generations of Data Catalogs is that they treat data and metadata as exhaust.
By this I mean other tools create the data and the metadata and the Data Catalog then “harvests” this on a scheduled basis.
We use a completely different pattern at AgileData.
When you do the data work in the AgileData App, or it’s automagically done for you by the machine in the AgileData Platform, the data work is immediately available in the Catalog.
This is because we flip the standard pattern for a Data Platform.
Instead of the pattern being:
- create Code > create Metadata > populate Catalog
Our pattern is:
- create Metadata > store in Catalog > create/execute Code
We call this our Config Driven pattern, others call it Active Metadata.
When you use the AgileData App, or the machine does something automatically in the AgileData Platform, every step in the data tasks is stored in Config. The Catalog screens are then just surfacing this Config.
With this pattern there is no way the Catalog can get out of sync with the Data and/or Metadata.
We then use that Config time and time again to generate and execute the code we need, before deleting the code. We treat the Code as Cattle and the Config as Pets in DevOps parlance.
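The flipped pattern can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how the AgileData Platform is built: `config_store`, `save_change_rule`, `catalog_tiles`, `generate_sql` and `run_change_rule` are hypothetical names, and the generated SQL is just an example statement.

```python
# Hypothetical in-memory stand-in for the Config.
config_store: dict[str, dict] = {}


def save_change_rule(target: str, source: str, columns: list[str]) -> None:
    """Create the metadata first and store it. Nothing is generated yet."""
    config_store[target] = {"source": source, "columns": list(columns)}


def catalog_tiles() -> list[str]:
    """The Catalog simply surfaces the Config, so there is no harvesting step."""
    return sorted(config_store)


def generate_sql(target: str) -> str:
    """Generate the code from Config every time it is needed."""
    rule = config_store[target]
    column_list = ", ".join(rule["columns"])
    return f"CREATE OR REPLACE TABLE {target} AS SELECT {column_list} FROM {rule['source']}"


def run_change_rule(target: str, execute) -> None:
    """Generate, execute, then let the code go. Only the Config is kept."""
    sql = generate_sql(target)
    execute(sql)
    # The generated SQL is never stored: Code is Cattle, Config is the Pet.


save_change_rule("customer", "shopify_customers", ["customer_id", "name"])
print(catalog_tiles())  # the new Tile is visible before any code has run

executed = []  # stand-in for a database connection
run_change_rule("customer", executed.append)
print(executed[0])
```

Because the Catalog reads the same store the code is generated from, there is nothing to synchronise between the two.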
From day one we also allowed the creation and management of Topic Tags in the Catalog.
These Tags are freeform: you can create a Tag with any name for any use. They are purposely not stored in Config as a set of restricted or regulated “lookups”. The reason is that you often need to quickly iterate the use of those Topic Tags.
When you go into the Catalog you will see the data Tiles in the relevant row for their Data Design role, i.e. History, Concept, Detail, Event, Activity, Consume.
You will also see a row for every Topic Tag that has been applied.
Want to see a row with data Tiles that relate to “Sales”? Add a Topic Tag called “Sales” to one or more Tiles and that row will automagically appear in the Catalog.
The Tags are also “active”: click on a Tag that is shown on the Tile and the Catalog screen will Dynamically Filter to only show Tiles with that Tag.
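Here is a rough sketch of how rows could be derived from Data Design roles and freeform Topic Tags. The dictionary shape of a Tile, the function names, and the choices to hide empty role rows and sort Tag rows alphabetically are assumptions made for the example.

```python
# Data Design roles, in the order they are listed above.
DESIGN_ROLES = ["History", "Concept", "Detail", "Event", "Activity", "Consume"]


def catalog_rows(tiles: list[dict]) -> list[tuple[str, list[str]]]:
    """Build (row label, Tile names) pairs: one row per role, then one per Tag."""
    rows = []
    for role in DESIGN_ROLES:
        names = [t["name"] for t in tiles if t["role"] == role]
        if names:  # assumption: a role with no Tiles gets no row
            rows.append((role, names))
    # Tags are freeform, so the Tag rows are discovered from the Tiles themselves
    # rather than read from a fixed lookup list.
    for tag in sorted({tag for t in tiles for tag in t["tags"]}):
        rows.append((tag, [t["name"] for t in tiles if tag in t["tags"]]))
    return rows


def filter_by_tag(tiles: list[dict], tag: str) -> list[dict]:
    """Clicking an 'active' Tag keeps only the Tiles that carry it."""
    return [t for t in tiles if tag in t["tags"]]


tiles = [
    {"name": "shopify_orders", "role": "History", "tags": set()},
    {"name": "customer", "role": "Concept", "tags": {"Sales"}},
    {"name": "customer_orders", "role": "Event", "tags": {"Sales", "Finance"}},
]

for label, names in catalog_rows(tiles):
    print(label, names)

print([t["name"] for t in filter_by_tag(tiles, "Sales")])
```

Adding a brand new Tag to any Tile is enough for a new row to show up the next time the rows are built, which is why no Tag needs to be registered up front.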
This way of navigating to quickly find Tiles has saved many hours over the years.
The other part of the Catalog feature I use every day is the Menu Anywhere.
On each Tile in the Catalog there is the Menu Anywhere option, 3 dots (…). Click on this and it will provide a list of things you can do that relate to this Tile, for example show the Change Rule that populates it, or the Trust Rules that validate it.
This means you can punch out to the Change Rule Step screen in one click, without needing to go to the Rules List screen, search for that Tile and then click to open the rule.
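One way to picture Menu Anywhere is as a set of links that carry the Tile with them. The sketch below is purely illustrative: the screen paths, the `tile` query parameter and the `menu_anywhere` function are made up for the example and are not the AgileData App's actual routes.

```python
from urllib.parse import urlencode

# Hypothetical screen paths, used only for this illustration.
MENU_ACTIONS = {
    "Change Rule": "/rules/change",
    "Trust Rules": "/rules/trust",
}


def menu_anywhere(tile_name: str) -> dict[str, str]:
    """Return one link per action, each already scoped to the given Tile."""
    query = urlencode({"tile": tile_name})
    return {label: f"{path}?{query}" for label, path in MENU_ACTIONS.items()}


for label, link in menu_anywhere("customer_orders").items():
    print(label, "->", link)
```

The target screen receives the Tile in the link, so it can open straight onto the right rule instead of asking you to search for the Tile again.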
Interaction patterns like Menu Anywhere might save a few clicks and just a minute or two, but those clicks and minutes soon add up when you do them day in and day out.
We also present the Catalog screen from different parts of the top menu that are again context aware. If you click on the History Tiles menu under Collect, you will be taken to the Catalog screen already Dynamically Filtered by Data Design type of History.
Again it’s only a few clicks to open the Catalog screen, type History in the search box, and then the Catalog is dynamically filtered. But saving those clicks, by just clicking on a single menu option, just makes sense. Especially when you do that interaction multiple times every day.
The Catalog feature started its life as a feature built for Information Consumers, because that was what everybody else did: they bought and implemented a Data Catalog for their end users (and then wondered why they never came).
Over years of iteration we have made it a tool for ourselves to help make the data work we do less complex.
Remember, we are building a product that is focused on:
- taking a data task that takes me an hour and making it 10 minutes.
- taking a data task that I have to think about how I can do it, and getting the machine to recommend what I should do, reducing the need for cognition.
- automating the data task so we don’t have to do it, getting the machine to do the work for us.
We have added a small bit of personalisation into the AgileData App as well.
The Catalog screen is where every Data Persona starts when they log in to an AgileData Tenancy.
A Consumer Persona automagically starts with the Marketplace screen.
Keep making data simply magical
AgileData is focussed on removing the complexity of managing data in a simply magical way.
This is just one of the many features we have built to help reduce or remove that complexity from the data work we do.
Do more with less
We remove the need to build a large dedicated team of expensive data experts, by reducing the effort to do the data work and by doing the data work for you.