Data Products

Aug 9, 2022 | AgileData Podcast, Podcast

Join Shane and guest Eric Broda as they discuss all things Data Product.

Podcast Transcript

Read along you will

PODCAST INTRO: Welcome to the “AgileData” podcast where we talk about the merging of Agile and data ways of working in a simply magical way.

Shane Gibson: Welcome to the AgileData podcast. I’m Shane Gibson.

Eric Broda: And my name is Eric Broda. How are you doing?

Shane Gibson: I’m good. Thank you for coming on the show. Today we’ve got one that I’m pretty excited about: we’re going to talk about this thing called “Data Products”. But before we rip into that, why don’t you tell the audience a little bit about yourself and your background?

Eric Broda: Sure. First off, thank you very much for having me. So again, my name is Eric Broda. I run a small boutique consulting firm, and we have deep skills and experience in data mesh. And as you can imagine, data mesh at its highest level is just an ecosystem of data products, the things that we actually want to talk about today. I’ve written about my experiences; there are probably about 13 articles, give or take, around data mesh and data products. I’ve done a variety of podcasts and conferences, and I’m actually one of the privileged few who have implemented an enterprise data mesh, at a global financial services firm. So I have a fair amount of practical, hands-on experience that I’d like to share today.

Shane Gibson: Excellent. And data mesh is definitely the hot buzzword of the 2020s. I tend to talk to people and say I have a black hat and a white hat on it. I think the fundamental principles of data mesh are ones we’ve been striving for in data for 30-odd years, if not longer, and they’re just hot. So that’s good. And then I think there’s a bunch of principles and very few patterns out there for implementing it. So there’s lots of vendor washing, lots of mesh washing happening. But one of the areas that has been around for a while is this concept of a data product. And it’d be fair to say that the majority of the people I talk to struggle with defining what a data product is, let alone how to implement one. So to start us off, how would you describe a data product to somebody who didn’t know what it was?

Eric Broda: Sure. A data product is kind of a combination of two topics, as you can imagine from the actual term. So first off, it’s a merging together of this notion of data domains and product thinking. Data domains are pretty self-explanatory: a domain is a related set of data. And the product thinking, if I contrast it with project thinking: projects are short term, they start, they stop. But products have a long-term time horizon. And I think a lot of folks actually understand products almost as well as projects, in the sense that most of the clients that I work with sell products, so they know all about a product. So if you’re Nike, you have shoes; if you’re a bank, you have checking products. So product thinking, believe it or not, is actually embedded into the DNA of almost every company that I work with, and I think it’s probably fair to say most companies out there. So it’s actually bringing those things together, data domains and products, and being able to provide boundaries around data, have clear owners like products, and make them discoverable so that folks can actually do something with them. So that’s really the core concept. It is packaging data by putting a boundary on it and an owner to it, and making it discoverable so that people can actually consume data, just like a product. So that’s kind of the simplest explanation that I would suggest is out there. But I’m sure it’s going to create a few questions.

Shane Gibson: Definitely. And so I agree on the idea of a boundary. I want to deep dive into the difference between data products and data as a product in a minute, because I think that’s actually an interesting distinction we should make. But I’ve worked with data and analytics teams, helping them adopt an Agile way of working, over the last few years. And one of the things that we were iterating on with each of those teams was this idea of an information product. It was a while ago that we came up with that term, and we’ve iterated on the templates, but it was all based around boundaries. At the time it was a boundary of some data; we didn’t tend to use the word domain, but I would now. It was a boundary of some code. And it was a boundary of some output that somebody used. So it could be a dashboard, could be a report, could have been an API, could have been a data service, could have been a file. So we were being very clear that an information product wasn’t always a dashboard. Sometimes it was a different delivery mechanism, depending on the persona. And the reason we created that boundary was we wanted to break the work down from being a year into a small iteration or a series of iterations. And so we had to create some boundaries to time box that work. And so that started having success, as we found this term information product was one people could understand. And then we found 101 other uses where that definition of a boundary was useful, things like prioritization. And so we were effectively creating a roadmap of information products: what’s the next most valuable product? But our goal was to stop six months of data design and get something into the customer’s hands, internal stakeholder or external customer, that had value and that we got feedback on early. We didn’t go into the whole product thinking consciously. 
But when we started prioritizing and road mapping the information products over a time horizon of one to five years, that was a form of product thinking. We didn’t start exploring the idea of data as a product, and we didn’t really explore the idea of observability, those kinds of features that make products easy to find and easy to buy. And so we know we should be adopting that. So with that, I always kind of say that, at the moment, for me, a data product is a subset of an information product, where the persona that’s going to consume it is a data-literate person, an analyst or another engineer. Because in most of the descriptions I see, the definition of a data product is: we are going to get a domain of data, and we’re going to create a thing that somebody else is going to access via code and use. And I differentiate data as a product as a way of working: that product thinking, that discoverability, treating it as a thing that’s going to survive for a long time, or that we’re going to kill. It’s not a one-off project. It’s not just a piece of code that goes to die. So what do you see as the difference between data products and data as a product? Or do you not see a difference?

Eric Broda: At one level, I would say it might be semantics. But I think the whole idea around data products enforces the product thinking. Now let me give you an analogy around how I think of data products. And the analogy is going to be something that everybody kind of knows and loves in many respects. It’s Airbnb. So let me tell you why I think Airbnb is a perfect analogue for a data product. So first off, Airbnb brings together producers and consumers: the producers are property owners and the consumers are renters. So the first thing in a product is you have to have somebody who creates the thing and somebody who consumes it. And presumably you have some kind of financial arrangement where it’s beneficial for all concerned. Now, Airbnb has an absolutely clear boundary. It’s the legal structure and the service offerings that delineate its boundary. The interesting thing about it is that it also has accountable owners: the shareholders who own Airbnb, and the legal and regulatory frameworks that determine how it runs. That’s the accountability of the owner. So Airbnb has clear boundaries. It has an accountable owner. It has an empowered team: those shareholders have empowered the CEO and the full complement of staff to actually run Airbnb. Now, Airbnb is also a platform, and here’s the distinction that I make. It’s the data product, which is the clear boundary and the packaging of it, plus the ability to surface it into a self-serve type capability. So how does Airbnb, to continue the analogy, do that? Well, first off, Airbnb is known for ease of use, not just for consumers to find and rent properties; for producers there’s an equally sophisticated user interface and capability, and a whole set of aftermarket capability that makes the product easy to use. In data’s case, that will be easy to find and easy to consume. The next thing is that a data product, or a product in general, has clear contracts and expectations. 
Airbnb provides absolutely clear guarantees of service, payment and safety, backed in fact by public contracts and insurance policies. Now, the interesting thing about this, and this is where I think the value of the data product actually comes into play, is that, like Airbnb, bringing producers and consumers together creates that virtuous circle of market growth. More property listings bring more renters, and more renters bring more property listings, so Airbnb is a fantastic analog for data products. Now, to extend this into a data mesh, Airbnb actually is an ecosystem of Airbnb sites and locations geographically, each geography bound by its own set of legal restrictions, regulatory constraints, privacy, etc. So Airbnb is a product operating in an ecosystem of products. So now coming back to Airbnb as a data product, if you will, or the analog for a data product: it has all of the characteristics that I just mentioned. The key here, when we move into the enterprise, is that everything I said about Airbnb, you could say, are the characteristics of a data product in an enterprise. So now, again, on the semantics of data as a product or data product: I think data as a product downplays the intertwining between the data and the whole notion of product thinking. So again, mapping Airbnb to a data product inside an enterprise: a data product has that clear boundary. It has accountable owners, that’s the data product owner, and they are accountable and responsible for delivering the service, the clear contracts and the expectations that are out there. They have an empowered team, and they create a platform that, hopefully, makes data easy to find, share, consume and govern. And they have producers and consumers. Typically, when you think of a data product in most enterprises, we think about the pipeline that feeds the database; that’s the producers. 
But data products bring together producers, the folks who create the data, whether they’re source systems or otherwise, that move the data into the data product, and they also address the consumers. It’s like any product: when you think about whether it’s a banking product or Nike shoes, you’re out there to address the consumers interested in the product. Data as a product, I think, suggests and emphasizes much more the ingestion of data into the domain. Whereas data products, in my terminology, are about the ability to bring producers and consumers together to safely interact and, in some cases, transact. So again, at one level it’s a semantic difference, and at another level data products provide the direct analog to things that we see out there today that can be directly applicable in the enterprise. So again, I come back to analogies that make it really easy to understand: a data product is Airbnb, the principles of Airbnb, inside the enterprise.

Shane Gibson: So I think for me semantics count. Because when we mix our taxonomies, when we use the same word for different things, when people have a different perspective, then we get confusion. And we get that with data domains: how do we create a boundary around the data domain? So I’ll come back to that, but I take your point. I mean, your definition of data as a product was nowhere near my definition. It was like, that’s not what I meant when I said those words. So for me, data as a product is a way of working. So I will change my terminology. I’ll talk about either data product thinking or a data product way of working, because it is those processes: it’s the idea of a data contract, it’s the idea of enabling the producers as much as the consumers. It’s all those things we do when we create this data product. And what I see in the market is lots of confusion, because data product sounds like a thing. It sounds like a product on the shelf; it sounds like your bottle of Coca-Cola or Pepsi. Whereas a lot of the thinking we have is around how we produce that in a certain way and how we make it available for consumption. And I agree with you, I think Airbnb is a great analogy for the shift and where we’re going, because that’s why we’re seeing such a big move towards data marketplaces: a place where you can present your products. You produce your products and you put them up for sale, internally or externally, and it doesn’t have to be for money. It can be to a bunch of internal stakeholders, and then people consume them. It’s effectively the Amazon for data. And we’re going to see that a lot. But let’s go back to that idea of producers. So what I see is a large amount of conversation around the consumers of the data products. How do we make it so they can consume that data? How do we make it so they know what it is? 
And one of the core principles of data mesh for me is the idea of decentralizing the data work back into the software engineering team, back into the teams that are producing the data. Because one of the major problems we’ve had in the data world for many, many years is that people create data as a result of their application, and it becomes exhaust. And the data teams are kind of capturing that exhaust and trying to make it useful. And so that massive disconnect is where a lot of the effort and the problems actually happen. So if we push all that work back into the producers, where they can actually produce data that’s fit for purpose, data that I can buy, data that I can use as if it was a product, then we solve a lot of those problems. But I see very little conversation about how we do that. Again, it’s all focused on the data catalog, or consumption, or quality of the data, all those consumption-based things, not on the producers. Are you seeing that, or are you seeing some good work happening in enabling people who don’t normally produce it to produce it in a better way?

Eric Broda: I’m actually going to suggest a slightly different terminology. So the notion of the data product has that boundary. And the data product team, the owners and the data product platform manage that data. So here’s my slight disagreement with what you’re saying: it is absolutely not only about the producers and it is absolutely not only about the consumers. And in fact, I actually see much more emphasis on the producers. In fact, in the terminology used, the data product focuses on the producers. And I think they’re actually separate entities, and the data product brings consumers and producers together. So here’s what I mean. To your point, who actually does the work? Well, typically, there’s a source system. Let me just paint this scenario: I have multiple sources going into a data product, and I have multiple consumers. The key thing here is the producing systems, whether they’re source systems or other data products, are separate and unique, and have their own owner and their own boundary. And if it’s done well, whether it’s through pipelines, or through APIs or queries, there’s a way to actually get that data, push it into a pipeline and ingest it. The responsibility for ingesting, or consuming that other source system’s or other data product’s capability, actually belongs to the data product team. So they own, end to end, getting the data in. Now, obviously, they collaborate with a whole bunch of other folks, the source system team or the other data products. But the responsibility of the data product team is to get the data, transform it as necessary, store it as necessary, and make the interfaces available to expose that data. So that includes discovery, conceptually a /discover endpoint. It includes a /consume endpoint, or more likely in an analytics world, the federated queries that are available to actually access that data. 
So the only distinction that I make is that the data product owner and their team are self-sufficient. They interact with other source systems and other data products to do their work. And they are accountable for making the consumption patterns and consumption services available to a consuming capability. So what I’m seeing out there is a very significant delineation in terms of responsibility. Most organizations do emphasize the producing pipeline and the data engineering team, and far too often they build that as a shared service: they build all the pipelines that connect all the various data products. And what I find is that the moment you go into the shared service capability, the centralized capability, you introduce unnecessary bureaucracy, you slow things down, and your agility diminishes as it gets more complex. What I’m saying with a data product is that all that responsibility is the responsibility of the data product owner and the team. In other words, it’d be silly if they didn’t use vetted enterprise capability, but they don’t have to. It’s the prerogative of the data product team. They may say, I don’t need data pipelines; I’m going to go real time and listen to a topic, and I’m going to trickle-feed those things in real time into my data product. In other words, the data product owner has the wherewithal, the decision-making rights and the capability to ingest the data any way they want. They don’t have to rely on a shared services team, though they may if it makes sense, but they own the capability and they determine how it actually works. There’s the same corollary on the consumption side. Most times, actually, folks will say, I just want to access via SQL, give me my SQL query and leave me alone. 
And, I’m going to say, five years ago that probably was the only way that people went. Today, what I see is this notion of the real-time enterprise. Consumers are saying, queries are interesting and useful, but that’s not the way I want to do it, because I need real-time data. I’m not going to issue this query, especially if it’s relatively complex: if I’m going to do a 100 million by 100 million row join, I’m going to want to do that very, very carefully. What they’re saying is, I don’t want to ping that thing and execute that query. I want to just listen for when the data in the data product changes. So the paradigm that I’m talking about is that the accountability, the ownership and the responsibility sit in the data product team, not in a shared services pipeline team or a shared services engineering team. Although again, it would be foolish for them not to use capabilities that make sense and to standardize, it’s the prerogative of the data product owner and team to set up their data product any way they think is equitable, financially sound and makes sense.
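
The split Eric describes, between pulling data with a query and listening for changes, can be illustrated with a minimal Python sketch. It is not from the episode: the `DataProduct` class, its method names and the in-memory storage are assumptions made purely for illustration, standing in for the conceptual discover and consume endpoints and for a real event topic.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]
Listener = Callable[[Row], None]


@dataclass
class DataProduct:
    """Hypothetical data product: a boundary, an owner, and access interfaces."""
    name: str
    owner: str
    _rows: list[Row] = field(default_factory=list)
    _listeners: list[Listener] = field(default_factory=list)

    def discover(self) -> dict[str, Any]:
        # Stand-in for a conceptual discover endpoint: who owns it, what is in it.
        columns = sorted({key for row in self._rows for key in row})
        return {"name": self.name, "owner": self.owner, "columns": columns}

    def consume(self) -> list[Row]:
        # Pull-style access, comparable to issuing a query; returns copies.
        return [dict(row) for row in self._rows]

    def subscribe(self, listener: Listener) -> None:
        # Push-style access: the consumer listens for changes instead of polling.
        self._listeners.append(listener)

    def ingest(self, row: Row) -> None:
        # The data product team owns ingestion and notifies every listener.
        self._rows.append(row)
        for listener in self._listeners:
            listener(dict(row))


payments = DataProduct(name="payments", owner="payments-team")
received: list[Row] = []
payments.subscribe(received.append)
payments.ingest({"id": 1, "amount": 25.0})
print(payments.discover())
print(received)
```

In a real implementation the subscription would more likely sit on a message broker topic than on an in-process callback; the point of the sketch is only that ingestion, discovery and both consumption styles are owned by one team behind one boundary.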

Shane Gibson: So I think you’ve covered a fairly large number of perspectives, or lenses, in that definition. So I’m with you on a lot of it. When Spotify published their ways of working quite a while ago, the Spotify model, one of the core tenets, one of the core patterns, was self-organizing teams, and the teams were able to use whatever technology they wanted. And what they said was that over time, rather than standing up their own technology and maintaining it, where teams saw critical mass in the organization for a technology, they would adopt it, because it became easier. The community was there to understand how to use it, the support was there internally, the bodies of knowledge were there. And then over time, teams broke off to become shared services teams, effectively, for that technology to support those other teams, but it was a natural thing. As a boundary, as a domain team, you could decide not to follow that path and do whatever you wanted, because you were empowered to produce effectively a product within the application. And for me, that is a very successful pattern. If you start off with your teams, as you scale, and give them the right to be in charge of their own destiny, they will 9 times out of 10 do the right thing. And they will adopt shared patterns, because it makes sense to them. Humans are inherently lazy; we don’t want to do work we don’t need to do. So that’s a good pattern. I think the other thing that I want to pick up on is that we need to differentiate team topology, our operating model, the way our organization or team hierarchy works, from data products. So I’m with you still that a data product is a boundary. So it’s a boundary of: Who owns it? Who produces it? Who consumes it? How do they want to consume it? What’s the data that’s in there? What’s the data that’s not in there? What’s the code? 
What’s available? There’s a whole lot of patterns, or lenses, we should put across it that give us that boundary. But we can have a single centralized team that’s producing data products, if we wish; we just have to deal with the consequence that we now have a backlog that’s massive, because there’s only one small team. We can have a bunch of centralized teams, pods, squads, whatever you want to call them, who each own a set of data products or a domain, and that’s okay. Then we can have them using a shared platform or not using a shared platform. It’s all okay. I think with data mesh, one of the things is that it talks about the four principles, which is great. And then it brings in a fully decentralized topology of pushing all the producing work out to the software engineering teams. And I think that is where we should go, to this Nirvana, but we’ve never managed it. But I think that’s also confusing the market, the difference between the principles of a data product and that team topology. I just want to jump on to the idea of a data contract. So I did a shout-out on LinkedIn a while ago and said, could somebody give me a template for a data contract? Because I didn’t have one, and I actually needed to put one in place for some stuff that we were doing as part of our startup. And I got a couple of examples, but not a lot. So on the idea of a data contract, I take your analogy. Airbnb has a contract with its producers and a contract with its consumers, and you sign up to it; it’s well formed. You might not like it, but it’s still there. But we don’t have that. We don’t have patterns for data contracts widely available in the data world. Or have you seen it differently? I mean, have you seen some really good data contracts out there?

Eric Broda: So I think it is a gap today, but let’s unpack that a little bit, I suppose. When I think of a data contract, there are minimally three things that you need to think about. One is, obviously, a tech set of expectations. So for example, SQL would be one way to access the data, and it has its own set of contracts and expectations. You may use APIs, with the OpenAPI specification. So the tech side is one element of it. There are expectations around change: for example, there’s an expectation that what works today will work tomorrow, and that there’s some level of backward compatibility. There are also hard things like SLAs (Service Level Agreements), or expectations where 99.8% of the queries will happen within two seconds, or whatever that may be. So I have not seen anything that spans all of those, to be honest with you. But what I would say is, people are starting to understand that when you look at a data product, it is the vehicle to bring all those things together. So I’ll give you a simple example. At several of my clients, we’ve implemented this notion of a data product registry. It’s a prototype at this point in time, undergoing some industrialization; it may one day be a product, who knows. And it’s different from a catalog, and you’ll see why. It’s not a Collibra, it’s something very different. What the registry tries to do is say, if I want to go and find the data, I should have some way of actually searching for all the data products that are out there, finding the one I’m looking for, and double-clicking on it. And I should be able to see much more than just the database schema and the tables and the columns and all that, though even there you should be able to see which columns are sensitive, or whatever. It should also say who the owner is. So who do I call if I actually want to get something? What are the SLAs? 
Now here’s the real thing: what are the actual access mechanisms? When you boil down data mesh, and this is kind of what I do for a living, data mesh is a set of principles, with data products being the architectural quantum, and what I try and do with my clients is turn the principles into practices. So this notion of a data product registry actually crystallizes it for all. It’s like when you think of Airbnb and you go and search for a rental property: you can put in any level of filtering, geographic location, price, etc., and hit go, and it gets you this beautiful thing and you can find properties anywhere on the globe. It’s the same idea with a data product registry. I should be able to have that simple an interface, and I should be able to navigate from the forest, find the tree I’m looking for, and drill down. Out of the forest of data products, I should be able to find the data product that I want. And I should be able to look at the service level agreements, the hard agreements. I should be able to look at the service level expectations: backward compatibility, etc., we promise not to break things, or whatever. And I should be able to look at the tech expectations. Again, if I look at the analytical space, if I’m trying to do a join of a 100 million row table with another 100 million row table, which you’d think is not that frequent but is actually more frequent than people think, especially in financial services, you don’t want to have people doing that at the drop of a hat. You want to have those queries vetted and performance tested, with all the data scrubbed for all the regulatory and privacy concerns. In other words, you want to have vetted queries. So that’s another set of capabilities that is exposed in the data product. 
In other words, the data product is the container that houses the data, but it also houses all of the formal and informal expectations, service level agreements and access mechanisms, all available. That, in effect, is what I think of as a data product contract. There’s nothing out there that has that today, although with some of the work that I’m doing for my clients, we’re pretty darn close to instantiating that. But that’s what I think of as the contract. There’s no vendor product out there today that does it. There’s no formal spec that does it. But if you’re able to think through it a little bit, you can actually put the puzzle pieces together so you can see something that makes sense.
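
The three parts of a contract Eric lists (tech expectations, change expectations, hard service levels) can be written down as a small data structure. The Python sketch below is an illustration only: no formal data contract spec is implied, and the class names, field names and the `is_met` check are assumptions, with the 99.8% within two seconds figure taken from his example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevelAgreement:
    """Hard expectation: `percentile` percent of queries finish within `max_seconds`."""
    percentile: float
    max_seconds: float

    def is_met(self, latencies_seconds: list[float]) -> bool:
        # With no observations there is nothing that violates the agreement.
        if not latencies_seconds:
            return True
        within = sum(1 for s in latencies_seconds if s <= self.max_seconds)
        return 100.0 * within / len(latencies_seconds) >= self.percentile


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract bundling the three kinds of expectation."""
    product: str
    owner: str
    access_mechanisms: tuple[str, ...]  # tech expectations: SQL, API, events
    backward_compatible: bool           # change expectation
    sla: ServiceLevelAgreement          # hard service level


contract = DataContract(
    product="payments",
    owner="payments-team",
    access_mechanisms=("vetted-sql", "openapi", "events"),
    backward_compatible=True,
    sla=ServiceLevelAgreement(percentile=99.8, max_seconds=2.0),
)

# 999 fast queries and one slow one: 99.9% are within two seconds.
observed = [0.5] * 999 + [3.0]
print(contract.sla.is_met(observed))
```

Keeping the contract immutable (`frozen=True`) reflects the idea that what works today should work tomorrow: a change means publishing a new version rather than editing the agreement in place.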

Shane Gibson: So I really like that. I haven’t heard that lens before. So one of the things we know we do badly in the data world is we confuse people who aren’t data literate. We talked about ETL in the old days, but now we talk about mesh and fabric and APIs and all these words that a consumer of a product doesn’t care about; they just want the product. So I like that idea that we can use consumer-based terminology to describe what they get. And as I was thinking about those words we’re using, effectively one of the things that you were talking about is the warranty. When I buy a product, I get a one, two or three year warranty; I get a promise of how long this thing’s going to survive. So I heard that as a form of warranty. One of the other things: whenever I’m coaching a team, what I get them to start off with first is what I call the definition of done. And it’s a little bit different for a data team than what you typically do in Scrum. So, for me, the definition of done is not what the product owner’s asking for. It’s the set of criteria that, as data professionals, we expect to meet when we build data products. So our consumers, our product owners, expect us to test our code. They don’t expect to have to tell us to test our code. They expect us to validate the data, and they expect us to make sure that we haven’t lost any. There’s a whole lot of things that you would expect a data professional to do. And that for me is the definition of done. And I can take those definitions, and I can then think about, if we had this registry of data products, those should be things that we have green ticks on: we have done these things to produce this product. It’s almost like a QA process in a factory to say it’s passed all the tests for that product to go out and be boxed and shipped. So I like that analogy. I think we’re going to see a lot of confusion between registries and catalogs, because they are similar, but they’re not the same.

Eric Broda: Well, let me just touch on that real quickly, because you did mention it. First off, there is confusion in the market. That’s why I purposely called it a registry. And point well taken, maybe I’m not adding clarity as I do that. But let me explain why I did that. So first off, let’s start at basic principles: what is a domain? Typically, the domains are defined by the Chief Data Office in a typical large organization, but they’re macro-level domains. They’re quite coarsely grained. The practitioner wants to have something that is finely grained, something that they can use. Well, lo and behold, the data product is exactly that. It’s designed for consumption. And production, the ingestion of the data, should follow from that consumption. So first off, domains are different: there are governance domains, which are the traditional data domains in an organization, and that is different from a data product, which is practitioner focused. Again, the audience of the data product is the developer, the data scientist and the analyst who has a practical job to do. The job of the data domains at the CDO level is to ensure that we’re doing the right thing with our data, setting the policies that the data product owner and their team actually have to implement. So by its very nature, a data product has a much finer-grained domain. So now if I think about applying that to a data product catalog, the confusion comes in when people think of a data catalog. There are products out there like Collibra. And Collibra is a great product for what it does, but let’s make no mistake about it: it’s a governance tool. Independent of how people want to modify it, change it, use or abuse it, it’s a governance tool, and it’s designed that way. The data product needs something that is a practitioner tool. So for example, let’s come back to the things I mentioned earlier. 
Collibra definitely can provide the schema of the data product, and probably can identify the owner. Sometimes you may be able to define the service level expectations, but it’s a little bit ad hoc. It is largely agnostic to this notion of expectations, versions and backward compatibility, because it doesn’t talk about the practical implementation. Once you go into the implementation: What are the APIs? What are the queries? What are the events that I’m listening for? That’s nowhere to be found in the governance tools. It’s a different problem you’re solving: the governance problem. Collibra is something for the CDO and the governance team. And it’s a super valuable product for them, a great product for them. But we shouldn’t confuse that with what the practitioner needs. The practitioner needs something very practical. How do I actually consume this darn thing? What are the vetted SQL queries? What are the APIs? What are the events that I can listen to? That’s in addition to the expectations and the SLA stuff I mentioned earlier. So the thing that exposes that information, I call the registry. Now, we too often think of the registry as just a user interface. The registry is two things. It’s a user interface. And it’s, for example, the /discover endpoint, the /observe endpoint, the /usage endpoint, the /logs endpoint and the /alerts endpoint. Alerts, for example, tells me what happened last night: what are the bumps in the road, and can I see what actually occurred? So I can look at the alerts, and I can also look at the logs to maybe do some diagnosis. I can look at the usage to say maybe we had a huge number of new users that came on board, and that’s why the performance was poor. So I can do a /observe to see what else happened in my data product. All those things are wrapped around this notion of a data product. 
And I call a registry not only the user interface, but also that set of slash discover and slash observe endpoints that make it machine readable, if you will. So those are the distinctions that I draw.
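
A minimal sketch of the machine-readable side of the registry described above, assuming a plain in-memory object rather than a real web service. The class name `DataProduct`, its fields and the `handle` dispatcher are hypothetical illustrations; only the endpoint names (slash discover, observe, usages, logs and alerts) come from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical in-memory data product; not a real registry API."""
    name: str
    owner: str
    alerts: list = field(default_factory=list)
    logs: list = field(default_factory=list)
    usages: list = field(default_factory=list)

    def discover(self) -> dict:
        # Static self-description: what the product is and who owns it.
        return {"name": self.name, "owner": self.owner}

    def observe(self) -> dict:
        # Dynamic summary of what has happened inside the product.
        return {
            "alerts": len(self.alerts),
            "log_entries": len(self.logs),
            "usage_events": len(self.usages),
        }

    def handle(self, path: str):
        # Map each endpoint named in the discussion to a handler.
        routes = {
            "/discover": self.discover,
            "/observe": self.observe,
            "/usages": lambda: list(self.usages),
            "/logs": lambda: list(self.logs),
            "/alerts": lambda: list(self.alerts),
        }
        if path not in routes:
            raise KeyError(f"unknown endpoint: {path}")
        return routes[path]()


product = DataProduct(name="customer-churn", owner="retention-team")
product.alerts.append("nightly load finished late")
product.usages.append({"user": "analyst-1", "query": "vetted_churn_by_month"})
print(product.handle("/discover"))
print(product.handle("/observe"))
```

A user interface and a machine client would both read from the same handlers, which is the sense in which the registry is "two things" here.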

Shane Gibson: So I won’t go off on a rant about governance by committee and how invaluable it is. So again, I can take a whole lot of other words and use them to describe what you’re talking about, and they all have value. You talk about a lot of DataOps behavior: that idea of logging, that idea of observability, that idea of monitoring and exposing performance, anomaly alerting, all those things that we should do for a data product. They’re all valuable. I mean, we have a massive problem at the moment that the market’s unbundled everything except for the database. I’m always confused that we say we’re in an unbundled world but everybody uses Snowflake, which is highly bundled. So we can see that. I think the word registry, for some reason, whenever I’ve heard the word registry, I’ve read the word “Protobuf”, which I don’t actually understand. I’ve never bothered to go and look, but it just sounded like a technical word. But I do remember a data catalog tool that I really liked that was out there for a while. I think it was the one that Qlik bought and embedded into their product, and I think they’ve lost this feature. But it had this feature that I thought was quite interesting. There were a bunch of tiles, and it looked like Amazon, effectively. Each one of those tiles was effectively a dashboard. It would be a form of data product, right? It had a boundary of data, it had a boundary of code that transformed that data, and it had a way of delivering that data, typically in a dashboard. And then what you could do, if you wanted access to it, was drag it to a shopping cart. And when you hit “Buy”, it actually sent off a request to the owner of that data, the person that was recognized as the owner or the steward, and then they approved your access, and it did the change for you automagically. And I like that. 
I like that analogy, because it was like me buying something from Amazon. I could treat that piece of data as a product to get access. So I’m with you, I think this idea of extending the paradigm or the pattern of the catalog from just being governance, or just been seeing the columns, or just being identifying the PII data to having all those lenses, all those things that a good product should have.

Eric Broda: So to build on that, I’ll give you an example. At some of my clients, what we’ve done is we’ve looked at it and said, not only does the data product registry have all the capability we described, but suppose, lo and behold, somebody actually finds the data and they want to access it. What do they do? Well, in the old world, you look through your Outlook organization chart to figure out who in security you may be able to call. Hopefully you get the right person, who will refer you to another person, who may be the right person, and so on. Anyway, we try to avoid that telephone tag by allowing folks to create an “Access Request”, attach it to the registry, and facilitate that. I’m just giving you a few of the highlights, but there’s a lot of process capability that you can layer into this data product registry. Again, the mission I have is to make data easy to find, consume, share and govern, and getting access to data is fundamental to achieving that. Now, what the registry can also do: our goal was recognizing, again, that our data products would be relatively finely grained. We wanted to be able to spin them up just like a cloud instance, a VM or whatever, on AWS: it’s all user interface, a few clicks, and lo and behold. So part of what we wanted to do is have a step-by-step user interface guide that walks people through spinning up a brand new data product. So you’d sit there and say, here’s the owner, here’s the Postgres database, here’s the URL for the Postgres, here’s the configuration for Postgres, and hit go. And it automatically creates the slash discover, slash observe and slash usage endpoints for you. You fill in a little bit of data, and literally within five minutes you could have a data product. 
So where we’re going with it is that the registry, again, serves the mission that I have: make data easy to find, so that’s the search capability; make it easy to consume, so I offer the list of APIs, etc.; make it easy to share, which the APIs, the SQL and so on do; and also make it easy to govern and operate. So being able to find the data and create an “Access Request”, being able to spin up a brand new data product in five minutes or less, that’s all achievable today. And we’ve bundled that capability, right or wrong, with the data product, because it comes with all the benefits of having a clear boundary, an owner and a team associated with it. So it just made eminent sense that that’s how we structured it. And I think that’s where I’m fired up. My crystal ball gets kind of foggy after about six months ahead, but I think that’s where data products are going: they’re going to focus on the developer, data scientist and analyst user experience, and make it easy to, again, find, consume, share and govern data. And that’s what the data product registry’s mission in life is.
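
The two registry workflows described above, spinning up a data product from a few inputs and raising an “Access Request” that goes to the owner, could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation used at any client: the `Registry` and `AccessRequest` classes, their method names and the example connection URL are all hypothetical, and nothing here actually connects to Postgres.

```python
from dataclasses import dataclass, field

# Endpoint suffixes named in the conversation; everything else is assumed.
ENDPOINTS = ("/discover", "/observe", "/usage")


@dataclass
class AccessRequest:
    requester: str
    product: str
    status: str = "pending"


@dataclass
class Registry:
    products: dict = field(default_factory=dict)
    requests: list = field(default_factory=list)

    def create_product(self, name: str, owner: str, postgres_url: str) -> dict:
        # "Hit go": record the owner and source, then generate the endpoints.
        entry = {
            "owner": owner,
            "source": postgres_url,
            "endpoints": [f"/{name}{suffix}" for suffix in ENDPOINTS],
        }
        self.products[name] = entry
        return entry

    def request_access(self, requester: str, name: str) -> AccessRequest:
        # The request is attached to the registry entry, so it can be routed
        # to the recorded owner instead of a hunt through the org chart.
        if name not in self.products:
            raise KeyError(f"no such data product: {name}")
        request = AccessRequest(requester=requester, product=name)
        self.requests.append(request)
        return request

    def approve(self, request: AccessRequest, approver: str) -> AccessRequest:
        # Only the recorded owner may approve access to their product.
        if approver != self.products[request.product]["owner"]:
            raise PermissionError("only the data product owner can approve")
        request.status = "approved"
        return request


registry = Registry()
entry = registry.create_product(
    "trades", "markets-team", "postgresql://example.invalid/trades"
)
request = registry.request_access("analyst-1", "trades")
registry.approve(request, approver="markets-team")
print(entry["endpoints"], request.status)
```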

Shane Gibson: Yeah, and so I’m gonna go back to my definition of data product versus information product. You gave a very good description of a data product. You talked about the persona as the analyst or the data scientist, and you talked about the data experience that they wanted: find some data, stand up the endpoints, get access. If I talk about a Chief Marketing Officer, their experience has to be different; they want an information product. Maybe some do, but most of them don’t want to know about an endpoint. They just want to know that they have some data they can consume; they want a view of it, and they want to consume it slightly differently. So their experience has to be different. And that’s okay, we’ve just got to be clear that that changes the boundary. The boundary for a data product that is focused on an analyst, a data scientist or an engineer is a different boundary from one focused on a consumer that is more at the business level. And that’s okay, they are two different products. They may use the same data, they may reuse the same platform, they may reuse a whole lot of the moving parts, the Lego blocks, I call them, but they’re a different product, because we have a different customer, and the experience we give them should be different. So for me, we’ve got to bring in the persona of the consumer of that product as one of the tenets of the boundary we create when we define that product.

Eric Broda: Absolutely, I 100% agree. Again, it’s pivoting from the producer view, how do I find the data and get the data into the database, to saying, what does the consumer need? And I’ll go get the data and make sure I can transform it to address the consumer’s need. It’s a slightly different perspective, but I agree 100% with that.

Shane Gibson: Excellent. So let’s go back to data domains, because this is a tricky one. I was intrigued, because I’ve never quite thought of or seen the data domains coming down from a CDO. I think in New Zealand we don’t have a lot of CDOs; it’s still relatively new to us. And if we do, they don’t really act like the CDOs we see overseas. So what I tend to see is there’s a couple of lenses with which data domains get defined. The first one is Conway’s Law, right, in terms of everything will behave the way the organization does. So the organizational hierarchy will define the domains: there’s marketing, there’s sales, there’s finance, there’s HR or people. I see another lens, which is core business process. If we have a supply chain, if we have an HR leave process, those kinds of boundaries where there’s a beginning and an end and you move through it to achieve some core business process, I can see those often become domains. Another way to do it is to think of it as concepts. We have the concept of a customer and the concept of a supplier and the concept of an employee, and they’re domains. We’re gonna get lots of core business events, it’s going to span across multiple organizational hierarchies or silos, and that’s our definition of a domain. Those are the ones that I typically see. And I also see system-specific silos: we have Salesforce as a domain, we have HubSpot, that’s a domain, we have SAP or Oracle finance, there’s a domain. So those are the lenses I’ve seen. What do you see? What do you use to get a definition of the data domain?

Eric Broda: Yeah, sure. The very first one you mentioned: I’m very glad that you mentioned Conway’s Law. It is prescient in terms of how true it actually is. It was created by Mel Conway in the 1960s, or something like that, but it’s been so spot on. Here’s the lesson learned. If you don’t follow Conway’s Law, you’re going to be swimming upstream. And like anybody swimming against the current, you’re gonna get fatigued. In terms of projects, or longer-term funding, you get funding fatigue. So therein lies the problem with shared services and centralized organizations: who is the owner? Don’t know. But it’s definitely not the folks that have the money. And it definitely doesn’t follow the organizational decision-making tree, which is really what the org chart shows. So if you’re going against Conway’s Law, you’ve really got to think through why you’re doing that. And if you think about why people created shared services, it was because the skill set was very hard to find, or extremely expensive, and you wanted to share it. Most of the technology that we’re talking about today is not in that camp anymore. Now, maybe it’s not too easy to find an AWS engineer, but there’s no need to have a shared service built around that today. So first off, Conway’s Law is an absolutely fantastic hint for where your data products are. The second thing is business processes. I tend not to find those as useful, because what they do is swim across organizational lanes, so they kind of violate Conway’s Law in some respects. That’s not to say it doesn’t work. But it has to be a super important process for it to work well, and you have to have an extremely strong owner of the business process to make it fly. More often, what I find useful is something that most organizations, or many of them, at least the larger ones, have, which is a business architecture. 
So I’ve seen hundreds of these things. They’re PowerPointware, but they’re very useful in the sense that they identify the domains. Now, they happen to be usually one level below the governance-level domains. So for example, client is a governance domain. Client is so abstract it can mean anything to anybody. But a business architecture would probably go down and say, in this geography, here’s what a client actually means, and here are all the other capabilities. So business architecture is the next level of granularity where I look to define my domains. Fortunately, most of them line up with the organizational structure; again, Conway’s Law coming in. Now, the interesting thing I find is that it depends on what industry you’re in. I’m in financial services, so there are a lot of pre-canned models that give you tons of hints. BIAN is one that I’ve seen many times before. It’s for financial services; it tells you the business architecture for an international or global banking organization. I’ve seen things like Teradata’s financial services logical data model. The data model is interesting for the fact that it has this huge glossary of all of the business entities, which again are organized around the units that you typically find within an organization. And the last one, and I hate to say it, but sometimes the least useful, is the domains that the chief data officer comes in and defines. Again, it’s not that they’re not appropriate. It’s just that they’re addressing a different need. They’re addressing governance, which is absolutely crucial. Don’t get me wrong, it’s needed. But it’s so far removed from the practitioner that, while it offers some useful hints, you have to go several levels down into the organizational structure before you can get something useful. 
So to be honest with you, it’s one of the very first questions that always get asked that my clients which is I love this data mesh thing, and I think these data products are fantastic. How do I find one? So there’s no, I’ll be honest with you. It’s a little bit of my secret sauce, I suppose my consulting business, but it’s not necessarily rocket science, there are some pretty solid hints that if you’re aware of simple things like Conway’s Law, you can find your data domains, your data product domains, or data products rather, a lot easier to do anything.

Shane Gibson: Yeah, it’s interesting, isn’t? I’m now worried that we’re going to see the enterprise data mesh model, so I remember the days of Oracle, IBM, Teradata, where you could buy the banking model. And everybody was proud of how many loaves of paper you could put up there for that model with typically the horrible party as a party of a party. And it was a very expensive piece of paper that was very hard to implement. So for me, I agree with you that first thing you should do is observe the organization. So typically, if I observe an organization, I hear the term value streams, or customer journey, and then I keep observing, and I see the organization actually behave that way. It’s not just lip service, then that idea of having a domain based on core business processes has possibilities, because what they’re saying is they want to work on that end to end way. But if they don’t, if they have organizational hierarchies, then you need to bounce your domains to that. I think the other thing is, and it’s something I haven’t really been thinking about, but I will now that’s probably one of my big takeaways for this is. As I said in the beginning, when I was working with organizations that we defined an information product, our goal was to break the work down into smaller chunks. That was, how do we take this lifetime value thing that we know is 12 to 24 months’ worth of work? How do we break it down into smaller parts? How do we break it down to say, well, actually, there’s a definition of revenue that we need to do. And there’s a definition of expenditure, and there’s a definition of margin, and there’s a definition of profit, there’s a definition of churn. There’s a whole lot of these, like Lego blocks that we need to put together to do lifetime value. And each one of those takes time and effort, they’re not easy. So how do we break that work down, and that’s the kind of one of the first rules of the boundary was decompose it down into small iterations. 
As a result of that, we found some value. We found that we could define a canvas which took us 15 minutes to fill out. And we found that the canvas was so simple, because it was based on the business canvas, which is a thing of beauty, that the product owners were able to pick it up and do it themselves. So they’d bring the canvas pre-filled to the team and start the conversation with them. Then we found out that we could use that canvas, that boundary, to define some form of value that could be prioritized. So we started talking about, okay, if this product was delivered, what value was there? And everybody went blah, blah, blah. It all became very high level: increased revenue, better customer experience. Well, that’s not working for us. So we asked the question, what is the action you’re going to take? So then it was like, well, this is a churn product; what we’re going to do is make an offer to the customers that are expected to churn in the next month, and we expect to retain a certain amount of revenue and a certain amount of margin as a result of that. So cool, there’s the value right there: the action you’re going to take informs the value. So if we think about creating a checklist, almost a lens of what the drivers of the boundary are, that would be valuable. You talked about binding the warranty, the definition of done, those weren’t your words, they’re the words I’m using, but that SLA, that warranty, that definition of done, those expectations and the technology, and how we can actually say we’re binding that to the boundary of that product. What else could we bind to it?

Eric Broda: So first off, I think you’re hitting on a key point. The question, obviously, is what is the boundary of a data product? And here’s the answer. It’s not the tables, though the tables are important; I can point to them. The key thing that defines a data product is what you send back when you hit the slash discover endpoint. Whatever you send back, that is the data product. If I send back the technical contracts, that’s how I consume it. If I send back the expectations, the versions I promise not to break, the backward compatibility, etc., that’s something that defines the behavior of the data product. If I send back the SLAs, that sets the expectations around the operating characteristics and performance of my data product. So the key thing here is that the definition of a data product, and the boundary around it, is manifested and realized the moment you expose the slash discover endpoint. Because here’s how the slash discover endpoint gets used; I use the REST terminology, but I think we all know it. When you hit that endpoint, you’re gonna get all that data back. And you’re going to surface it either to a machine that eventually surfaces it to a person, or through the data product registry UX that I mentioned, or it gets sent to something like Collibra or otherwise. But that is the definition of the boundary. And here’s the thing about it: you’re formalizing it in black and white, and that’s the key. So here’s the difference between the data product and the data domain: even if the domains are granular enough, they’re abstract. A domain is raw data, but it doesn’t define any of the SLAs, the expectations, or the technical consumption patterns. 
So the data domain is not sufficient to turn into a product. You have to have those other things, the behaviors, the service levels, the expectations, wrapped into it; that’s what defines the boundary of a data product. And that’s why this whole notion matters. Far too often, we think of a data product only as how do I get data in and how do I get data out. Are those critical? Absolutely. But I would argue the single most important capability that a data product has is that it can tell you about itself: it’s discoverable. The second most important characteristic is that it’s observable: it can tell you what’s happened dynamically in it. Those are the things that practitioners need. That’s what makes a data product real.
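
To make the boundary point concrete, here is a small sketch of a slash discover payload and a check that tells a bare domain apart from a data product. The three section names (`contracts`, `expectations`, `slas`) follow the items listed in the conversation; the exact keys, values and the `missing_sections` helper are assumptions made for illustration only.

```python
# Sections a descriptor needs before it counts as a data product in this sketch.
REQUIRED_SECTIONS = ("contracts", "expectations", "slas")


def missing_sections(descriptor: dict) -> list:
    """Return the required sections that are absent or empty."""
    return [s for s in REQUIRED_SECTIONS if not descriptor.get(s)]


# A domain on its own: it points at data but says nothing about behavior.
domain_only = {"name": "client", "tables": ["client"]}

# A hypothetical data product descriptor, as slash discover might return it.
data_product = {
    "name": "client-eu",
    "tables": ["client_eu"],
    "contracts": {"api": "/client-eu/query", "events": ["client.updated"]},
    "expectations": {"version": "1.2.0", "backward_compatible_with": ["1.1.0"]},
    "slas": {"refresh": "daily", "availability": "99.5%"},
}

print(missing_sections(domain_only))   # ['contracts', 'expectations', 'slas']
print(missing_sections(data_product))  # []
```

A registry could run a check like this at publish time, so that nothing is listed as a data product until its contracts, expectations and SLAs are written down.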

Shane Gibson: So when I’ve been doing it all been coming from the beginning. I’ve met the requirements stage. Like I said, I’m trying to do that. And if I think about it on the canvas, there is a box typically of SLA. Because we want from a requirement point of view, we want to know, do you want this refreshed every second, every hour, every day, every week, every month? Because it changes the way we build that product. It has an owner who’s the person that’s going to make the tradeoff decisions. When we say, you can define, we could do it this way, we could do it this way. Here’s the cost and consequences, who’s going to decide the tradeoff decisions for that. So they were all on there. What I hadn’t done is made the jump to say actually, if we then codify that at the end, made all those boundary decisions quantifiable and discoverable, there’s massive value there. And so I’m with you, I think there is a bunch of ways we can define what a data product or an information product boundary is. And then as you’re talking, I’m thinking, actually we can get to well-formed boundaries on those products. Now, we’ve got a problem with domains. Because they’re bigger, they’re more macro. And why do we want a domain? Well, we want a domain, so we know who’s going to work on it. There’s going to work in that domain, there’s going to be the subject matter expert, when a product gets requested that fits in that domain, who’s going to do that work? It’s an organizational boundaries, I think topology. So for me, now I’ve been linking. Well, what we do is we define boundaries for the product. The lenses we’re going to use to describe that this thing is unique and it’s different from another product. And we can describe that will describe those early so we know what to deliver. We’ll then codify the delivery of those boundary parameters back via the registry. So I can discover and see how they are different. And then we use team topology to figure out where the domains are. 
And they may be the organizational hierarchy. They may be the value streams, they may be core concepts, they may be something else. Maybe the HiPPO, the highest paid person in the room, actually decides where the boundary is, because they get all the cool stuff. But those domains really are just a mapping of, when a product turns up to be built, which team, which group of people, is going to work on it? How do we know which domain it fits? And that’s a good model in my head, because it answers some questions. But it comes back to this: you have to define the data product boundaries really clearly at the beginning, and you have to represent them at the end. That’s the key. And that’s actually easier than defining this big nebulous boundary of a domain. In my head, that’s been enlightening. So if we think about product thinking, like I said, the idea of an information product and a bunch of canvases actually helped prioritization in the organizations I work with. That was kind of an undiscovered, unexpected win, where we were like, oh, we can actually use this, and it’s been very, very successful. What other product thinking behaviors have you seen that this idea of data and information products allows us to adopt that have value?

Eric Broda: Maybe I’m going to repeat myself, but the single most important concept around again data, data boundary and data owner but the next one that in the hierarchy of important And is it actually brings together, consumers and producers. And that pivot from focusing on the consumer what they need to identify the data and the domain that we will put into this data product. And then using that to find the producers that create that data is a very, very different pivot than what we have today. So if you think about how does a typical enterprise data warehouse get filled up? What happens is, it’s a big landing zone. It’s a big honking set of data. And I have links probably too from hundreds of different source systems and have these pipelines coming in. And fundamentally what ends up happening is I lose context, I focus on the producers. But the consumers are so far down the pipe that I can’t actually draw the line necessarily, or at least not easily between what this consumer needs to meet this particular regulatory thing or enter into this type of specific market, I can’t actually draw that line to the producers. So what we have a very disjointed experience. The thing of the data product forces is it starts with the consumer, which is served by a data product, and a data product owner and their team. And then it derives the sources of the data and ingest those transforms as well as necessary to serve the needs of the consumer. So it’s a very bit different pivot from what your typical analytics organization is structured today. And I think that’s kind of the organizing principle around a data product is consumer first, producer follows function, based on the consumer, the producers will be self-evident. 
Instead of the other way around, create the producers put them into a data lake or a data mart or a data warehouse, and then hope that theory is if I put enough data in there, somebody will find use in it, that’s typically not, you can do that for a short period of time until the cloud costs start to add up. That’s not the typical way that you want to do things build it, and they will come works great in the movie, but not in real life, we really need to get to the point where we pivot from what the consumer needs to drive, particular going into a particular market, addressing a particular regulatory need, driving revenue, optimizing whatever the case may be the consumer is the driver for organizing the data product, the owner in the team and what they do. That’s the pivot that anybody who’s come from an analytics, organization, shared services, enterprise data warehouse type background, so that is completely foreign in many respects to what they’re doing today. That’s the big aha, that I think a lot of folks have.

Shane Gibson: Yeah, I agree with a traditional data warehousing why we used to do it, which was build it and they will come, keeps us very busy. Team was always busy working on data, but we had very few customers. So I think, their idea of bringing products thinking and bring customer first, I think we got a little bit careful that we don’t move to the one report one customer paradigm. So I think it’s that horrible balance, which we got to go to. And that’s where the idea of a data and information product gives us that boundary, that it’s reusable. It’s not just a single report for a single person. It’s something that should be reusable, but picking up that product thinking of engaging with our customers early. So prototype those products and get them in front of a customer for feedback before you go through another iteration on them don’t though the whole thing and then put it on the shelf and wonder why nobody bought it. I think the other thing is, we’ve seen a couple of waves in the market over the 30 odd years I’ve been doing it. And we’ve seen the analytics engineer wave right now, which smells very much like the self-service reporting wave we had a few years ago. And the way I articulate it in the days of Tableau and click and when they all came out, we empower the analysts to do the work without the engineering teams. And that self-service have massive value to the organization. But what we saw was we saw lots of levels. We saw a lots of siloed little bits of work being done and it became chaos. You ended up with 3000 Power BI reports, all coming off spaghetti. And my view is we’re seeing that with DBT and analytics engineers now. We’re seeing a whole lot of what they call models, but I’d call code to create a single table, no shared reuse, and if I put my product thinking head on, we’ve now basically build 1001 products that were put in the warehouse or in the shelf somewhere and we can’t see them anymore. We don’t know when they’ve gone degraded. 
We don’t know when they’ve gone rotten. We don’t know which ones have been bought and which ones haven’t, and we’re still trying to maintain all these supply chains for these products, whether somebody’s using them or not. So we’ve got to bring that thinking back in: a product has value when it’s used by a customer or a consumer. If they’re not using it, then it has no value. And typically, if you take product thinking from another domain, you cancel that product line; you don’t keep producing something that nobody is consuming, because it’s not viable. It’s not valuable anymore. So in terms of that registry, that discoverability, those metrics about who’s using it and when tell us when a product has gone stale, and we have to be willing to kill the product. It has no value anymore; it’s been replaced by a better product, ideally from us, but maybe from somebody else. Maybe we’re gonna see internal data product marketplaces and external ones. So there’s a big chance now that another company or another vendor will sell a data product to your internal consumers, and they find better value in it, and that’s okay. And that’s actually one of the things I wanted to go back to. You talked about shared services. I still see centralized teams and shared services as being valuable, depending on your organization. But the key change we have to make is to decide that the team is no longer funded based on how many people are on the team; they’re funded by the products they sell. And the teams that I’ve seen be really successful are the teams that have a natural salesperson in the team who’s out talking to all the consumers in the business and selling them something, going, it’s like a little crack, do you want a little sniff of this data? How about I give you a count of customers? That was good, wasn’t it? How about I give you a count of customers by store? You need that. We’re selling products. 
We have to excite our customers and make them want to buy them off us, effectively.
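
As a rough illustration of using usage metrics to spot products that have gone stale, the sketch below flags any product with no recorded use inside an idle window. The 90-day window, the product names and the `stale_products` function are all assumptions; a real registry would choose its own threshold and read last-use dates from its own usage data.

```python
from datetime import date, timedelta


def stale_products(last_used: dict, today: date, max_idle_days: int = 90) -> list:
    """Return names of products never used, or not used within the window."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, used in last_used.items()
        if used is None or used < cutoff
    )


# Hypothetical last-use dates, e.g. derived from a usage endpoint.
last_used = {
    "customer-count-by-store": date(2022, 8, 1),
    "legacy-margin-report": date(2021, 11, 15),
    "never-adopted-extract": None,
}

print(stale_products(last_used, today=date(2022, 8, 9)))
# ['legacy-margin-report', 'never-adopted-extract']
```

The output is only a list of candidates; deciding to retire a product is still a call for its owner.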

Eric Broda: Yeah, absolutely. When I think of the shared service organization, if you want to have anything done, it goes through a quote intake process. And that intake process, you may actually, if you’re lucky, you have visibility into it. But more often than not, you don’t. And magically, two weeks later, somebody may call you. So the problem with shared services is they tend towards bureaucracy, they tend towards this notion of intakes and such, and they remove the engineers from the visibility what the business is actually trying to do. So I agree strongly with your notion, I mean, there’s certain things where skills are extremely expensive or rare, you have to have a shared service. But when you do it, the ideal way to do it is remove as many obstacles to talking to your customers as you possibly can. And then you have a chance of first off understanding what the business trying to do. And you actually have an opportunity now to prioritize your work to address what the business actually wants, as opposed to what the shared services organization may do, which largely is to be honest with their focus is in some respects, cost optimization. The business to this day, today, every single business I see, and all the industries, it’s actually about speed and agility, time to market, getting stuff out there is what it’s all about. So that is countercultural to the shared services organization. You have to figure it. Every organization has to figure out, are you optimizing for cost or, am I optimizing for speed and agility? Hands down, if you were to talk to 100 business executives, 99 to 100 of them will say I’m optimizing for speed and agility, and if you can give me time to be able to market I’ll pay you just about anything, you need to do that. So the shared organization thrives when it does that, and it continues to have to justify its existence when it doesn’t do that. 
And as Conway’s Law again comes into play, if the businesses are or it’s a shared services several steps removed from the business, the funding and the decision makers don’t understand why they’re getting paying as much as they do for what they’re getting. You have to know that if you’re in shared service, and you have to do it, you have to know your business. You have to reach out there. Every single person like you said, Shane is a salesperson.

Shane Gibson: The first thing a shared service group does is put a ticketing system in. And think about it in the consumer world: which organizations make us use a ticket? The telcos, where we log a call to get our phone turned on and it takes two weeks. Or maybe takeaways, where we go in and it’s busy and we get a ticket, but we get the product pretty quickly. With those tickets, we can actually see the queue. They yell out number 99, and we know we’re 103, so we can estimate how long it is. And we know when we’re next; there’s no magic where 104 gets yelled out before 103, or bad things happen. So I think you’re right. They’re not focused on the customer; they don’t even think of it as a service. It’s optimization of the service. Like you said, it’s not a product, and they don’t use product thinking. So shared services, where groups of people work together for other groups, is okay, but you still have to treat the other group as a customer, and properly as a customer, not as a series of tickets, and that’s the pattern that we should adopt. Look, we’ve covered a lot, and that’s been great. Just to finish off, if people wanted to get ahold of you and talk to you further, what’s the best way for them to do it?

Eric Broda: I have email. So it’s And feel free to reach out on LinkedIn, it’s EricBroda. All one word. And if you’re inclined and want to see some of my articles, it’s And I’d be happy to respond to any emails and LinkedIn questions as a result of this. So I just want to close with one thing, Shane. We’ve talked about data products, and I’m absolutely convinced that this is the way we’re going to be structuring our data organizations going forward. I actually like the name of your podcast, AgileData, and here’s why: the thing that I pitch is Data Agility. Now, at one level, Data Agility sounds kind of trivial and trite, and in some respects it may be. But here’s the thing: we’ve been here before. Agile turned into DevOps, which turned into DevSecOps, and it changed the way we deliver software. Data was left behind. Now we’re pivoting from shared services data organizations to data products that are easy to deploy, easy to build, all that kind of stuff. And they deliver exactly what the business is looking for: agility. So we’ve been here before, and we should definitely learn from the Agile practices and the approach they took. Second thing: businesses are pivoting from cost to speed and agility. Except for rare circumstances, the days of the shared services team are numbered. It’s not going to happen immediately, but it is coming. And the last thing I would say is, as you go about your data mesh or data product journey, architecture 101 still matters. You can get yourself in a lot of trouble by architecting things poorly. Data placement 101 still matters too, and you need to think that through. And what I tell folks about data mesh and building the ecosystem is that it literally happens one step at a time, one data product at a time, which really means you’ve got to frame your journey in terms of incrementalism and a test-and-learn philosophy. If you do those things, I think your odds of success are very, very high.

Shane Gibson: Yeah, I agree. We’ve seen these waves before. We’ve seen some work and we’ve seen some fail. One of my key sayings is that Agile is not ad hoc. We don’t just make shit up. There are a lot of patterns out there. There’s Lean thinking, there’s Scrum, there’s Flow, there’s Product Thinking, and they’re all slightly different. There’s DevOps, and it’s different to all of those. So we’ve got to figure out which pattern we want to adopt. And it’s a paradigm shift we’re going for, not a technology change. So we need to share patterns. If I think about why Scrum has been so successful, well, there are five ceremonies, and everybody who does Scrum can list them off the top of their head. It’s a known, proven set of patterns that most people who do Scrum adopt, and that’s one of the reasons it’s been so successful. So whether it’s data mesh, or data fabric, or data lakes, or data warehouses, or whatever data tools we’re going to use, it’s the patterns under the covers that people want. It’s the things that say, if you do it this way in this context, then it has value. You’ve picked up lots of patterns. Take the idea of treating a set of things as a warranty against your data product and exposing it to somebody, so they know what they can trust and what they can’t. That’s a really good pattern that we can adopt from another domain, another profession. So, exciting times. Let’s hope we don’t go down the big data path, where everybody thinks it’s just a bunch of buckets that you put crap in and that’s going to magically do it for you. Let’s hope we do the hard work to make this one stick and actually have value. So look, thank you for sharing those patterns. It’s been really good, and we’ll catch you later.

Eric Broda: Well, thank you very much, Shane, once again for having me on the podcast and I hope we have a chance to talk again soon.

PODCAST OUTRO: And that, data magicians, was another AgileData podcast. If you want to learn more on applying Agile ways of working to your data and analytics, head over to