Data Mesh Podcast – Finding useful and repeatable patterns for data
TL;DR
I talk with Scott Hirleman on the Data Mesh Radio podcast about my thoughts on Data Mesh and the need for reusable patterns in the data & analytics domain.
My opinion on Data Mesh
I am not a fan of the current hype around Data Mesh.
I see another vendor hype cycle happening, where vendors #vendorwash their old technology as new with the promise of delivering a Data Mesh.
To me Data Mesh is as much about organisational design and new ways of working as it is about technology.
I struggle with the language used to describe Data Mesh. I find it too academic and while it has a bunch of good ideas and principles, to me it is lacking in patterns on how to implement them.
We have strived to implement the principles outlined in Data Mesh for decades. I hope that new reusable patterns will emerge that help us adopt those principles.
If you want to hear more of my thoughts about reusable patterns and Data Mesh in general, then have a listen to the podcast or read the transcript below.
Have a listen to the podcast
Or read the podcast transcript
Scott Hirleman
A written transcript of this episode is provided by Starburst. For more information, you can see the show notes.
Adrian Estala - Starburst
Welcome to Data Mesh Radio, with your host, Scott Hirleman, sponsored by Starburst. This is Adrian Estala, VP of Data Mesh Consulting Services at Starburst and host of Data Mesh TV. Starburst is the leading sponsor for Trino, the open source project, and Zhamak’s Data Mesh book, Delivering Data Driven Value At Scale. To claim your free book, head over to starburst.io.
Scott Hirleman
Data Mesh Radio, a part of the Data as a Product Podcast Network, is a free community resource provided by DataStax. Data Mesh Radio is produced and hosted by Scott Hirleman, a Co-founder of the Data Mesh Learning Community. This podcast is designed to help you get up to speed on a number of Data Mesh related topics. Hopefully you find it useful.
Bottom line up front, what you’re going to hear about and learn about in this episode. I interviewed Shane Gibson, Co-founder and Chief Product Officer at AgileData.io as well as an Agile Data coach. Shane’s last eight years have been about taking Agile practices and patterns and applying them to data as an Agile Data coach, and those patterns required a lot of tweaks to make them work for data. If you’re taking these patterns that we’ve learned from software engineering and applying them to data, it takes a lot more effort to actually apply them. They aren’t one to one. A big learning from that work is that when applying patterns in Agile in general, and specifically in data, each organization, even each team, needs to test and tweak/iterate on patterns, and patterns can start valuable, lose value and then become valuable again. Shane gave the example of daily stand ups to drive collaboration as a forcing function, but then they can lose value once that collaboration becomes a standard team practice. If there is a disruption to the team, a new member or there’s a re-org or something like that where collaboration is no longer a standard practice, daily stand ups could become valuable again. So how do we apply these Agile concepts to data, especially when we think about data practices and patterns as we’re going to do this specific thing going forward?
Currently, Shane sees no real patterns emerging in the Data Mesh space and I kind of agree with that. It’s still pretty early and patterns often take five to eight years to develop and Data Mesh is what? Maybe 12 months into even moderately broad adoption and Data Mesh has such a wide practice area. There are many sub-practices within that where patterns will need to emerge and so that’s going to take time, and there are so many places. So it’s kind of hard that it’s not, “Okay, we’re all focusing on one specific thing”. But that lack of patterns makes it quite hard for even those who want to be on the leading edge of implementing Data Mesh instead of the true bleeding edge because having to invent everything yourself is taxing work. So we need companies to really take existing patterns, iterate on them and then, big emphasis on this, tell the world what worked and what didn’t. People aren’t sharing their patterns. That’s what’s going to make it hard to adopt Data Mesh for many organizations for many years to come and then you won’t be able to… If you’re doing it and you’re not sharing your learnings, then maybe that organization isn’t gonna feel like they’re able to really see the patterns well enough to implement it themselves, and you could have learned from them as well. So it really does kind of snowball if everybody’s helping each other up.
Shane believes that it will likely be pretty hard for many organizations, or at least many parts of large organizations, to give application developers in the domains the responsibility of creating data products. If your domains aren’t already quite technically capable in building software products, it’s going to be very hard for them to really handle the data needs when you think about Data Mesh. So looking at domains that are using large out of the box enterprise platforms, or are just kind of using an amalgamation of SaaS solutions instead of rolling their own software, will they really have the capability to manage data as a product? If those domains don’t have the most complex of data, maybe, but if there is kind of complex data that you need out of those domains, are they really mature enough to handle it? I think it’s a very valid question. We don’t really have a lot of information on it just yet.
To really be agile using “Capital A” Agile methodologies, you need to adopt the “Capital A” Agile mindset and not just patterns and practices, in Shane’s view. Agile is really about experimenting with a pattern and either iterating to make it better or throwing it out. It’s not about being precious. As mentioned earlier, you should also throw out patterns that were effective but aren’t helping you anymore. You need to do the same at the team and organizational level if you’re going to successfully implement something like Data Mesh. Your teams and your organization overall are like living, changing, evolving organisms. Treat them as such.
A very important point Shane made is Data Mesh isn’t a solution. It is at most a way of approaching your organization’s data and analytical challenges, with a true purpose in mind. The purpose isn’t implementing Data Mesh. Data Mesh isn’t the solution. It’s something you use to get to a goal, your business goals or whatever. The purpose is the business objective or challenge, and Data Mesh is helping you tackle that. Also, Data Mesh is not the right solution for many organizations, especially smaller ones or ones that don’t have highly complex data needs. Those organizations should review Data Mesh, understand the principles and work towards some of them, but if the real challenge isn’t the centralized team being a bottleneck, don’t take on the pain of decentralizing to be hip and trendy. For those who haven’t really dealt with “Capital A” Agile, a “fun potential learning” per Shane is that there isn’t really a great pattern for measuring if another pattern is working. Proving how well something is working is kind of impossible in a lot of ways. So a large part of it is really feel. We choose this pattern to improve collaboration or whatever. Do we believe our collaboration has improved? If yes, great. Let’s try to iterate and improve it a bit more. If no, or our collaboration has even gone down, get rid of it.
For Shane, when evaluating if you are effective in your Agile methodology, ask, does the organization empower this team to work effectively? You will probably need to look at this on a team by team basis and repeatedly ask this question over time. It’s not that we empowered them six months ago so they automatically remain empowered. Trying to scale that “Capital A” Agile to fit all teams in an organization is often an antipattern. If you’re trying to fit the exact same patterns to all of the teams, it’s just not gonna work from what people have seen in the Agile space, and if you are in a hierarchical company, adapting those Agile patterns alone is probably not gonna really change the way you work in the long run. You need to break the hierarchies in some way.
For Shane, there is a big question that Data Mesh has yet to answer: can we really move the data production ownership to the application developers? He thinks if we look at DevOps and how developers took on necessary work for testing in CI/CD, we can, but then the even bigger question is how. How can we map the language of what needs to get done to the software engineering semantics? For Shane, one thing that he really kinda hit on was the idea that a proof of concept or POC is just broken in a lot of organizations. We need to rethink it entirely, especially for Data Mesh. What are you really trying to prove out? He believes that there are typically two types of POCs, and most default to type one when potential beneficiaries or consumers expect the output of type two. In type one POCs, you are trying to prove out a high level hypothesis that has lots of uncertainty. It’s about experimentation and doing it in a quick and dirty way that is not ready for production, and the output of type one is all about proving out the hypothesis, not a production ready result. So if you’re doing that with Data Mesh, it might be that you’re trying to prove that a data set is valuable, rather than trying to prove you can actually do Data Mesh and that you’re able to build the muscle to actually figure out how you would build data products, how you would build your platform. The type two POC is a minimum viable product or a minimum valuable product. What can we strip away from our end goal to get to something that can be used and is mostly productionalisable? Literally, what is the minimum that is viable? It is about proving the capability to deliver and delivering something of value sooner. So ask yourself, what are you really trying to prove in your POC? Is it type one or type two? And communicate that really well.
Shane finished on three points: empower your teams to change the way they work; stop vendor and methodology washing Data Mesh (no vendor, you don’t sell a Data Mesh, stop saying that); and regarding Data Mesh specifically, share what patterns you are trying to adopt, why you chose them, and what is working and not working. Data Mesh can only evolve into something really great if we work together and share more information.
So a few key takeaways for me from this episode. Agile methodology is about finding patterns that might work, trying those out and deciding whether you should iterate or toss them out. It’s going to be hard to directly apply those software engineering patterns to data, but we should look for inspiration from software engineering and then look to tweak those patterns. Any time you look at a pattern that you might want to adopt, or evaluate whether a pattern is working for you, ask yourself: will this/does this empower the team to work more effectively? Third point would be applying patterns is a bit of a squishy business. Get comfortable that you won’t be able to exactly measure if something is working, but also have an end goal in mind for adopting a pattern. What are you trying to achieve? And is this pattern likely to help you achieve that? Have that intentionality. And then the last one, again, is to share your patterns to not only help others, but to get feedback for yourself and maybe ideas to iterate your pattern further and improve what you’re doing as well. So with that bottom line up front done, let’s jump into the interview.
Super, super excited for today’s episode. I’ve got Shane Gibson here. He’s co-founded a startup, and we’ll talk about that at the end, that’s in the Agile data space in general, and I think can be helpful to a lot of people in Data Mesh, but really what we’re here talking about, he’s here as an Agile Data coach. He’s somebody who’s really seen a lot around data patterns and Agile patterns. So I’d originally asked Shane on because he’s commented in a lot of places on LinkedIn and things, and he always has an interesting perspective, and it’s always one that I think is a pragmatic perspective, and there’s always a little bit of a tinge of he’s seen some S-H-I-T, he’s seen some stuff out there that has shaped his view of why we should do certain things and why we shouldn’t. And so we got to talking and I think there’s a lot of really, really interesting patterns and things that are gonna emerge from this conversation. So Shane, if you don’t mind giving folks a bit of an introduction to yourself and then we can kinda jump into some of the different topics that we were looking at discussing.
Shane Gibson
Yeah, I’d love to. So hey, thanks for having me on the show. I’ve been listening to it for a long while, so I really appreciate the time you take to get people on to the show to talk about what’s happening in the Data Mesh space. So for me, I’ve been working in the data and analytics world for 30 years now. When I started out, we were back in the day where we called it business intelligence, and even executive information systems right at the beginning there. So I’ve had quite a varied series of roles over my life, which I’ve enjoyed. I started off pretty much in vendor land. I worked for some of the large vendors, US based ones, while living in New Zealand, for 13, 15 years, and my role there was what I call presales, typically called sales engineering these days. The way I articulated it was my role was to work with the sales guys to convince the customer that the product we had somehow met the need they had, and to get to the next level of the sales cycle, which typically involved the consultants coming in after us, and then swearing at us about what we promised the product could do and how the hell could they make that work. So we helped customers. We did solve some problems, but there was a little bit of that corporate sales stuff in there.
After that, I spent 10 years where I founded a consulting company, so your typical, I wanna say, consulting company where we’d go in, we’d understand the customer problem, we’d figure out how technology in the data space could help them and then we’d help them implement it. And I really enjoyed that for a while, but as part of that, I stumbled across this thing called Agile. So as the founder, I was lucky enough to experiment within the consulting company with Agile practices ourselves, and then even luckier where some customers gave me the privilege of experimenting with them and their teams to see how we could take Agile patterns and apply them in the data and analytics space.
So there was lots of content and proven practices around apps or application engineering, but in the data and analytics space, there was always some weird stuff that meant we couldn’t just apply those patterns out of the box. We had to tweak them given the context of the organization or the data or the teams we were working with. So I’ve spent the last eight years being lucky enough to work with those teams, and then, as you said, three years ago I Co-founded a software as a service data company to combine both a platform and a way of working, to see if we could streamline and apply those patterns in a better way.
Scott Hirleman
Yeah, I think what we’re seeing with Data Mesh in general is any pattern that came from… We need to learn from what came from the software engineering realm, but we do need to change it and tweak it. People want to copy paste from, “Okay, we’ve been doing this pattern on this other side, so we can just copy paste”. And that’s just not working, right?
Shane Gibson
Yeah. And so there’s some challenges with patterns, right? The first challenge is actually how you describe the pattern, because a pattern has value in a certain context. So we can describe the pattern in a technical way, we can describe the pattern in a process way, but what’s really hard is to describe the context of where it has value and where it’s probably risky. And then the second part for me is every team should experiment with the patterns. There are no out of the box data patterns that work for every organization on day one. So the goal is to help teams find a pattern that might solve the problem they have in the context they have, and then help them experiment with it to prove it does or it doesn’t, and if it doesn’t, throw it away. It’s not a pattern that works for you right now. If it does, cool. Lock it in, keep using it until it loses its value and work on the next problem that you wanna solve.
Scott Hirleman
Do you have any, just off the top of your head, good kind of rubrics or any good ways to measure if this is right for you, or if we just need to tweak it versus what’s baby and what’s bath water, or is it all bath water?
Shane Gibson
Good question. It depends on the team and the context. We will see a pattern provide value for a while and then often we’ll see it lose its value. So I’ll give you an example. In Scrum, when we’re doing batch-based delivery or iterations, we talk about the idea of daily standups, right? It’s a well proven pattern in the Scrum world. Everybody gets together for 15 minutes every day to talk about what they’re doing, and that collaboration has a high value. So if you’re not doing that, that’s a really simple pattern to implement that typically has value. Now, what we’re doing is we’re creating a forcing factor. What we’re saying is normally teams don’t talk enough, they don’t collaborate enough as they work, and that causes us problems. So let’s force you to do that every day for 15 minutes. Let’s give you some muscle memory. But we should be able to move on from that pattern at some stage. So mature teams, you watch them, they collaborate every two minutes, right? They collaborate constantly during the day, and when that happens, we can actually say, “Well, the pattern of a daily standup probably has little value for us now. So let’s throw it away because we’re achieving the goal of constant collaboration.”
Now, what will happen is we may change the way the teams work, we may change the team’s membership, we may bring other people into the team, and we may have to go back to that forcing pattern again, right? Have daily stand ups to retrain their muscle memory, to get that value back of their daily collaboration, until they lose it again. So a pattern is an interesting thing. It can be applied and unapplied and applied multiple times, but the goal is it’s a solution to a problem given a certain context, and by applying it, it has value. So that’s what we look for.
Scott Hirleman
But I just want my one checkbox answer. Why can’t you just give me a checkbox answer? That’s too exact, right?
Shane Gibson
No, so that’s easy, right? You look for the vendor that tells you it does. If you think you want out of the box, you find one of those shiny-suit consulting companies that have their little roadmap methodology, and you pay them some money, and maybe your team is happier, maybe you deliver more value to your customers, maybe you solve your problems. I’m a big fan of working with a great bunch of people who craft their own way of working. It’s more fun, more successful. So yeah, I’m not a great fan of out of the box methodologies, as you can probably tell.
Scott Hirleman
I think with the topic at hand of Data Mesh, Zhamak isn’t, and I think a lot of the people in the community are also not, for the out of the box methodologies, but we don’t even necessarily have the inklings of the patterns right now. We don’t even have that idea around daily stand ups. It is very, very early days. Zhamak talked about on a call she did that she didn’t wanna have to write her book now. She wished that we had five years of seeing this emerge before she really had to write V1 of the book, but the book is really good for setting your foundational knowledge and understanding. But how do you think about a world with little patterns or low patterns or just emerging patterns, where we don’t know which are good and persistent patterns or long term patterns or all of that? How can we start to think about moving forward?
Shane Gibson
So I think what we should do is we should pick up a pattern that already exists, we iterate it, and then we do the world a kindness by publishing what we learned, and then somebody else will pick that pattern up and iterate it again. And a lot of the things that we talk about in Data Mesh are not new in my view, and I’ll be clear, I’m at an age now where I’m opinionated on many things and Data Mesh is one of those. So the things I like about Data Mesh: I like the core principles, the principles that teams I have worked with for many years have strived to achieve. I like the idea that it’s picked up a bunch of concepts that have been around for years and put them together and said, if we apply these concepts together, there is some value in there. But I think what happens is everybody starts vendor washing it or methodology washing it, and that’s the problem.
So a pattern that’s been around for years is dimensional modeling, and why has it been so successful? It’s been successful because Kimball was great at sharing that pattern in many ways, right? He wrote books that we could buy and read and understand and apply those patterns from, and gave training courses where people could come and learn those patterns with examples. And again, it’s really interesting looking at this world of analytics engineering now, where everybody’s going back to dimensional modeling as if it’s a new thing while it’s been around for years. So I think there’s a bunch of patterns that we have lost as kind of the ground truth, and they’re coming back, which is really interesting for me to watch.
Why do we lose them? Well, we lost them because of the big data bollocks. We were in a world in the data space where we were iterating on the ways we work, we were making things slightly better, we were solving some hard problems, and then out of left field came a piece of technology that had value for certain organizations with a certain context, the Googles, the Facebooks of the world, and somehow we washed that technology, that way of working, and said, if you’re an enterprise customer with SAP, you can do big data, and it’s like the context was wrong. And what I’m frustrated at at the moment is I’m seeing that happen with Data Mesh. I’m seeing the vendor and the methodology washing, the solutions coming to the party. I’m seeing it being tried to be applied to organizations where, if you look at the context of that organization, it’s probably not the initial fit. That’s not an ideal organization to experiment with the Data Mesh approach. And yeah, again, one of my questions is what is Data Mesh? Is it a concept, is it an approach, is it a method, is it a bunch of patterns? What actually is it if we had to describe it? So yeah, that’s my view.
Scott Hirleman
I think when you and I were talking about this, I don’t think you’re seeing this from Zhamak, but there’s this assumption that Zhamak is trying to sell it to everybody and that it is for everybody, and the number of times she’s like, “No, please don’t.” This is bleeding edge. I had somebody that was talking about how if you don’t have comfort with ambiguity, if your organization can’t have comfort with ambiguity, there’s way too much ambiguity, there are way too many places where you’re gonna try and fail, and if you can’t accept that try and fail and learn and iterate model, if you don’t have that kind of Agile mindset in general, Data Mesh isn’t for you right now. It might not ever be, but it’s definitely three, five years too early for you. You have to be in that spot to really be somebody that’s okay with that ambiguity and that trailblazing, and exactly what you said, let’s find these patterns, pick them up, dust them off and see if we can iterate on them.
Shane Gibson
Yes, let me apply some context. Let’s apply some context to the idea of Data Mesh and where you might wanna experiment and where you might not want to. So one of the things we talk about in Data Mesh is socio-technical, and so I get the idea that we’re saying this is not just the technology, right? This is an organizational structure. It’s a way of working. I’m not a great fan of the word socio-technical. I’ve seen it in the book, I’ve seen it in Team Topologies. For me, it’s language that I find cumbersome. It’s not a natural word. It feels weirdly technical and therefore I’m not a great fan of it, but I love the idea that we have some technology ideas and we have some organizational ideas, and they’re both important. The technology, the ways of working and the organizational structure are key to adopting this change. So if I take that line and I say, okay, let’s just look at the pure technical side of it, right, if we talk about taking the data skills and we copy what our application development brethren did, our software engineering domain, why can’t we take those data skills and put them back into the software engineering domain, right?
So why don’t we remove this idea of: you’re a software engineer, you build a system, you then give me this data exhaust, and then in my data domain, I’m gonna have to pick up this exhaust and somehow manage it to make it useful. And I really like that idea, but if you’re gonna do that, in my head, the context is you’re an organization creating your own software. You’re building your own first party data systems. You’re creating your own software that runs your business. And if you’re doing that, hell yeah, experiment with coupling the software engineering and the data work together in a single team, and get those two skills and deliver that, and I think if you experiment with that, you’ll get high value. But if you’re a large enterprise organization and you’ve got out of the box corporate enterprise platforms or you’ve got software as a service, you’re not in control of that software development, and so that pattern probably won’t work for you.
It’s a high risk pattern to experiment with. But if you’re a large organization that has a decentralized organizational structure and you have a centralized team, then hell yeah, experiment with the pattern around decentralizing your teams, going to domain driven teams for data, thinking about a platform team that provides the platform as a product to those other teams. That organizational structure, that way of working, might have value for you. So hell yeah, go and experiment with that in your organization, but don’t think that applying the pattern of having data and software engineers working together is gonna fit you, because you’re not developing software. So again, for me, a pattern has value in a context. What’s your context? What problem are you trying to solve? What patterns may help you? Experiment with the pattern, apply it, see if it works. If it does, keep it. If it doesn’t, throw it away.
Scott Hirleman
Yeah, I think the issue that I find in the psychology around data is if you didn’t really, really think through how you were going to implement your enterprise data warehouse, if you didn’t really thread that needle well, it started to fall apart pretty quickly, and it didn’t really deliver a lot of value. This is the thing that a lot of people are worried about and they’ve taken that pattern or maybe pattern is not the right word, but they’ve taken that you had to really hit that nail on the head, you have to thread that needle, and they’ve tried to apply it to everything that is data related, and it’s just not the case. We have to set ourselves up to iterate, and that’s what Data Mesh is about. Being that flexible, scalable, agile, you need the ability to make changes and to test things out, and if they don’t work, throw them in the bin. Throw them out, move on, try the next thing, and I’m just finding that people are still so locked into if I didn’t get this dead on from the start, it caused way too many problems and it’s like, okay, but how do we get people out of that mindset? I haven’t found anything that can kinda snap them out of that mindset.
Shane Gibson
So for me, Agile is a mindset. It’s a way of exploring the way we work and the things we do, experimenting where we have problems and then adopting things that solve those problems. That’s what Agile is. Yeah, a lot of people think Agile is Scrum. Scrum is just one of the patterns. It’s a good pattern, right? I use it with the teams all the time, but it’s just a series of patterns. It’s not the answer. So the other thing is when we’re working with organizations and they say they wanna do an Agile transformation, the first thing I tend to ask them is, “That’s nice, but actually what problem would you wanna solve?” Right. The goal of our work is not to implement Agile. The goal of our work is to deliver value to our customers and we wanna change the way we do that, and Agile has ways that have been proven to help us in certain contexts. So that’s the answer with Data Mesh. If a customer comes to you and says, “I wanna implement Data Mesh”, you go, “That’s great. Now what problem would you like to solve? And then how do we use some of the Data Mesh things to solve that problem?”
Yeah. Data Mesh is not an answer. It’s a bunch of ideas to solve some of the problems that we see regularly. So yeah, I think you and I are aligned on that idea, in terms of: if you get sold a data product as a Data Mesh product, that’s bollocks. I’m also saying if you’ve got consultants who tell you that Data Mesh is a methodology that they can implement, be careful. For me, if they come to you and say, “Data Mesh has probably got some value. What problem do you wanna solve?”, and then show how Data Mesh can help solve that problem, then I’m on board, right? That’s a good approach.
Scott Hirleman
Yeah. I’m talking to a lot of smaller companies that wanna do Data Mesh. It’s like, is the centralized data team your challenge? If it’s not, if that’s not the bottleneck, if you’re saying, “Oh well, we have really bad quality data and it’s just always causing chaos”, it’s like, you can shift that ownership left in a smaller team without decentralizing your data team. That’s okay. That’s not Data Mesh. Many people are still calling it Data Mesh, but it’s not actually what Data Mesh is trying to accomplish, and Data Mesh isn’t designed for that complexity level or that scale. Coming from the distributed systems realm, if you don’t have to distribute your systems, don’t freaking distribute your systems.
Shane Gibson
And if we look, scaling Agile is this problem we have not solved. If I’m working with a team of nine people, they will rock it: if they are given self organization capabilities, if they’ve got the T skills they need, if they understand what the goal is, if they’re able to remove the blockers. A team of five to nine people, they will just rock the work they do, they will deliver value up the wazoo. As soon as you say, let’s have five of those teams, we now have a scaling problem, and in the Agile world, we have not solved that. We’ve got some patterns that help, but we do not have an answer for how you scale from one team to 50 teams well, and that’s the same with Data Mesh, right? It’s there to potentially solve a scaling problem, a decentralization problem. Now, if you’re starting off small, I still say, pick up some of the patterns, pick up some of the principles. They are valuable even with a small team, and they may help you when you decide to scale, but they may not. You may have to change them, but they still have value. So just pick the ones that are valuable for your teams right now.
Yeah, one of the examples is in the data world, we used to have a thing called an ODS, an Operational Data Store. It was this idea of: how do we not do an enterprise data warehouse, how do we not have to collect the data and move it into a single place, transform it and then combine it in place, how can we provide access to the data closest to the source system? And the interesting thing is often the ODS didn’t solve our problem, right? Yes, it gave us access to the data next to the source system, but it still gave us the data the way the source system looked, and the way it looked didn’t help me answer my business questions. One of the ones we’ll see a lot of is what we call party entity. We’ll see an application that has a table that holds customers, suppliers and employees. Why do they do that? They do that because in the old days, that was a really efficient way of storing data.
From a software engineering pattern point of view, it’s an efficient way of saying, insert a record here, and we can repeat that, but when you’re using it from a data point of view, I’ve now got to ask, well, how do I determine if that record is a customer or a supplier or an employee? There are proven patterns to do that in the enterprise data warehouse space, but with the ODS, we never really pushed it, we never gave this idea of, “Give me a list of customers, give me a list of employees.” And so again, from a Data Mesh principle point of view, the idea of doing that, I think, has high value. There are patterns for that, and they are valuable. So implement those ones if you can.
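To make the party entity pattern concrete, here is a minimal Python sketch, assuming a hypothetical party table with a party_type discriminator column (all names and rows are illustrative, not from a real system):

```python
# Hypothetical "party" table: one row per customer, supplier or employee,
# distinguished only by a discriminator column. Names are illustrative.
party_rows = [
    {"party_id": 1, "party_type": "CUSTOMER", "name": "Acme Ltd"},
    {"party_id": 2, "party_type": "SUPPLIER", "name": "Widget Co"},
    {"party_id": 3, "party_type": "EMPLOYEE", "name": "Jane Smith"},
]

def list_parties(rows, party_type):
    """Answer the data consumer's ask: 'give me a list of <concept>'."""
    return [row for row in rows if row["party_type"] == party_type]

customers = list_parties(party_rows, "CUSTOMER")
suppliers = list_parties(party_rows, "SUPPLIER")
employees = list_parties(party_rows, "EMPLOYEE")
print(len(customers), len(suppliers), len(employees))  # 1 1 1
```

Storing all three in one table is efficient for inserts; the data work Shane describes is exactly this kind of splitting back out into the per-concept lists the business asks for.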
Scott Hirleman
Yeah. That was what I was gonna ask: if you’re a small team… We’re talking about patterns. Maybe it might even help to define a little bit more of exactly what you mean by pattern, but how do you also think about talking to teams about evaluating what patterns they should even try? Because I think this is where people in Data Mesh think that they either have to go with a big bang approach and they have to get it all: they have to completely solve federated governance, they have to completely solve for all their domains, they have to just push all of their domains to start publishing data products and they have to completely solve their data platform up front, instead of, now we can iterate, iterate, iterate. But how do you talk to a team about evaluating a pattern, and then whether they should try it or not, and then also evaluating whether it’s working?
Shane Gibson
The interesting thing in the data space is most people come from an engineering background, they come from a problem solving mindset. That’s kind of why they get into the gig. So I don’t tend to have to coach them on that behavior. They’re used to being given crap data and problem solving it to make it useful. So we articulate that to them in the way they work and say, “If you’ve got a crap process or a crap practice, then how are you gonna fix it? Because it’s just a pattern thing.” So as long as we can help them find some patterns, and give them permission to experiment with them, then they will naturally do that. How do we know if it’s working or not? That’s the one that I haven’t codified. I don’t have a pattern for how we do that. I talk often about unconscious and conscious behavior. When we can do something unconsciously that has value, it’s based on our skills and our experience. If we can then take that unconscious behavior, teach somebody else, or coach somebody else to apply it, then it has more value. So for me right now, proving that the pattern is adding value is an unconscious behavior. The team knows it’s working. They can see the bottleneck disappear. It feels like more fun, but I can’t quantify it.
Scott Hirleman
Yeah. I think that’s the “what is the value of having this data?” question. It’s like, “Well, we could do this one thing.” “Well, what’s the value of doing that one thing?” “We don’t know because we haven’t done it yet.” But we think that the risk reward is there, and it gets pretty squishy. When you start to talk about ROI on data, people are like, “Well, but I want to know what is this going to return?” And some of it’s like, we’re making a bet. You have to make that bet and then see if it’s working out. You have to trust that human intuition versus pure numbers as to whether this is working.
Shane Gibson
Yeah. And I differentiate our technical practices from our ways of working. Again, if I go back to Agile, I go back to Scrum, and Scrum has a forcing function for helping teams to iterate the way they work, and it’s called a retrospective, right? A retrospective in the Scrum world is where the team sits down and says, “This is what we did in the last iteration, yeah, three weeks. This is what worked, and we’ll keep doing it. This is what didn’t work, so we need to change something, and here’s the top three things we are going to experiment with, to apply to see if we unblock some of those things that aren’t working for us.” So that retrospective process is a pattern, a forcing function to help teams change the way they work and experiment with that. In the data space, we can pick up some of the minimum viable product or the proof of capability process out of the software engineering world, the lean startup stuff. We can say, “Okay, we don’t wanna boil the ocean. We don’t wanna go and build out a canonical model for the whole organization for nine months with a bunch of beardy weirdies sitting at a whiteboard in a little cupboard, not talking to any customer or anybody with subject matter expertise, who then come out with a massive entity relationship diagram that nobody understands and then wonder why it doesn’t get implemented.”
We can change it to say: how do we define an information product, how do we define a small set of value, how do we then break our work down into a small iteration that does some of that work, and see if it has value? And if it does, let’s build it out more. If it doesn’t, let’s throw it away. What’s important is the context of the organization. Does the organization empower and allow their teams to work their way or not? And if they don’t, then you’ve got some other problems to solve first before you actually have the team in a safe place where they can experiment. Again, compare startups, tech driven companies based on experimentation, with organizations that were founded 50 years ago and are based on hierarchies. Experimentation is not in their corporate DNA. It’s a massive change for them and we have to be aware of their context.
Scott Hirleman
Yeah. I’ve been talking with a few people about this concept of what do you need to do, what is your maturity level or what’s your readiness assessment for implementing Data Mesh, and there are some vendors that are putting out readiness assessments, but I would guess that all of them are like, “Yes, you’re ready for Data Mesh.” Or, “Here’s the flavor of Data Mesh that you should use.” Versus if you don’t have that kind of Agile capability, that Agile mindset, is Data Mesh really for you right now or even ever? And if the answer is no, that’s okay, that’s fine, right?
I had Daniel Engberg from Scandinavian Airlines on the podcast, and we were talking about, in a traditional hierarchical company, how difficult it is even to create lasting cross functional teams. He said when the pandemic first hit and flights all got grounded, they created this cross functional team that got something done in six days that would have normally taken four months, and he’s like, “How do I reintroduce that juice?” But also, you don’t constantly just go for the cross functional, cross functional, cross functional. What’s the most impactful thing that these people can be doing right now? If you’re not working towards their career, if they’re not gonna have a good career trajectory in a good management structure, because they’re just working on the problems at hand and you don’t have somebody that’s like, “Okay, you’re a data engineer. We’re gonna work with you to have a good career progression for data engineering”, those people are going to leave sooner rather than later. They don’t stick there for five years, and then they don’t have that good career trajectory. So it’s always a balancing act, but I’m hearing exactly what you’re saying, and I would love it if we had a great way of changing those large organizations. Maybe what I’m hearing is you really, really, really need all the higher-ups bought in. If you’re in a hierarchical structure, the people who make the decisions are at the top level, and you need them all really bought in that that change needs to happen and that you need lasting momentum towards it. Do you have anything that you’ve seen that helps there?
Shane Gibson
So ideally that’s true, but factually, it’s not. So I do an Agile podcast which is around Agile, not so much around the data space, and we’ve been lucky enough to have some guests on that. I’ve been working in the Agile space for quite a while, and there are some patterns coming out, alright. So as we talk to people, I start to try and identify the patterns in my head. In my head now, there are patterns of organizations that started before 1990 that are very hierarchically based, and they are incredibly difficult to change because of Conway’s Law, right? The organizational structure is so embedded, and one of the guests had this really nice way of saying it. They said that if you don’t actually disrupt or rub out some of the organizational structure, it doesn’t matter what you do, the corporate memory will rebound to replace the structure the way it was. So you have to apply fundamental forcing factors somewhere, to break something, to get that change to happen.
And your airline example is a classic one for me. A crisis caused them to remove all the constraints and organizational structures and allow their teams to just get on and do the work with no boundaries, is what I’m guessing, and so the team stood up. Now, the question is, how do they then adopt that pattern in the organization on an ongoing basis? So if I look at that, I’d say if you’re not one of those companies, don’t try, but then I’ve spent eight years working with teams in enterprises and seen that work. So we haven’t changed the organization, right? What happens is as the data analytics team starts to scale out to the rest of the organization, they hit the organizational barriers, right? They hit the problem.
One of the patterns that’s really important in that scenario is what I call the S-H-I-T umbrella. If I’m working with a data analytics team in a large enterprise organization, we need a pattern where there is a senior person just above that team that is holding the rest of the organization back. They are running the interference, so when somebody goes, “Yeah, you’ve gotta go comply with those corporate governance and go through this architectural review”, they work with the team to say, “Actually, we’re gonna do that, but we’re not gonna do that right now. The team needs three months breathing space to get their patterns in place, and then they’ll present them to the architecture forum. We’re not gonna put anything in production. We’re experimenting, but we need to bring well formed content.”
And then sometimes what will work is we also say, if there’s somebody in the architecture space that wants to be involved with the team to provide some coaching and mentoring and some kind of federated governance, you bring them in, and we’ll work with them, and, yeah, do that and that helps. So that pattern of data and analytics teams applying agile ways of working in large corporations has been successful, but we did hurt. We had some problems, and one of the problems we had is we don’t have influence over those source systems where we’re collecting data from, and so what happens? Those engineering teams or those vendors change data structures, and we have to adapt at the last minute and get caught out there. So again, one of the principles I like about Data Mesh is the merging of those two skills or teams, and if we can make that work, it’d be of high value. So another example is, if we’re working in a batch or iteration based way and we’re looking for self organizing teams and cross-functional T skills and removing any barriers for that team to be successful, we end up blending the traditional business analyst role and the engineer role and the data architect role and the platform admin and engineer roles, to make sure that the team have all those skills and they’re one team.
So if we take that pattern and we apply it to the Data Mesh principle, then what we need to do is take all those data skills and move them into our application engineering teams. We’re not putting a data engineer in an application engineering team. We’re finding ways of making application engineers do the data work that we used to do for them. Now that’s hard, right? It’s not a pattern I’ve been lucky enough to work with a customer on and experiment with yet. I’d love to, but yeah, I have no patterns on how you make that one work.
Scott Hirleman
One that I’m seeing in a lot of places is that, yes, it would be great to have the most complex, complicated data products, but a lot of this is muscle building, and so NAV has talked about this. Sheetal Pratik talked about this, kind of her general philosophy of building a lot of muscle: you don’t go for super, super complicated data products, so you can bring those application developers up to speed sooner, because you’re not making them learn a very, very complex thing where there is this very complicated way of transforming the data and really putting it into a very, very refined format, and so they don’t need quite as many data skills for the early data products or the early iterations of data products. Now it might mean that you don’t have, again, that same return on each data product, but you have a much lower investment in each data product. So is your time to return much lower? Is your return on investment higher? I don’t know how that evolves in the long run. Can we get those application developers to a place where they can do all of that data work that you’re talking about? I don’t think so, but I think Max Schultze at Zalando talked about how they kinda had a similar model, and there were domains that started to hire their own data engineers because they had complicated enough challenges to need that, but that it wasn’t de facto. And I think, to me, that’s the one that makes the most sense: if you have specialty needs, you should go grab those, you should go find those and put those into the team, but in a lot of cases, you don’t need to overcomplicate it that much. Is that actually a pattern that will stick? I have zero idea. It’s very early, extremely early days.
Zhamak talked about how, in general, you want five to eight years. I think you’ve said the same thing for how long it takes to develop a pattern. Zhamak’s article came out less than three years ago. So not only has the article about the theory of doing this not been out for five years, but people haven’t really been doing it until maybe early 2021, when we started to see some people really saying, “We’re gonna start on our journey.” So we’ve got another four to seven years before reusable patterns really emerge. And how do we get people to head down those paths to figure out what’s going to work for them, even if they don’t have a broadly applicable pattern? I have zero freaking idea. I’m trying to ask you to just be like, can you solve all the Data Mesh problems? This is a little bit much of a question.
Shane Gibson
Yeah. So let’s unpack some of the 12 things that you bundled in there, and their problems. So we have seen this before, right? We’ve seen this pattern repeated. So in the app development world, we used to have a pattern where application developers wrote code and testers tested it, and we changed that. We brought testing skills into our development skills. Yes, we built some technologies, we brought in TDD and BDD and ATDD and we brought in some automation, but what we did is we said, our application developers, with the right tools and training and language, can pick up the testing skills and do that work at the same time.
So why can’t we do that with data? Now, one of the reasons that we can’t at the moment is language. Whenever you go into a domain, whether it’s a technical domain or a lawyer domain or a medical domain, we get new language, and often that language is, to an outsider, complex. And so what do we do in data? We talk about a dimension, we talk about a slowly changing type two dimension, we talk about a whole lot of technical words that are hard to understand. So we changed that language. One time we talked about the idea of a concept. What do I mean by concept? Well, in an organization, we have a bunch of concepts: we have a concept of customer, we have a concept of product, we have a concept of order. What are the questions we typically get asked on day one? How many customers have we got, what products did they buy, how many orders did we receive? And we underestimate the value of that data. We assume that the organization can actually answer those questions, and you go and look at them and actually nine times out of ten, they can’t, or they definitely can’t do it repeatably.
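For readers outside the data domain, here is a minimal sketch of the kind of jargon Shane means. A "slowly changing type two dimension" just keeps a dated history of each record instead of overwriting it; the column names and dates below are illustrative:

```python
from datetime import date

# Type 2 slowly changing dimension: when an attribute changes, close off the
# current row and insert a new one, so history is preserved rather than lost.
customer_dim = [
    {"customer_id": 42, "city": "Wellington", "valid_from": date(2020, 1, 1),
     "valid_to": date(2021, 6, 30), "is_current": False},
    {"customer_id": 42, "city": "Auckland", "valid_from": date(2021, 7, 1),
     "valid_to": None, "is_current": True},
]

def city_as_at(rows, customer_id, as_at):
    """What city was this customer in on a given date?"""
    for row in rows:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= as_at
                and (row["valid_to"] is None or as_at <= row["valid_to"])):
            return row["city"]
    return None

print(city_as_at(customer_dim, 42, date(2021, 1, 1)))  # Wellington
print(city_as_at(customer_dim, 42, date(2022, 1, 1)))  # Auckland
```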
So we should change our language, we should focus on it. And actually, one of my frustrations with Data Mesh, one of the things that I do not like about it currently, is that the language that is used is hard for me to read and understand for some reason. It just doesn’t gel with me personally. I read the words and I can kinda see some patterns and some terminology that I might be able to map to, but the words that are used, they grate with me. I can’t align for some reason, and maybe it’s me, but I think there are some other people out there. So if we can change the language, or we can make the language more accessible and we can map that language to what software engineers do, then we may be able to give them those skills.
Again, like I said, I think we should share patterns, and we talked about this before, but there is a pattern called BEAM, Business Event Analysis and Modeling. It comes from a guy called Lawrence Corr, who’s published a book on it. It’s a pattern that I’ve used for eight years. It’s probably the pattern I’ve had the most success with. It is not a technology pattern, and it really is simple, because you talk to a stakeholder and you say, who does what?
“Who’s involved in your core business processes?” “Customers.” “What do they do?” “They buy products.” “Cool. When they buy a product, what’s the term they use for that? What are they doing? Are they ordering a product?” “Right.” “What else do they do?” “Our customers pay for our products.” “Okay, do they?” “Actually no, customers pay for orders.” “Okay, there we go. Customer pays for orders. What else happens?” “Stores ship orders.” “Do they?” “Actually no, they ship products, ’cause we might do partial shipments.” “Cool. We got some complexity in there. What happens next?” “Customer returns order.” “Do they?” “No, customer returns the product.” “Cool.” So now what we know is we’ve got a whole lot of problems to solve in the data world. What happens when an order is partially shipped and then a product is returned, but the payment is for the full order? We can start seeing that, but we also now have a shared language, so we can say to the application developers, “Can you give me an API where I can get a list of customers?” “Yes.” “Cool. Can you give me an API where I get a count of customers?” “Yes.” “Great. Can you give me an API that gives me that core business process? Give me all the products that were ordered, give me all the orders that were paid.” And so we can start having a shared language, and that goes to the other thing.
So to come back to that, what we’re asking for is not to be given data exhaust, not to be given access to the data the way the application was written, because it makes sense for the application. We’re asking to be given access to data as a product, something that has a shared language that we can all look at and nod and go, “I understand what you’re talking about.” And so now we can do the rest of the hard work, and if I bring it back to one of the other points you had in there, we can then actually just create a data product, or an information product as I call it, that is: how many customers have we got, right? Do that first. If you’re starting from scratch, answer that one question. Not because answering that question is hard, but because the muscle memory, the way of working, all the processes and patterns and practices that you have to put in place to give your stakeholder a count of customers, that’s where the value is. Now the trick there is, again, we go back to that umbrella.
You need somebody in your organization that’s gonna give you permission to spend some time on iterating the way you work, where the one thing they see coming out of it is… Well, they’ll see two things actually. They’ll see a physical count of customers and they’ll see a team that’s actually enjoying and iterating the way they work, and that’s where the value is, right? But we need permission to be able to do this small piece, and that’s hard because everybody is going… A number of times we’ll go, “We want to implement Scrum because we wanna deliver more, faster.” And then the conversation should be, “But you know it’s gonna take at least three months before the team is up to what we call velocity, if we’re lucky. So are you okay with these three months of experimenting before you see the work getting done?” And worse, the stuff that they used to do is not gonna get done anymore. We’re gonna make it worse for a while.
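To make the “who does what” shared language concrete, here is a minimal sketch, with hypothetical events and wording, of capturing BEAM-style business events and turning them into the kinds of asks Shane describes (this is illustrative, not Lawrence Corr's actual notation):

```python
# BEAM-style "who does what" business events gathered from the stakeholder
# dialogue above, captured as (who, does-what, to-what) triples.
business_events = [
    ("customer", "orders", "product"),
    ("customer", "pays for", "order"),
    ("store", "ships", "product"),
    ("customer", "returns", "product"),
]

# The shared language yields both the concept-level asks ("a list of
# customers, a count of customers") and the business-process asks we can
# take to the application engineers. Wording is illustrative, not an API spec.
concepts = sorted({who for who, _, _ in business_events} |
                  {what for _, _, what in business_events})
for concept in concepts:
    print(f"API ask: a list of {concept}s, and a count of {concept}s")

for who, does, what in business_events:
    print(f"API ask: every {what} a {who} {does}")
```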
Scott Hirleman
Slow down to speed up, slow down to speed up. I do actually wanna go back to the thing you said about the language around Data Mesh. When I was first learning about Data Mesh, I was struggling, and that’s why I recommend people watch a few of Zhamak’s old videos, because I think it goes through the exact thought process. The written stuff is in a way very, very technically sound, but it’s not in a language that, as somebody who was kind of new to the data space, I really understood a lot of. I can go back and every readthrough, I get incremental data points, but it’s even tough for me to absorb all of it because it’s so densely packed with information. I’ve never been able to read through one of the two posts on the Martin Fowler site in one readthrough. It’s too much dense information coming my way.
That’s why I’m doing the podcast. That’s why I’m looking to reboot the meet ups and stuff: we do need to make this approachable, and one thing that I tell everybody, and the central point that I think you’ve been talking about, is do stuff with intentionality and with purpose. Don’t do it because you wanna do it. Don’t do Data Mesh for the sake of Data Mesh. I literally tell people, in your documentation do a control-find and replace of Data Mesh with unicorn farts, and people are like, “That’s so stupid.” It’s like, no, because you know that you won’t ship that to anybody outside of the data organization if you do that, because you know that you will take out the phrase Data Mesh, ’cause it doesn’t matter how you are doing this as long as it is repeatable and scalable, and it’s got sound foundations. It doesn’t matter if you end up going Data Mesh or no Data Mesh if it can solve what you’re trying to solve, and I think that’s a central point that you’ve been making throughout a lot of this, if I’m summing you up in a lot of ways. If you wanna add anything there; I did have one thing that I really wanted to hit on as well, which was your view of proof of concept, but I would love it if you’ve got anything to kind of respond to that.
Shane Gibson
So I have a question. The question that I can’t answer is: why has Data Mesh got market share in terms of noise and brand where DataOps never did? For me, a lot of the patterns of Data Mesh are the same patterns we look for in DataOps. DataOps was the idea of taking DevOps and the whole of the agile software engineering practices and applying it to data. I’m really intrigued that somehow Data Mesh, with the brand and the concept, has taken off in a way DataOps never did. So for me, I’m with you, unicorn farts all the way, but we’re gonna see Data Mesh for the next couple of years, right? We’re gonna see some people do some bad things, and we’ve already started to see that.
What I’m asking is, for the people that are out there that are already applying those patterns and practices, and one thing I love about your podcast is you’re bringing on people who go, “Hey, we’ve been doing Data Mesh for years. We struggled and here’s what we’ve done.” And you look at it and you go, “Yeah, you’re applying a principle of self service, you’re applying a principle of domain or subject area, eating the elephant, not boiling the ocean. You’re applying the principle in some ways of decentralizing teams, but still providing the right level of governance.” So if everybody shares their patterns and practices and calls it unicorn farts, that’ll be great. If they call it Data Mesh, that’s fine, but share the how, not the why, right? How did you do it? How can I take that, and what was the context, to help me implement that with the next customer I’m working with? That’s where the value is for me.
Scott Hirleman
And what did you try that didn't work? The antipatterns, I think, are almost more important right now than even the patterns emerging: "We tried this and it didn't work." But yeah, I can tell that you're a Catalog and Cocktails listener, 'cause of how often you say "boil the ocean", I think. Juan says that all the time too.
Shane Gibson
Yeah, and my other favorite one, “All roads lead to Austin.”
Scott Hirleman
So you've got an interesting idea around proofs of concept, for people who are trying to get started with Data Mesh. I had Paul Andrew on a little bit ago and he was talking about why he thinks proofs of concept are BS as well, but I would love to hear your thoughts, and then I can kinda intertwine his thoughts and where my thoughts have evolved on this.
Shane Gibson
Yeah, so I'm not a great fan of proof of concept as a term anymore. I think we have done bad things to that language. For me, there are two core contexts where we wanna do some proof. One is we have a high level of uncertainty: we have some hypotheses, some theories of what might work, but we know that we really can't find a lot of patterns in the same context that we have. And so before we invest in doubling down and trying to build that into the way we work, it has value if we do a little bit of work upfront to prove that our hypothesis is right or wrong, and that's an experimentation framework. So what we should do is be very clear about what our hypothesis is, be very clear about the steps we think we're gonna take to prove or disprove the hypothesis, and be very clear about how we're gonna measure whether it was successful or not, whether we're gonna carry on with this thing.
So I'll give you an example. I have a hypothesis that I can take the transformation rules from an enterprise data warehouse and, with modern technology, encapsulate those as virtual code, and I can apply that on streaming data, and in certain contexts, I think I can provide data-warehouse-like access to that data in near real time, right? Now, that is a high risk hypothesis. There's a whole lot of technical moving parts that I have not proven will actually work together. So if I was gonna double down on that as a product for a customer, I would want to spend a small amount of time proving it first. In Agile terms, we actually call that a research spike; at my startup, we call it a "McSpiky". What we do is we say we have this amount of time and this amount of money or people to prove or disprove the hypothesis as much as we can, and at the end of that, we will decide whether we're gonna carry on with it or not. So that's one context.
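To make that experimentation framework concrete, here is a minimal sketch of a research spike written down as a structured artifact. This is purely illustrative; the episode describes the practice, not an implementation, so every name and field here is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchSpike:
    """A timeboxed experiment to prove or disprove a risky hypothesis.
    All fields are illustrative, not a real AgileData.io schema."""
    hypothesis: str                                  # what we believe might work
    steps: list[str] = field(default_factory=list)   # how we'll prove or disprove it
    success_measure: str = ""                        # how we'll judge the outcome
    timebox_days: int = 10                           # fixed effort before we decide

    def decision_point(self) -> str:
        return f"After {self.timebox_days} days, decide: carry on or stop."

spike = ResearchSpike(
    hypothesis=("Warehouse transformation rules, encapsulated as code, can run "
                "against streaming data with near-real-time, warehouse-like access."),
    steps=[
        "Pick three representative transformation rules",
        "Port them to the streaming pipeline",
        "Compare the outputs against the batch warehouse",
    ],
    success_measure="Outputs match batch results within the agreed latency",
)
print(spike.decision_point())
```

The point is not the code; it is that the hypothesis, steps, measure, and timebox are written down before any building starts.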
The second context is the idea of minimum viable or minimum valuable. What we say is, look, we could do a whole lot of work for six months and deliver this thing of beauty, but we're not sure it actually has value. So what can we do to break our work down into smaller chunks to prove the value earlier? And so let's do that, and that's a minimum viable. Now, minimum viable doesn't mean it's crap, it doesn't mean it's a toy. It is production, it is usable. We're just reducing the things we're delivering to reduce the amount of time it takes before we get in front of a customer to see if it has value, and we can use that pattern of minimum viable for lots of things. One of the ones that I see a lot of enterprises do is this idea of a lifetime value. Lifetime value is the idea that you have customers that come to you, it costs you money to find that customer and onboard that customer to your company, and then that customer pays for things over time, right? So their lifetime value goes up, and we wanna determine if some customers give us more lifetime value than other customers, and therefore they're the ones we should go after. But if you think about that and you look at the way we build it out, it's based on a bunch of patterns.
So we have to understand what our revenue from our customers is, we have to understand what our cost for our customers is, we have to understand what our profitability or margin for each customer is, we have to understand how long the customer is likely to stay with us, their behavior based potentially on other customers. Each one of those little moving parts, those Lego blocks, has value. So if we go in and boil the ocean and say, "I'm gonna give you a lifetime value, but it's gonna take me two years. Go away. In two years, come back to me, and I'll give you the answer", we tend not to get funded. It's too long; the organization has changed. But if we say we're gonna break that puppy down, and we're gonna go and really quickly do a minimum viable product on revenue per customer, and you can use that information, and once you're happy that it's good enough, we'll go and do cost per customer, and then we're gonna go and give you margin per customer. If we do those as little Lego blocks, we have more success, we get more feedback. Once we've done revenue, we've learned lots, and we apply that learning and that knowledge to doing cost. We iterate the way we work. We get a feedback loop. That's what we're looking for.
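Shane's Lego-block decomposition maps naturally onto code. A minimal sketch, with hypothetical data shapes (nothing here comes from the episode beyond the four blocks he names): each block is a separately shippable function, and lifetime value only composes blocks that already exist.

```python
def revenue_per_customer(orders: list[dict]) -> dict[str, float]:
    """Block 1: ship this first and get feedback before building block 2."""
    totals: dict[str, float] = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return totals

def cost_per_customer(costs: list[dict]) -> dict[str, float]:
    """Block 2: built only once block 1 has proven its value."""
    totals: dict[str, float] = {}
    for cost in costs:
        cid = cost["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + cost["amount"]
    return totals

def margin_per_customer(revenue: dict[str, float], cost: dict[str, float]) -> dict[str, float]:
    """Block 3: composes the first two blocks."""
    return {cid: rev - cost.get(cid, 0.0) for cid, rev in revenue.items()}

def lifetime_value(margin: dict[str, float], tenure_years: dict[str, float]) -> dict[str, float]:
    """Final block: margin times expected tenure, per customer."""
    return {cid: m * tenure_years.get(cid, 1.0) for cid, m in margin.items()}

orders = [{"customer_id": "c1", "amount": 120.0}, {"customer_id": "c1", "amount": 80.0}]
costs = [{"customer_id": "c1", "amount": 50.0}]
margin = margin_per_customer(revenue_per_customer(orders), cost_per_customer(costs))
print(lifetime_value(margin, {"c1": 3.0}))  # {'c1': 450.0}
```

Each function can go to production on its own, which is exactly what makes the two-year boil-the-ocean version unnecessary.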
So proof of concept, we've done bad things to it, because what we talk about now is, I'm gonna do a really ad hoc process to get something dodgy out of the door really quickly to prove the concept that I can deliver something. And then nine times out of ten, the stakeholder will turn around and go, "Cool, you've given me the count of customers. Thank you. What's next?" And we go, "No, no, hold on. The amount of technical debt in that thing! That was just a proof of concept. We're gonna spend six months to make it right." And the stakeholder goes, "Also, the numbers aren't right." Well, the numbers are kinda right, we just haven't really tested them, but yeah, it was just a proof of concept. I don't like it anymore. So for me: experimentation, where we have something that's high risk and we wanna de-risk it, or chunking the work down into smaller chunks that are production-ready, really, they're just minimum viable products. Those, for me, are patterns that work.
Scott Hirleman
Talking about minimum viable Mesh, that was the thing that Paul was talking about: if you're trying to do a proof of concept in Data Mesh, what a lot of people are doing is a proof of concept around a data set, not even as a data product, not even "can we put out a product". It's "can we hit that data set". Exactly what you talked about: this is the answer to the question as of today, but can we reproduce it, can we scale it, can we actually treat this like a product? So are you putting out a proof of concept around actually putting a single data set together, or a couple of data sets that you're combining, or are you doing a proof of concept that you're actually able to put out data products, so that you have something that is repeatable? That's kind of what you talked about, scaling that down, in that second type of proof of concept. Or are you doing a minimum viable Mesh?
Not just that you've got data products, but that you've got something that is minimum viable for actually creating additional data products. The proof of concept around a data set might be three to six weeks. The proof of concept around data products might be eight to twelve weeks, and the proof around that minimum viable Mesh might be twelve to eighteen weeks. And is it valuable to do that first type, getting to a proof of concept that you can have that data set, when it's just gonna end up like all your other data assets, these orphaned things that are really, really hard to keep maintaining and that aren't really treated like a product? So it's funny, because when you and I first talked about doing this episode, I was like, oh yeah, yeah, I just keep using the phrase proof of concept. But I think it's a getting started strategy and a minimum viable whatever-you-wanna-call-that "P": is that product, is that platform? Whatever. I think that is a better approach, because so many people are going to set themselves up to go down these bad paths by trying to get to a data set to answer a question, but then you can't actually bring that reuse.
Not just that you can reuse that same data in other data products, but that you can actually go back. It's a wellspring where you didn't go and take all the water; you can go back and fill up your water bucket because that information is able to continuously flow. That's kind of what I've been thinking, and I would love to get your reaction, whether I'm just talking crazy sauce or what.
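As a rough summary of the three proof-of-concept scopes Scott just walked through, here is an illustrative sketch; the week ranges are the figures quoted in the conversation, not estimates for any real project.

```python
# Illustrative only: scope names and week ranges as quoted above.
POC_SCOPES = {
    "single data set":         {"weeks": (3, 6),   "proves": "we can answer today's question"},
    "repeatable data product": {"weeks": (8, 12),  "proves": "we can ship something maintainable"},
    "minimum viable Mesh":     {"weeks": (12, 18), "proves": "we can keep creating data products"},
}

for scope, detail in POC_SCOPES.items():
    low, high = detail["weeks"]
    print(f"{scope}: {low} to {high} weeks, proves {detail['proves']}")
```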
Shane Gibson
So the underlying pattern, chunk your work down into smaller bits of work that get value out quicker, so we get a feedback loop and can change the way we work next time, that is where the value is. The idea of a proof of concept for Mesh, I'm gonna call bollocks on that, 'cause Mesh ain't a thing. It's a bunch of ideas and principles and patterns that have value if you put them together.
What would I do? I would probably pick up one of the patterns out of somebody like Amazon. There's a bunch of patterns for describing the value you're gonna deliver early, without doing the work, right? So there's the idea of a press release, where we write a one pager, which is the press release, the sales document, of what we delivered at the end of the experiment or change that we're working on. So I'd do that. If I think about what you said, there were a bunch of value statements in there. Are we experimenting on the way we're working to see if we can grab some data from a source system and do a thin slice all the way through to something where we give that information to a stakeholder and it has value? Is that what we're proving? Is that what we're working on and trying to iterate on? Then write that down, right? Write that down as a press release statement: our proof is that we can take a piece of data from one of our data exhausts and provide information to a stakeholder in a way that has value, and they love it. Then go and improve that, and bake that process in.
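For readers who want to try the working-backwards press release Shane mentions, here is a minimal sketch of a one-pager template. The headings and example values are assumptions for illustration, not Amazon's actual format.

```python
# Hypothetical one-page press release template, filled in up front,
# before any build work starts.
PRESS_RELEASE = """\
HEADLINE: {headline}
WHO IT'S FOR: {audience}
THE PROBLEM: {problem}
WHAT WE DELIVERED: {outcome}
HOW WE KNOW IT WORKED: {measure}
"""

print(PRESS_RELEASE.format(
    headline="Finance gets next-day revenue per customer, self-service",
    audience="Finance analysts",
    problem="Every revenue question went through a central data team and took weeks",
    outcome="A thin slice from one source system to a governed, queryable data product",
    measure="The stakeholder uses the numbers in the monthly review without rework",
))
```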
You've done a small slice, a small piece of work, a small bunch of patterns that you're gonna use in the future. Or maybe our proof is that the thing we wanna experiment with is that we can decentralize our teams. Instead of having a centralized data team that does all this work, we can break them up into smaller pods, squads, whatever you wanna call them, and decentralize them outside of that one team. Now, it doesn't have to be into the software engineering teams. One could be out in finance and another out in HR. So let's write the press release: we used to have centralized teams and everything took six months and nobody knew who to talk to; we've changed the way we work to decentralize, and now we have those data people sitting next to the domain expert, the subject matter expert, and now we have quicker feedback loops and quicker delivery. That's what we're aiming for. And now, how are you gonna experiment with the way you work? What patterns are you gonna use to see if you have achieved that goal? So for me, that has value, but that's not a proof of concept. A proof of concept, in my head, the language now means a vendor or team is gonna come and do something really quick and really dodgy that's not sustainable, to prove they can get the money to carry on, and I'm not a fan of that anymore.
Scott Hirleman
Yeah. And I think it's one of those things again about what language you use versus what intentionality you have, and that's kind of the semantics of it. I wrote it down: what are you really trying to prove? If you're doing a proof of concept, what are you actually trying to prove? Are you trying to prove that we have the capabilities to actually move forward with implementing Data Mesh or not? Is that what you're trying to prove? Or are you trying to prove out, oh, we can find places where there's value if we created more easily accessible and usable data? Of course there is. So what are you really trying to prove? And what's that output?
Shane Gibson
And I really love that. So for me, I use it to prove a capability now, and I know it's a semantic difference from proof of concept. I've just finished working with a customer where we used that term, and just like you said, what we talk about is what capability we're trying to prove we can build, and how we prove we have built it, right? If we can document those two things at the beginning, then doing that work has value. We now have a framework of what we're aiming for and how we know we achieved it, how we know we have that capability. And then we should also bake in some form of maturity, right? At what level of maturity is that capability? Because we don't wanna overinvest in it. We may have a light version of that capability, and then we may iterate and bake a higher level of maturity into that capability in the future, and that's okay. We can iterate on those capabilities. We should; we should pick them up and make them better when they cause us a problem. So yeah, proof of capability, I'm on board with that one.
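One way to write those two things down, plus the maturity level Shane adds, is sketched below. The field names and maturity scale are illustrative assumptions, not anything described in the episode.

```python
from dataclasses import dataclass

MATURITY_LEVELS = ("light", "repeatable", "hardened")  # illustrative scale

@dataclass
class CapabilityProof:
    capability: str       # what we're trying to prove we can build
    evidence: str         # how we'll prove we have built it
    target_maturity: str  # how far we invest right now

    def __post_init__(self) -> None:
        if self.target_maturity not in MATURITY_LEVELS:
            raise ValueError(f"maturity must be one of {MATURITY_LEVELS}")

proof = CapabilityProof(
    capability="A decentralized team can publish a governed data product",
    evidence="The finance pod ships revenue per customer without the central team",
    target_maturity="light",  # iterate to a higher level only when it hurts
)
```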
Scott Hirleman
Yeah, I agree. I think that's a good framing for it. So we've covered a lot of different things in this conversation in the last hour, but is there any way that you would sum it up, or any parting advice that you would give to folks and say, this is what you've gotta do, whether you're thinking about Data Mesh or even just Agile and applying that to data in general?
Shane Gibson
Yeah, so there's a few takeaways for me, things that I've seen be successful. The first pattern is: empower your teams. Give them the ability to change the way they work, support them in that empowerment, and let them get on with their job. People are good people; if you give them the bandwidth to do what is right, they will go and do good things. The second thing is I wish we'd stop vendor washing and methodology washing Data Mesh. I don't think it's gonna happen; I think we've got another wave of big data bollocks coming, which is sad. So maybe all we can do, every time a vendor tells us they've got a Data Mesh pattern, is ask them to show us the pattern. Was it Tom Cruise in Jerry Maguire? Yeah, "show me the money". Why don't we just start yelling, "show me your patterns", and if they can articulate a pattern that makes sense, then we know how much substance is under the covers. And the last one: we should share patterns. If we have something that works in a certain context, it helps the data domain. If we can share those in a way that somebody else can pick them up and experiment with them, we can accelerate the ability to do good things with data. So for me, if we can achieve those three things in the next 12 months, I'll be a happy man.
Scott Hirleman
Yeah. When I first founded the community stuff, I was like, "I think in 18 months, we'll have all this information about the patterns and all of that." And we're a year and a couple of months in, and I'm like, "24 months from now." So is it another 12 months from now? That'll be 36 months from now that we'll have the patterns. But yeah, I think we really owe it to each other to share information about what's working and what's not working, right? I think that's really valuable and helpful.
Be okay that, hey, you went down a path that didn't work. That's totally okay. This stuff is bleeding edge; somebody's gonna get cut. You saying, "Oh, I got cut, but I ended up in a better place", great. That's a great story to tell, and people will really appreciate you for telling it. So I did wanna give you some space as well to talk a little bit about what the company does and, if people wanna follow up with you, whether about Agile Data, about the company, all of that, what you'd like people to reach out to you about.
Shane Gibson
Yeah, so from a startup point of view, AgileData.io, we're building out a platform and a way of working that reduces the complexity of data in a simply magical way. The way I think about it is our goal in life is to remove the role of a data engineer for 80% of all the data problems in the world. We've seen that pattern before. The example I use is WordPress: over 50% of the websites in the world are based on WordPress, and it has made the complex creation of websites simple. Now, not everybody uses WordPress, right? There's still a need for software engineers to build gorgeous, beautiful, unique websites, but not everybody wants one of those or can afford the cost or the time to do it.
So that's kinda what we're aiming at. But from a Data Mesh point of view, the call out I have is: we have a website called WoW. https://wow.agiledata.io is where I try to share the patterns that I have seen other teams use in the data space. It's one of those things where you never have enough time to write up the patterns: you ask everybody else to do it, and then you try and make time for yourself. So the call out for me is there's a form on the front of that page, and it basically says, "Ask me a question." The reason it does that is I've found that if there is actually somebody I'm talking to who has a problem where I can write up a pattern, or a series of patterns, that may help, then I tend to do it. So yeah, reach out, go to that website, type in the problem you want solved, and if I can, I'll write up some patterns and send them back to you and say, "Here's some stuff that I've seen other teams try that worked, here's some anti-patterns. Fill your boots and let me know how you go." So yeah, help me share more patterns.
Scott Hirleman
And I'll drop links to that in the show notes. And then another good way, I think you had said as well, is LinkedIn; that's a good way to reach out too.
Shane Gibson
LinkedIn and Twitter, yeah. I tend to be active on there, and you'll see me in my positive/negative, partially sarcastic, sometimes grumpy way. So yeah, if you wanna talk to me, reach out on one of those two things, and I will respond.
Scott Hirleman
I think that’s a good way of describing my own social media presence. I wish I were the person that’s just always very, very positive and happy, but it’s like sometimes you just gotta say, “What are you talking about? No, go to the corner. You’re on time out.”
Shane Gibson
Yeah. I gotta say I do like the fact that you hold people to account, and I do like the fact you do it in such a polite way. So big ups to you for both of those things.
Scott Hirleman
Thank you. I appreciate that. Well, Shane, this has been really, really awesome, really helpful. I think a lot of this stuff is gonna help people really start to think about how they should be approaching data projects in general, and the mindset and methodology. So much of what data's been is trying to throw technology at the problems and overly locking on to "we have to get it all right up front or it's never gonna be valuable, it's never gonna work", versus the iterative process. So I think getting people comfortable with that is a great mission in general, and I think you've provided a lot of useful information. So I wanna thank you for the time today, and also thank you everyone out there for listening.
Shane Gibson
It was very fun and we’ll catch you later.
Scott Hirleman
I’d again like to thank my guest today, Shane Gibson, who’s a co-founder and the Chief Product Officer at AgileData.io as well as an Agile Data coach. As per usual, you can find his contact information as well as the links he had mentioned in the show notes. Thank you.
Thanks everyone for listening to another great guest on the Data Mesh Learning Podcast. Thanks again to our sponsors, especially DataStax, who actually pays for me full-time to help out the Data Mesh community. If you're looking for a scalable, extremely cost-efficient, multi data center, multi cloud database offering and/or an easy-to-scale data streaming offering, check DataStax out; there's a link in the show notes. If you wanna get in touch with me, there are links in the show notes to go ahead and reach out. I would love to hear more about what you're doing with Data Mesh and how I can be helpful, so please do reach out and let me know, and if you'd like to be a guest, check out the show notes for more information. Thanks so much.
AgileData is focussed on reducing data complexity for Analysts, not Software Engineers.
We strive to adopt principles similar to those described as part of Data Mesh:
- Information as a Product;
- Domain Driven Data Design;
- Self Service Data Platform;
- Federated Governance.
But if we were building a capability designed to enable Software Engineers rather than Analysts, we would be building a markedly different product.
Keep making data simply magical