DataOps Patterns with Chris Bergh
Guests
Join Shane Gibson as he chats with Chris Bergh on improving your team’s way of working by using DataOps patterns.
Podcast Transcript
Read along you will
Shane: Welcome to the Agile Data Podcast. I’m Shane Gibson.
Chris: And I’m Chris Bergh from DataKitchen. Glad to be here.
Shane: Hey Chris, thanks for coming on the show. Today I want to talk about DataOps. DataOps is a term that’s been around for a while, and luckily enough, you were happy to come on the show and talk about it, because you wrote a white paper, which I would have called a book because it runs to 133 pages, on recipes for DataOps.
Great paper around what it is and more importantly, things you can do to implement it. So before we rip into that subject, do you want to give the audience a bit of a background about yourself?
Chris: Yeah. So I’m Chris. I’m a technical person. I started my career at NASA and MIT, writing code and writing algorithms. Then about 20 years ago, I started to get into the world of data, and I managed a team that did data and analytics for the healthcare industry: what we now call data engineers (we called them ETL engineers) and data scientists. I thought it would actually be pretty easy to do that, because I was a big software guy, so running a data team shouldn’t be a problem. What I really found out was that it was not easy at all. Things were breaking left and right. We never could satisfy our customers. We couldn’t go fast enough. How do you actually make your customer successful and be customer focused while running a factory of data? How do you innovate, focus on customer success, and give the people on your team the freedom to be both productive and innovative at the same time? That’s the kind of challenge that I faced. When we started this company a decade ago, those were the problems that I thought every data team was suffering from. And I think, to a large extent, they still are.
Shane: I think what interests me is, as you say, you started your company 10 years ago to work in this space. I think you published the cookbook in 2019. So while some people think DataOps is a new buzzword, it’s a term that’s been around for a while, and it’s a set of patterns that have been around for a long, long time, especially outside the data domain.
As you said, there’s a bunch of patterns from the software domain that we can adopt as data teams, if they have value. So let’s just do a little bit of anchoring. When you say the word DataOps, how would you briefly describe it to somebody? If you’re describing what you mean by that term.
Chris: It’s a set of technical practices and management paradigms for data and analytic teams to drive customer success and be more productive.
Shane: Okay. And then for me, when I talk about DataOps, I tend to anchor back towards the technical practices from DevOps. So that’s kind of where I saw it come from and how I describe it. The ideas about you build it, you deploy it, you break it, you fix it; the ideas around automating anything that can be automated, where automating it has value.
And the idea of treating everything as cattle: ideally we can get rid of it and rebuild it without a human being involved, if possible. Whereas when I look at your definition of DataOps and all the writing you’ve done, it seems much more akin to what I call the Agile data way of working, because you’re bringing in team design, you’re bringing in process patterns, you’re bringing in team working, you’re bringing in lean patterns.
So you’re bringing in a bunch of patterns from many different domains and then tailoring them for data teams, data and analytics teams to do it. Would that be fair? Would that be a fair way of how you
Chris: I think that’s fair, because I got really burned talking about, you know, I’m a software guy, so talking about DevOps for data and analytics. All those words that you just said, I’ve experienced in building software and managing software teams, right, and the success in that endeavor. And so whether you do automation or deployment or CI/CD, I think those patterns really apply to data analytics, but they’re necessary and not sufficient, and they’re not sufficient in two ways.
One is that I think there’s a set of ideas that come from lean manufacturing, because the left-to-right process of integrating data and producing insight is very much an assembly line. That process has things like statistical process control and data quality testing that are unique, and those lean ideas need to be adapted to data and analytics. The other pattern, really, is helping teams get less focused on technology and more focused on their customer. When I talk about it, I try to talk about the end results. It’s about productivity. It’s about making your customers more successful, instead of ‘let’s do more automation’, which is a means to get there. So I think they’re both true, because I think you have to look at it in different ways. Like you, I think the patterns here are not new. I look at it in a very abstract way: we have a set of people who are working on a technically complicated thing. And how you manage that team, or how you run that team, is very different when we’re all working on a shared thing than in, perhaps, politics, or running a school.
Those are very different management paradigms. Whether the shared, technically complicated thing is a factory, or software, or a data analytics pipeline or set of tools or warehouse, those management principles all apply. Having us be safe saying ‘this doesn’t make any sense’, trying to reduce complexity and be able to reuse things, trying to manage from metrics and measurements: all those principles really apply. And you can see the same reflection of those principles in an assembly line or in data and analytics. So I don’t think any of this stuff is particularly new.
I think Deming applies. I think the DevOps movement applies. If anything, we’re just trying to take these principles and say, look, they have a unique instantiation in data and analytics, but the ideas are old. The sort of ‘build it and they’ll come’ approach, or organizations where I do one piece and I don’t really focus on value delivery to my customer: I think those things are anti-patterns that we should get rid of, but I do think they are pretty prevalent in organizations today. Whether you call it DataOps or agile principles or whatever term you use, I think data and analytics teams need them now more than ever.
And,
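Editor’s sketch: the statistical process control idea Chris mentions can be illustrated with a simple control-limit check on a pipeline measurement. This is a minimal sketch, not DataKitchen’s implementation. The function names, the three-sigma rule, and the daily row counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def control_limits(history, sigmas=3.0):
    """Return (lower, upper) control limits computed from past observations."""
    center = mean(history)
    spread = stdev(history)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def in_control(value, history, sigmas=3.0):
    """True if today's value falls inside the control limits."""
    lower, upper = control_limits(history, sigmas)
    return lower <= value <= upper


# Hypothetical daily row counts for one table (illustrative numbers only).
row_counts = [1000, 1020, 980, 1010, 990, 1005, 995]

print(in_control(1003, row_counts))  # a normal-looking day
print(in_control(400, row_counts))   # a day where most of the data is missing
```

The same check could be pointed at any measurement a team already collects (row counts, null rates, totals), so that the system flags an unusual batch before a customer does.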
Shane: term you use for coalescing those patterns. I agree with you around data factories, the more and more I looked at it,
I bounced for many years between teams that craft bespoke artisan products and a factory that just goes through a series of steps to pump data.
And I struggled with that because, while I wanted automation, that left-to-right behavior for data, what I saw was a whole lot of complexity, which meant that that process never worked. And so I ended up, in my head, decoupling the process of manufacturing that data from the teams and the behaviors they use to manufacture it.
And often I can go back. So, you know, when you’re talking about teams and there’s a flavor of self organizing, self managing in there. If I think about a lean manufacturing line. There are buttons where when a person’s working a machine and they see there’s a problem with the line, they push the button and the line stops.
That’s their self empowered right to say something’s broken, we need to stop it all coalesce and fix it now before we carry on manufacturing. And often we don’t see that in the data team. Often we don’t see the ability for one team member to say, Hey, the actual process we’re using is broken.
Let’s stop and fix it. I mean, we talk about scrum sometimes, and we talk about agile and we talk about retrospectives, but that self-empowerment of each team member to actually stop the line, fix the process together and then start the line again is often non-existent. Is that what you see?
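Editor’s sketch: the “push the button and the line stops” idea can be expressed in a data pipeline as a check that any step is allowed to fail loudly, which halts everything downstream. This is a minimal sketch under assumed names: `StopTheLine`, `run_line`, and the three steps are hypothetical and do not come from any specific tool.

```python
class StopTheLine(Exception):
    """Raised by any step that spots a problem; halts everything downstream."""


def ingest():
    # Hypothetical raw records; one of them has a missing amount.
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]


def check_no_missing_amounts(rows):
    bad_ids = [row["id"] for row in rows if row["amount"] is None]
    if bad_ids:
        raise StopTheLine(f"missing amount for ids {bad_ids}")
    return rows


def total_amount(rows):
    return sum(row["amount"] for row in rows)


def run_line(steps):
    """Run steps left to right, feeding each result to the next step.

    The line stops at the first StopTheLine and reports where and why.
    """
    data = None
    completed = []
    for index, step in enumerate(steps):
        try:
            data = step() if index == 0 else step(data)
        except StopTheLine as problem:
            return {
                "status": "stopped",
                "at": step.__name__,
                "reason": str(problem),
                "completed": completed,
            }
        completed.append(step.__name__)
    return {"status": "ok", "result": data, "completed": completed}


outcome = run_line([ingest, check_no_missing_amounts, total_amount])
print(outcome["status"], outcome.get("at"), outcome.get("reason"))
```

The design choice worth noting is that the check raises rather than logs: the bad batch never reaches `total_amount`, so the problem surfaces inside the team instead of being pushed onto the customer.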
Chris: Yeah, I think there is a marked ostrich phenomenon where people want to hide from the problems that they have, because they’re afraid of looking bad. And so, whether it’s I’ve got poor data or I put some code into production or I’ve got some code in development that’s not working, all these cases are patterns where it’s important to surface problems and have a safe way to say, Hey, here’s a problem.
Let’s fix it. And raise it instead of trying to not notice things. And so a lot of what happens in data and analytic teams is quality itself is pushed off on the customer. They focus on the technical issues. And value is pushed off on the customer, right? It’s like, Oh, I gave you some stuff. You figure it out.
You figure out if it’s useful, you figure out if it’s quality, and they’re sort of largely mechanical in that way. And I think that’s a really bad idea, because then you end up building a lot of things that aren’t used, and that’s just a waste of your time. I don’t know, maybe people are happy with that, but I’ve never been. I’ve always wanted to be part of making a difference in the world. And I think data and analytic teams can with data, but they can’t when they just sort of vomit up lots of data of poor quality and go home at four and don’t really care if it’s used. So I think it’s a better way to work to focus on your customers, make sure that they trust the data, give them what they need, and then focus your team on how to maximize making your customer successful. It’s just a much happier place to work, because you really know that you’re having an impact on the world. And I think the challenge is a lot of teams are beaten down because they don’t feel successful. They feel like they’re blamed for everything. They have this sort of shit sandwich view of themselves, where they get crappy data, people complain, and they’re in the middle. That causes them to sort of turtle up and not want to deliver value. That’s not unreasonable, but it’s still not an effective way to work. And a lot of teams who don’t do that are really effective. I think that sort of turtling behavior in and of itself is not great.
It psychologically makes sense. I don’t know, do you see that, where teams are kind of, I’m getting beat up by both sides, I’m just gonna focus on my task list and that’s it?
Shane: years ago. And there’s a bunch of challenges there that we’ll go through in a minute. And I don’t think it’s changed. In fact, I think some of the challenges you called out then, like return on investment, have become a bigger problem, especially with the financial downturn.
The lack of value delivered by data teams, or the perceived lack of value, has caused lots of data teams to be made redundant. I go back to that idea of a factory. If I think about a car manufacturer, they focus on optimizing the factory and the line because somebody has already done the work up front to see where the value is.
Here’s a design for a car that we know is valuable. Build the car, push the car out. You don’t get to see how people drive it. You don’t get to see how it improves their lives. So your job is to focus on that factory. Whereas in data we effectively want to push left and pick up some of that product thinking.
We want to work with the customers to understand what value they need from that data. So we become the product designers. And then ideally we want to watch and observe how they use it at the end to deliver that value, to see how we can iterate it. Because the benefit we have is we’re not building a one-and-done object like a car that we can’t iterate in the next sprint or iteration.
We can go back and actually change it, like software developers and product teams do. So it’s this balance between automation and factorization, and the ability to understand the customer and iterate with them and experiment to solve their business problems. So again, I often go back to analogies around food.
And so DataKitchen, I’ve always said I’m a bit grumpy that you coined DataKitchen well before I got there. Cause it was like,
Chris: Well, yeah, I mean, the terms are there, right? If I could kind of riff on your idea: if you think about the value streams that go through data, one of them is best expressed as a factory, as you said. The left-to-right movement of data in production, thinking of it as a factory that makes cars, the manufacture of insight, all those lessons, and stopping the line.
Those are really good things. But if you look at the Lean and Deming-like principles that were applied to software, it’s more about the development value stream, perpendicular to that factory. I want to take a piece of it and change it really quickly, and I want to do that with low risk. That’s a different value stream. If you look at the DevOps literature, when they focus on value stream management, or the DORA metrics in software, they’re really focused on that perpendicular: what’s my change rate, my change break rate? Those are really important things, but data teams are both manufacturing teams and software teams. That’s what makes it interesting. We’ve always liked the T diagram, as opposed to the infinity diagram, because it emphasizes those two linked value streams; you have to do both. You don’t know what your customers want. You have to give them something that they can see and touch quickly and iterate upon it. Start with something on paper and keep working towards it, right? The Information Product Canvas, your innovation, I think that’s really great. But on the other hand, once you get something that they use, you’ve got to make sure that it runs, and you want to have a low-cost, low-effort way of running it. You want to have the system tell you if there’s a problem, you know, I got some bad data or the server is down, and all those things are true.
And that’s what makes it interesting. That’s why I find it attractive. It’s that combination of the two value streams, the manufacturing value stream and the construction value stream and DevOps, the software part and the car manufacturing part. Cause it’s both right. You have to do both. So the focus on metrics and how you run teams, that’s what makes data and analytics, I think, a challenging field.
And also the weight of data is different, right? When I had spent 15 years building a lot of code and running software teams, what I used to think as a software engineer, the data, it’s just, eh. But data has weight, it has mass, it has meaning, you have to discover requirements in it, right?
You have to interact with the data to learn. Whereas in software, data is an output, but in data and analytics, it’s both an output and an input. There are just these interesting parallels in my world, having lived in both domains for so long, that make me think both apply: manufacturing and software both apply to data and analytics. That’s one of the reasons I like DataKitchen as a name, because if you look at a good restaurant, they’ve got to do two things well. They’ve got meals that are on the menu every night that they want to be consistent and high quality. They want to control costs, and they don’t want the master chef to make the same meal 15 times in a row; they want to have someone less skilled take it over and deliver the same quality. However, a good restaurant also innovates and has new items on the menu every week or every day, and so you need time to do both.
You need to run your restaurant with a very consistent way of producing repeatable dishes. You want to foist that work onto people who aren’t your key stars. And then you want your key stars to be able to go off and create meals, get feedback from the customer, keep iterating. And it’s the same thing with data,
it’s the same thing. You wanna be able to hire a good 22-year-old and have them take over stuff,
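Editor’s sketch: the “change rate” and “change break rate” Chris refers to correspond roughly to the DORA ideas of lead time for changes and change failure rate. The sketch below computes both from a deployment log. The log entries, field names, and function names are hypothetical and exist only to show the arithmetic.

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment log: when a change was committed, when it reached
# production, and whether it broke something once it was there.
deployments = [
    {"committed": datetime(2024, 1, 1, 9), "deployed": datetime(2024, 1, 1, 15), "failed": False},
    {"committed": datetime(2024, 1, 2, 9), "deployed": datetime(2024, 1, 3, 9), "failed": True},
    {"committed": datetime(2024, 1, 4, 9), "deployed": datetime(2024, 1, 4, 11), "failed": False},
    {"committed": datetime(2024, 1, 5, 9), "deployed": datetime(2024, 1, 5, 13), "failed": False},
]


def change_failure_rate(log):
    """Share of deployments that caused a failure in production."""
    return sum(entry["failed"] for entry in log) / len(log)


def median_lead_time_hours(log):
    """Median time from commit to production, in hours."""
    return median(
        (entry["deployed"] - entry["committed"]).total_seconds() / 3600
        for entry in log
    )


print(change_failure_rate(deployments))     # 1 failure out of 4 deployments
print(median_lead_time_hours(deployments))  # lead times are 6, 24, 2 and 4 hours
```

These two numbers describe the development value stream only. The production value stream (did today’s data arrive, and was it right?) needs its own measurements.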
Shane: And I love that analogy because I can take it a couple of steps further. So the example I use is if you’re in a restaurant and the tomatoes get delivered and they’re constantly rotten, as a chef, you’re not going to go, Oh yeah, those tomatoes are a bit rotten. Let me just cut off the rotten bits and I’ll carry on.
You’re going to go back to that supplier and tell him to give you good quality tomatoes, or you’re going to go find another supplier. Now in the data world, we probably can’t find another supplier, we can’t go and grab that data from anybody else, but we can push back and say, stop giving us rotten data.
Whereas what data teams tend to do is we tend to try and fix it on our side in the kitchen, and that is inefficient. The other thing is the feedback loop in a restaurant is visible, it is a Kanban board. Because if there’s nobody in your restaurant, you have a problem. You are not adding value,
it may be wrong food, bad food, wrong price. There’s a whole lot of reasons, but you get that visual feedback that actually the product you’re producing is not valuable. Well, in data teams, we often don’t get that; we’re often behind another team or another person. I still see the manufacturing and software patterns, and then the product pattern starting to come in now to fix that problem around discovery, understanding what people actually want, and then proving delivery of value. So I’m 100 percent with you. The kitchen analogy just works. And like you said, it gives us that balance between teams that just need to do the same repeatable thing day in, day out, and the craft that comes with our domain,
to solve unique problems for a business each time.
Chris: Yeah, and I’ve come at it as a technologist, but I think I’ve really come at these principles from a manager’s perspective. When Agile started, I’d spent about 10 years as a developer. I was first starting to manage bigger teams, and I look at it really from a leadership perspective. It really hit home when I was managing data and analytic teams, because the key insight I took after reading Deming was: look, when you have problems, it’s mainly the process people work in and not the person. Deming says 95 percent of the time. Whatever the percentage is, it’s a high percentage of the time. And as a leader, you own the processes that your teams work in. You get some bad tomatoes from your supplier, you pass them through, and the dishes are shitty at the end.
It’s your fault. Right? And it doesn’t matter. You can’t say, Oh, my bad supplier gave me these. It’s like, no, it’s your fault. As a leader, you own the result. Not the person who cut it up, not the supplier. You own it. And so you have to fix the process.
Maybe the right thing to do is to talk to the supplier. Maybe the right thing is to change your menu or cut the bad parts off. It’s very context specific. I made some mistakes when I was a young leader. I fired some people because I thought it was their fault. And that I regret, honestly, because it wasn’t their fault.
I didn’t fix my team. I didn’t fix their work environment. That’s a really key leadership idea to me: you own the processes your teams work in. And often when errors are made or results aren’t delivered, it’s because you haven’t set them up to be successful. That’s a really hard thing for leaders to hear. It’s like, shit, it’s my fault, and I’ve got to work on this. And so it’s much easier to blame the bad tomatoes than to realize I haven’t built a system that can handle bad tomatoes and I have to figure out how to do that.
Shane: As data people, we’ve got to realize that our default behavior is to blame the tomatoes and blame the customer for ordering the wrong meal, or for not telling us what meal they wanted. It’s like, ah, well, we gave you what you asked for. Yeah, but you didn’t solve my problem. And oh, it’s not my fault.
I got shitty data from the CRM. That’s a default behavior for a lot of teams.
Chris: Yeah, and I guess I’m really sick of, this is going to sound terrible, the whiny little bitches who blame other people. I’m really tired of it, especially the people who lead teams. A person who’s working in a team who complains because the boss is, you know, shoveling stuff to them, I have sympathy for. But someone who’s a director or a VP who’s blaming the customer and blaming the supplier, they are just whiny bitches. Cause it’s their fault and they’re not trying to fix it. I hope there’s a special rung in hell for middle managers like that.
Shane: But then, because I coach teams, I often get frustrated at teams who say that’s just the way it is, they can’t change it. And I call bullshit, because when you’re in an ineffective organization, you actually have the ability to make more change, because the belts and braces that an efficient organization has don’t exist.
So you can actually go and do pretty much anything you want. And if you deliver value, you’ll probably get away with it. Right. And I do realize that people, you know, often need the job and the money and there is a risk profile, but they can still change small things without having to ask permission.
Chris: It’s the creative act of managing. Often finding the real problem is the hardest part, and then finding the first problem to fix is the hardest part. That’s actually really interesting management behavior. And if I think over my career, the things that I’m proud of the most are when things have been the most broken, and I’ve had an effect on them. That’s what I’ll look back on. Not when things were going well and everyone was firing. It’s like, wow, this is really broke, and we got together and fixed it. And that communal aspect makes you feel better as a leader. I think you’ve got to start with admitting that there are problems. You know, blaming suppliers, blaming your team, blaming your customer: I’ve done that, right? I’ve done it. I’m embarrassed to say I’ve done it. I’ve been that whiny little bitch and I just don’t like it. And I guess I think leaders should step up and realize that they have a lot of ability to fix things, like you said, and that when they do it, it’s an improvement. There is some indication it’s going much slower than I thought. I actually see the individual contributors, the individual data team members, data engineers, stepping up and, you know, using version control, putting testing in, automating, refactoring, trying to reflect on their work and improve.
The effect that’s happened has been much more bottoms up than top down, at least in my view. And I think the managers are laggards. Not all are, but managers are laggards in this.
Shane: I see some of what I would call high performing teams. And part of their success is normally empowering leaders: people that set the vision, get out of the way and then support the team and remove any impediments or roadblocks that they can see. If we look again at manufacturing, we’d see this term walk the floor.
You know, you talk to people in those organizations that are leaders, and they will spend an inordinate amount of time actually on the manufacturing line, walking the floor, observing the process to try and determine where something may need to change. And within the data world, I don’t see that, I don’t see the leaders sitting next to the team and watching the process.
And they should, because it’s that ability to be disconnected, the ability to sit back and observe the system that people are operating in, that actually is really powerful. And being one of the data leaders, you’re one of the few that has the time and the ability and the permission to step back outside the work being done, to watch the work being done.
So again, one of the patterns I strongly recommend if you’re a leader: just go and sit with the team. Watch how they work and draw it. Get a piece of paper and do what I call nodes and links. Draw a circle for when work starts, determine who’s doing the work, draw another circle when they hand off, determine who they hand off to, and then look at how they hand off.
So when one team member passes work in the factory to the next team member, what’s the thing that they hand off for the next person to do their job and how efficient is it? How well described, how much rework does the next person have to do? They’re all valuable principles that come out of that lean manufacturing that we can apply to data.
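Editor’s sketch: the nodes-and-links drawing Shane describes can also be captured as data, which makes it easy to ask which handoff creates the most rework. This is a minimal sketch. The `Handoff` fields, team names, and rework hours are hypothetical and would in practice come from observing the team.

```python
from typing import NamedTuple


class Handoff(NamedTuple):
    """One link in the drawing: who hands what to whom, and the rework it causes."""
    giver: str
    receiver: str
    artifact: str
    rework_hours: float  # time the receiver spends fixing what they were handed


# Hypothetical observations from sitting with the team for a week.
handoffs = [
    Handoff("ingestion", "engineering", "raw tables", 6.0),
    Handoff("engineering", "reporting", "modelled tables", 2.0),
    Handoff("reporting", "business user", "dashboard", 0.5),
]


def total_rework(links):
    """Total hours of rework across every handoff in the drawing."""
    return sum(link.rework_hours for link in links)


def worst_handoff(links):
    """The single handoff that generates the most rework."""
    return max(links, key=lambda link: link.rework_hours)


worst = worst_handoff(handoffs)
print(f"{worst.giver} -> {worst.receiver}: {worst.rework_hours}h of {total_rework(handoffs)}h")
```

A paper drawing is enough to start with. Recording it like this only helps once the team wants to compare the picture before and after a process change.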
Chris: You know, there’s kind of two views again of walking the floor. There’s the production process: we’ve got some new data, where does it go and who uses it? Where does it end up, and why are people complaining? There’s often no knowledge of that. It’s all this sort of black box.
Maybe they know where it came in, but then they don’t know what teams do with it. They don’t know where a report is, or who’s extracting it. It’s very confusing, right? The journey that data takes in production. And then, as you said, there’s the process to actually put things into production: the development environments, the handoffs that happen, the quality checks. And because teams are pretty balkanized in that, it’s often fragmented. This team takes raw data and ingests it and fixes it up. And they have their own development process and their own production process.
Okay. Now another team of analytic engineers takes it the next step. And then the reporting team takes it. And they have three different processes to move things from development to production. And then they have three different sets of steps to actually operationalize it and run the data in the data factory.
It’s crazy. And no one knows. And so when something goes wrong, you end up with this sort of intersection of Conway’s law and pass the buck. So it’s like, walk the floor, just draw the floor, just know where you are, so you can talk. It’s amazing how many people get into positions where they don’t, especially with the sort of techno-fetishism that goes on in data with the new buzzwords and the new tools. You have vestiges of past eras of tools scattered around. You have late 1990s, early 2000s, you have modern data stack, you have cloud, you have this. Oftentimes you just don’t know where it is, because it’s which version of the data stack, and, you know, the team who built it got frustrated and left, and now they’re onto the newest stack. And that’s because some of us, I think, are chasing resume building as opposed to value delivery.
Shane: I agree. Some teams are formed so that the leader gets the next job, but we won’t go there. Actually, again, I’m going back to kitchens and cooking and that three-step process you talk about: the ingestion, the engineering and then the reporting. It’s kind of like having three restaurants, rather than having three stations in a kitchen where somebody’s doing the initial prep and then it goes over to the next person, and then the next person serves it, or, you know, plates it. And they’re all talking to each other, if you ever watch a kitchen.
There’s instant feedback, there’s quality control. You know, if you pass your part to the next station and it’s not up to spec, you’re going to get some feedback. Whereas in data, what we tend to do is create three kitchens, three restaurants, and there’s an Uber driver that picks up, you know, a half-cooked meal and passes it on.
In fact, I was watching a TV show the other night and they had a thing called relay cooking. It was one of those shows with experienced chefs having a competition against each other. Each chef had 40 minutes, and there were three chefs in each team, and so the first chef gets given a set of special ingredients and has to plan what goes out in three hours.
They get 40 minutes, but when the next chef comes on they get 45 seconds to explain what the rules are, what they’ve done and what their vision was. And then they leave the kitchen. And then it happens two more times. That’s the data teams. But it gets worse. I was just thinking, you talked about ingestion, engineering, and reporting.
We keep forgetting about the fourth team, the business user of Excel, because what happens is we deliver this beautifully crafted five-star meal to their table. And then what they do is they decompose it down into a bunch of its raw ingredients and they make a completely different meal, because that’s what the majority of people do with data, with the old export to Excel.
Chris: Yeah, that’s so right. It’s so right that there’s like, there’s three restaurants with Uber drivers and then a wannabe chef at the end who’s reconstituting it. It’s such a communication challenge and such a feedback challenge. And that’s why these processes that people work in are so important to get right and to think on and improve.
Because that’s the fundamental challenge. Everyone sort of lives in these worlds where you’ve got four teams, and the vice president of marketing or sales who wants the insight is scratching their head about why it’s taken 30 people and three months to answer their business question,
and why are you doing all this? And they’re frustrated. And what’s going to happen, and you’ve hinted at it, is eventually they’re just going to start firing, right? In the good times they hire consultants, and you get to keep your job. In the bad times, and you can already see how hard it is in some fields to get jobs, there’s also the threat of AI.
Whether it’s a real threat or not that you can produce work faster, and we could have a whole conversation on that, it does allow more people to get into the field, but it also gives organizations a reason to reduce headcount when you have this four-part, four-kitchen disharmony of producing food for your customer. By the time it gets to them, it’s cold and stale and looks weird. And they’re like, this isn’t what I ordered at all. Cause that’s the case. And you don’t know what your customer wants. Right. And you’ve got to put a plate of food in front of them before they know if they like it.
Shane: That’s the thing that, you know, whether AI delivers or not, whether it is a hype cycle or it is something that is valuable and changes the game, the threat to the data teams is it can deliver the same bad meal that we often deliver much faster, and with a lot less salary than what we deliver.
So that’s the problem we have to go up against. So let’s just look at some of those challenges that you documented in 2019, because I’m intrigued about what’s changed. One of the challenges was changing requirements, and the DataOps approach that you described is to reduce cycle time.
Don’t leave a big period between asking what the requirements are and delivering something. Deliver something smaller and faster and get some feedback. Has that changed? I mean, I don’t think organizations are ever going to have fixed requirements that never change, because organizations are dynamic.
What are you seeing? Are you seeing changing requirements no longer being a problem?
Chris: It is, as it always has been, people want answers, right? And they don’t have time and they’re going to either use data or not , or use their intuition, right? And if you look at how leaders process information, , they’re using social cues, they’re using trends, they’re using force of personality and, , they use data too. And so I think we’re always in this struggle that. Leaders will make decisions and the question is, are they going to do it informed by data, which increases your likelihood of success, or informed by something else? And I think, we still have very powerful leaders who make decisions not based on facts.
We need to work towards that because, in the end, if you don’t face facts, they have this nasty way of coming back at you, right? And so you want your leaders to be emboldened with the best information possible so they can make the best decision possible. To do that, you’ve got to get them the right information in a way that makes sense to them.
And that’s an iterative process. I started doing analytics kind of full time in 2005, so it’s 20 years now. And I had another 15 years or so of software delivery where I was working mainly with air traffic controllers, trying to develop AI software to help them land planes better. That was a very iterative process. And so I’ve been lucky in my career that I’ve always worked in an iterative way, because the approach of building a large document with a lot of descriptions and then spending months building it fails in software so many times, and it fails in data so many times.
The only way to do it is to put something in front of them, get it 70 percent right, and learn from it. It’s really about maximizing what you don’t have to do; that’s the real key here, because you think they want 10 things. Inevitably, they want four, and there are six other things that you thought they needed that they didn’t. And it comes down to waste. It’s waste minimization, because the more you give them that they don’t need, the more of your team’s time you’ve wasted. And so forcing feedback helps your team learn. It helps you cut waste. It helps you actually meet requirements. And all those things, I think, are just the way of the world. It’s not that your customers are ignorant or mean, or that you don’t get along socially, that they went out and drank while you pounded the calculus books in college and you’re jealous of that.
It’s none of that stuff. It’s just the way the world is, right? We just have different domains, so let’s work together. I don’t think AI changes that, right? If anything, more automation and more tools can help us iterate faster, if we work in a system that supports it. The speed of creating new code is not the problem, right? Because most of the problems are in waste, not in creation. If you can create code 10 times faster, it actually doesn’t help the waste problem much, right? Because if 70 percent of your things aren’t really being used or aren’t needed at the end, you’ve done the other 30 percent faster, so what?
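To make that arithmetic concrete, here is a minimal sketch with assumed numbers: 10 deliverables, 7 of them never used, and 10 hours each before a tenfold coding speedup. The function name and the figures are illustrative only and do not come from the conversation.

```python
def waste_report(items_built: int, items_unused: int, hours_per_item: float) -> dict:
    """Summarise how much build effort went into things nobody used."""
    total_hours = items_built * hours_per_item
    wasted_hours = items_unused * hours_per_item
    return {
        "useful_items": items_built - items_unused,
        "total_hours": total_hours,
        "wasted_hours": wasted_hours,
        "waste_share": wasted_hours / total_hours,
    }


# Assumed scenario: 7 of 10 deliverables go unused.
before = waste_report(items_built=10, items_unused=7, hours_per_item=10.0)
# Same requests, but each one is coded ten times faster.
after = waste_report(items_built=10, items_unused=7, hours_per_item=1.0)

print(before)
print(after)
```

Under these assumptions the team spends fewer hours, but the share of effort that is waste is unchanged and the customer still ends up with the same three useful things. Only feedback that changes what gets built moves that number.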
Shane: I think people sometimes get this idea of small cycle times wrong. I was talking to somebody the other day, and I haven’t heard this one for a while, so I think it has now become an outlier, or at least I hope it has, but they were talking about a requirements gathering sprint.
You know, two weeks to gather requirements. And my point was, this is waste. If you do it badly, you’re going to give us nothing, but let’s assume you do it well: you’re going to give us so much in two weeks that you’re not going to deliver any of it.
A lot of it’s not going to get delivered. So you’re spending this effort on waste. And I think the other thing people forget is the cognition level of our customers, of our users, of our stakeholders, because if we spend too long building more stuff, then when they first see it, it’s too complex. There are too many things it does, and they then have to determine which of those things solve the real problem they had, the real requirements, versus the other things that got added on.
I’m with you. I go back to reducing cycle time. It’s a forcing function to reduce waste. It forces us to put something back in front of the stakeholder early, and then we get feedback. So the key thing is that the feedback loop is where the value is, not push it out and then move on to the next thing.
Chris: There’s also the myth of overgeneralizing. If I just have a representation that can handle all their needs, a report with lots of pull-down tabs or the right schema format, then I can answer 10 questions that they have in the future. It’s that trap of deferred value: if I do something more complete now, I’m saving myself time later. It is true in some cases, but more often than not you’re overgeneralizing, doing too much, and wasting time.
Shane: I struggle with that one. The data collection process is one that I struggle with. There are two kinds of patterns I’ve seen. Let’s say I’m grabbing the data from the source system and I’ve got a layered data architecture, so I have some form of what we call history, but you know, data lake, landing zone, staging, bronze, I think, in medallion.
I’m just going to land it and it’s going to persist there for a long time. My natural reaction when I’m touching a source system is to bring across as many attributes as are easy to bring over and make sense. And the reason is, I know the amount of effort it takes me to go back and collect more data.
Even though it should be an automated, solved problem, it is not. It’s typically one of the more effort-intensive tasks. So I’ll go, I need these fields to solve this business problem, but these other fields, you know, attributes of customer, I’m going to bring them across too. We run a hybrid form of data vault, so we have details about concepts, and when you look at stuff that I’ve designed and built, you will see my customer detail table being quite broad.
Whereas my co-founder Nigel was an engineer and he only brings across the fields he needs, because he sees every other field as being waste. Now, there is a cost of maintaining those extra fields when we’re not using them.
The cost of storage is low, but it is a cost. It’s more the cost of change: when fields change in the source system, we now care, because if I’ve brought them across we have them, even if we haven’t used them. But I still struggle with that one, because I look at the cost of acquisition and the time to market and being able to reuse those fields quickly when I need them, versus Nigel’s lens, which is around optimization and reduction of waste.
So therefore anything we hold that isn’t being used is waste. So again, it’s just balances and you just got to decide which pattern as a team you’re going to subscribe to, because ideally you only want to use one consistently. You don’t want to bounce depending on who it is.
Chris: I think it’s all about balance. Between taking 10 attributes and taking the whole table, it’s balance and context that matter, and having the discussion, because both can work and both are right. And so a lot of teams get caught between fear and heroism:
I don’t want to do anything because I’m afraid to break things, or I’m going to do everything because I want praise from my customers. And the reality is you need a little bit of heroism sometimes, you need a little bit of fear sometimes, and most of the time you need a proper balance between the two. And so it’s the balance between only pulling what’s needed and pulling everything. The point is you’ve got to talk about it. It’s not about one or the other. It’s just about saying, well, I’m going to pull the whole table because I can just do a select star.
There’s a whole bunch of cases in there. And so I just think it’s not a holy war. It’s about balance between agility and fragility, finding the way to make things work. And I think that we should as teams continually do that, because it’s so often the case as engineers that I want to build the thing that I think they need.
I want to build the more general thing. I think that they’re going to need this in the future. And then, will they need it? Is it waste? That’s such a key tension. And in a lot of cases, I think we need more people like your partner, just saying, let’s do less, let’s focus on minimizing waste. Because I think the data world is full of people who are like, let’s pull it all up, and let’s have the big Hadoop cluster or big Spark cluster with everything, and then I’m going to vomit up, you know, lots of data to lots of people and magic’s going to happen. It’s the Field of Dreams: build it, they’ll come. And that’s 90%. And I think we need more people pushing back, saying, let’s just do a little. And the reality is, in the end, you’re going to have constant tensions between those. And if you have a good co-founder and a good coworker, you’re going to be able to laugh about it over beers and say, I got that right, I didn’t get it right here, and we’ll work towards it. And that’s fun, right? I mean, it’s the human process.
I think we just have to stop believing in our own prowess and ability to overgeneralize the world, and really focus more on waste reduction in general as an industry, because a lot of times what comes with the overgeneralization is overpurchasing, overbuying, overengineering. There’s a lot of waste in software spend. There’s a lot of waste in systems. I’ve met CDOs that are in the process of building the data architecture for their 30 billion company, and they’re trying to gather requirements off of it.
And I know when that happens: run. It never works. It never has worked. It never will work, right? But there are still people out there trying to define the data architecture for their entire organization. And so that’s obviously overgeneral. Can you start off with one team and one case and iterate your way to get some value?
I think that’s less waste. And then should you bring one table or a whole table or a part? I think tensions are good in organizations, and so is discussing them, compromising, and working it through. And yes, it would be better if you had a standard, but okay, maybe one person brings over three attributes.
The other person brings over 30. You run for a while, you realize you should focus more on three attributes because you end up getting more done, and everyone learns. I love the tensions. I love the discussion, because that’s the essence of what reflecting on your work is about, and of owning your own process as a team. And as a leader, I don’t like to make those decisions. I actually like the team to make them. You know, you’ve got a person focused on waste reduction, there’s a tension, fantastic. Let’s try it one way, try it the other, and we’ll work our way through, and that makes a good team. Because then they own it; they own that decision, and not the manager.
Shane: I agree. To me it’s about patterns that have value, given the context and if we can describe what they are, then most people will use them because it makes your job easy, but they’re not rules, they’re not immutable. If our core pattern for the team is bring as few attributes over as possible, like you said, if somebody brings over three, somebody brings over five, it’s within that.
And if there’s a reason why we have to bring it all over the first time, because the context says that’s the best way of doing it, then that’s okay; it’s just a different pattern applied to a different context. The other one that you raised is quite interesting. We’ve lost the art of data modeling to a degree.
And there are a bunch of people I know who are expert data modelers. So they can sit in a session with a bunch of stakeholders, they can somehow conceptually model that organization in their head, they can go away and they can very quickly draw a conceptual data model for the organization that is highly accurate.
But they are experts. They’re like master chefs who can just grab a bunch of ingredients and make the most amazing meal. And I can’t do that. So we’ve got to realize the rest of the team that you get normally can’t do that. So we need processes or patterns that help them do it in smaller ways, learn more, get more feedback.
And then the last one is going back to the why. OK, so I’m grabbing all those fields and bringing them across, but that’s waste. Why is it waste? Well, if the source system changes something, we now have work to do on a field we’re not using. OK, so one pattern is don’t bring it across, but another one is, well, could we automate it?
Is there a way that we could actually adapt to that change in the source system for that field without a human being involved?
Can I remove that waste of human effort? And if we can do that, then the cost of me bringing those fields over is less because we’ve solved most of the problems. Now, we won’t solve every use case;
there will always be exceptions where the cost of automating something is way higher than the value of automating it. But again, it’s up to the team to decide where the value is, where the most valuable use of their time is in building their factory, their processes and the data for the customer.
Chris: That’s what makes it fun, because I think a lot of teams are disempowered and unhappy, and if you get to make those decisions as a team and own them, it’s fun to learn and watch your team evolve. It sounds like your co-founder and mine are the same.
I tend to be bigger picture. I overgeneralize. My co-founder is like, let’s just do these three things. Let’s get something done. And we have the same conversation almost every other week. We had one last week, and it’s the same thing. He’s going to keep saying the same thing. I’m going to keep saying the same thing.
And sometimes one of us is right. I’m becoming a lot more like him. He’s becoming a little more like me, and you get better as a team at making those decisions, because there are cases where maybe you can conceptualize the business in six dimensions and you’re right.
And if you build out that perfect schema, you’re going to be good. And there are other cases where you’re going to get it so wrong that you’re going to cause a lot of waste. It’s always that tension between overgenerality and reducing waste that I think we have to grapple with, but you have to have it.
I’d like a lot more teams to start with less, because we believe in generalization too much as an organization, and we believe in our power of abstraction to relieve us of future work, and I really think that belief in the relief of future work is almost always a failure. So make sure you do what you do today well, and just do that. Then you’re going to be able to move on and work from there. And I think it’s a hard discipline for people because of egotism, and because of the way the market talks about tools. A lot of vendors will say, you put my tool in, you get all this stuff for free. Anytime somebody is selling you something for free, run, right? Nothing’s free. And it’s the same thing with your architectures or your decisions on pulling data over. I like the tensions in organizations. I like empowering teams. I like having customers do it. But it is hard when teams have an existing system. This all sounds great, but in the real world, a lot of people have legacy systems that they’re still in charge of and that they don’t really know. Somebody else built it and left, so they’re stuck with it. They have bosses who are in the blame game and love to blame them. They work in blame organizations. How would you advise people? I have a tough time with that. When you’re in a blame organization and you have a bunch of legacy stuff, how do you advise people to make progress with DataOps when they’re set up to fail right from the start?
Shane: which is horrible, right. But that’s the pattern that works in those organizations. When people tell me they’re not allowed to spend time fixing the processes that they’re creating,
I go, how accurate is the estimation process? And they go, we do an estimate, and then the manager doubles it, and then the next person triples it. Okay. So there’s no accountability now for what you say versus what you do. So take 20 percent of your time and focus on engineering something that is causing you pain as a team.
And just don’t tell anybody, because nobody will notice. Now, that’s a horrible way of working. It has no visibility, no feedback loops. But if that’s how the organization behaves, and that’s the context you have to work in, then that pattern will work. You just do that work. It’s embedded in something else that just happens to take longer, but nobody knows how long it’s actually going to take.
You’re going to get blamed for taking too long anyway. You know that, because that’s the corporate culture. So make sure that you make your life better, because then the work you do will be better over time. And that was one of the lessons I learned. When I would drop into coaching a team, I would typically pick up all the patterns from a previous team and say, these work, you should put them in, you should implement these.
This is what you should do, because it worked last time. And then I quickly learned that the context of the organization and the team meant that often that wasn’t true. And even, like you say, with co-founders: when Nigel and I worked on projects together, the way we worked together was markedly different from the way we work together now as part of our company, because the context is different.
And so that’s the key: you’re crafting it. And then the other thing I really encourage people to do is look for well-described patterns that work for you and apply them, because all the thinking somebody else has done and all their testing of those patterns gives you a quick start.
You reduce the amount of effort you have to spend to fix a problem. And so that’s one of the things that the Data Kitchen cookbook did: it gave examples of patterns that solve problems, because they’re common problems that have been solved before. And so when I look at your seven steps to implement DataOps, you’re pulling out what I would call DevOps principles or DevOps patterns and applying them to data teams.
And again, we’ve got to remember this was written in 2019, when we weren’t doing a lot of these things, and I still wonder if we’re not. One of the first ones was add logic tests; that was one of the core patterns you called out. Another was implement a version control system.
Now, the interesting thing about that was version control systems have been around for ages, but it wasn’t until we got a tool called dbt that the market seemed to adopt version control. And I’m really intrigued by that, because version control systems have been around for a while. Data transformation is just code.
But nobody was doing it. Then we get dbt and everybody starts doing it. But if I look at the third kitchen, the BI reporting restaurant, we still don’t version control. Very rarely do I see our dashboards and reports being version controlled. If we go back to data ingestion, I still see teams snapshotting and doing no version control of the data.
So again, we seem to pick these patterns up and apply them in part of our factory and then not in others. So is that how you see it still? Do you still say go and find good patterns, describe them, and use them if they have value?
Chris: Let me answer that in two parts, circling back to the question of how you handle a team that lives in blame and has a lot of legacy code. I think you were saying, let’s get 20 percent of the time where they’re refactoring and improving.
I agree with that. I like a slightly different variation. I like the quality circle, because I think the most visible things are errors: you’re late, the data is wrong, customer complaints. I’m a big believer in putting every error that you see, visible or not visible to the customer, in a spreadsheet, and then reflecting on it once every two weeks and finding something that you can do to fix it in an automated way. I personally have been a believer in that process because, first, you get actionable stuff to do; you’re coding things and fixing things. And second, it changes the attitude towards errors in your organization, because instead of becoming a personal thing where you blame someone, it becomes an item on a list that I can fix and improve.
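As a rough illustration of the error list Chris describes, the sketch below keeps each error as a row and, at the fortnightly review, surfaces the most frequent category as the candidate for an automated fix. The field names, category labels, and the "most frequent wins" rule are assumptions for the example; a real team might prioritise by customer impact instead.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ErrorEntry:
    seen_on: date
    category: str           # hypothetical labels, e.g. "late", "wrong data"
    customer_visible: bool
    note: str


def pick_fix_candidate(log: List[ErrorEntry]) -> Optional[str]:
    """Return the most frequent error category in the log, or None if it is empty."""
    if not log:
        return None
    counts = Counter(entry.category for entry in log)
    category, _count = counts.most_common(1)[0]
    return category


# Made-up entries standing in for two weeks of the team's spreadsheet.
log = [
    ErrorEntry(date(2024, 5, 1), "late", True, "daily load finished after 9am"),
    ErrorEntry(date(2024, 5, 3), "wrong data", True, "duplicate customer rows"),
    ErrorEntry(date(2024, 5, 8), "wrong data", False, "null keys caught in dev"),
    ErrorEntry(date(2024, 5, 10), "wrong data", True, "duplicate customer rows"),
]

print(pick_fix_candidate(log))
```

The point of the structure is the one made in the conversation: an error becomes a row that can be counted and fixed rather than something attached to a person.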
And so I’m a big believer in quality circles as the first thing. In my mind, dbt did more for DataOps than Data Kitchen ever did, because of coding, because of dbt tests, and because of analytics engineers. And yeah, you’re right, it’s not applied fully. Certainly not in the report domain. Data scientists and their models: sometimes yes, sometimes no. The ingestion teams, no. Governance teams, not at all. They don’t even think about governance as code. That’s alien to them. You know, we still have these words like active metadata: I’m going to do active metadata and magic’s going to happen.
The most important metadata of any organization is code. Well, that’s the definition of active metadata, right? It’s code. And so keeping that centralized and versioned is incredibly important. There has been some progress. I first started the company in 2014, and we did our first conference in 2015, at the enterprise EDW conference. We were doing the seven steps, and I asked how many people had heard of version control, and literally 5 percent raised their hands. And we were like, whoa. And now if I ask that same question, it’s 95%. And so that’s good, right?
People understand version control, and they’re starting to understand testing, due to a whole bunch of people talking about data observability, which has moved that ball forward. So I think those things incrementally are making the world better, but the future’s there and some teams are not.
And even then, with dbt, they’ll use version control, but they won’t be managing environments correctly. They won’t have any kind of data quality tests to tell them whether the system’s wrong in development or production. They don’t have accurate test data.
There’s a whole bunch of intermingled things that you have to do to have really, really high velocity and high quality, which I think teams are working on. And often the challenge is just their overwork and their commitment to the customer. There are a lot of heroes out there. And hero teams are the hardest.
We do some consulting as well, and we work with a hero organization. They’re delivering data analytics about media to people who are making media buy decisions. And they’re really working late. There are a couple of heroes, and they’re trying. And they built up a lot of technical debt. They didn’t have any tests. They did use some version control, but they had two systems. And I think you can make an impact on an organization slowly. They’re improving. It didn’t happen overnight, but they’re improving and they’re going to keep getting better. And so I think as all of us get a little older, we understand that organizational change is not the flip of a switch; you have to work gradually. Whether you’re hiding 20 percent of your work on refactoring or doing quality circles, you want to be able to show the effect of the changes you make. With that 20 percent, you want to be able to say: we refactored a whole bunch of code. We removed 15,000 lines of code. We’ve taken things for things. And now the business effect of that is we’re deploying two days faster than we were before. That’s if you can measure it, because I think the other part of the challenge for these teams who have a hard time is they don’t measure anything about their work. There are no DORA metrics for data teams. What do you measure? Do you measure cycle time? Do you measure change fail rate? Do you measure error rate? When you have three different kitchens, how do you measure each? As data and analytic teams, we’re so unanalytic about how we run our organizations. And so I’m a big believer in pulling ad hoc measurements forward. Just knowing it takes you three weeks to deploy. You don’t have to measure it precisely, but get a rough idea: how many errors you have, how many code commits you’ve done, broad-brush changes, even scores from your customers, just net promoter scores. Do you like what we’re doing? Yes or no. Are we helping you? Yes or no. Send that to all your customers.
Find out. You may not like the answer, but find out, and do that once a quarter. I think a lot of data teams would really be shocked at their net promoter score, because I don’t think they are being effective.
I think that would really rock a lot of their worlds to find out that 80 percent of their customers think they’re useless.
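Chris describes a simple yes/no survey; the conventional Net Promoter Score instead asks for a 0 to 10 rating and subtracts the percentage of detractors (0 to 6) from the percentage of promoters (9 to 10). A minimal sketch of that standard calculation, with made-up ratings standing in for a quarterly survey of a data team's customers:

```python
from typing import List


def net_promoter_score(ratings: List[int]) -> float:
    """Standard NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses from ten stakeholders.
quarterly_ratings = [10, 9, 8, 7, 6, 3, 2, 0, 5, 9]
print(net_promoter_score(quarterly_ratings))
```

A negative score means detractors outnumber promoters, which is the kind of result Chris suggests many data teams would find uncomfortable.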
Shane: If we look at the product world, Net Promoter Score gives us one insight into whether people think the product we built is valuable. Another is usage stats. Product teams will record who uses what and how often. The number of times I see data teams not instrument who’s using the data and how often!
This is one that always confuses me, because it’s an engineering task to record the logs and get visibility of who used what from a data point of view, to determine: hey, I built this information product and now you’re not using it. What’s wrong? What do you want us to fix? Or, hey, we got all this data and we brought it in, Shane brought it in, and nobody’s using it,
so we can now have a conversation around waste. That should just be default behavior for the teams. I’m with you. And I love that idea of the quality circle. I love the pattern of a team creating a list of things they know are wrong that need to be iterated on, just in a spreadsheet, and then saying we’re just going to fix one of them.
That’s where the retrospective from Scrum is meant to take us. It’s meant to say, let’s document the things we know aren’t working well, and then let’s pick one, two, or three and fix them next time, fix our process. Whereas a lot of times the team’s just focused on operating the process, not improving it.
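A small sketch of the usage instrumentation Shane is describing, assuming the team can already export access events as (day, user, information product) rows from whatever query or dashboard logs it has. The product names, the log shape, and the cutoff date are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Set, Tuple

AccessEvent = Tuple[date, str, str]  # (day, user, information product)


def usage_counts(events: Iterable[AccessEvent], since: date) -> Counter:
    """Count accesses per information product on or after `since`."""
    return Counter(product for day, _user, product in events if day >= since)


def unused_products(catalog: Set[str], events: Iterable[AccessEvent], since: date) -> List[str]:
    """Products the team maintains that nobody has touched since `since`."""
    used = usage_counts(events, since)
    return sorted(p for p in catalog if used[p] == 0)


# Made-up catalog and access log for illustration.
catalog = {"customer_churn", "weekly_sales", "supplier_scorecard"}
events = [
    (date(2024, 1, 15), "ana", "supplier_scorecard"),
    (date(2024, 6, 3), "ben", "weekly_sales"),
    (date(2024, 6, 4), "ana", "weekly_sales"),
    (date(2024, 6, 4), "cy", "customer_churn"),
]

print(usage_counts(events, since=date(2024, 6, 1)))
print(unused_products(catalog, events, since=date(2024, 6, 1)))
```

The list of untouched products is a prompt for the conversation about waste, not a verdict; a product can be rarely used and still valuable.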
And then the NPS one: in hindsight, the feedback you get as a data team that’s not adding value is that when there is a downturn, there is no longer a data team not adding value.
Chris: You’re
Shane: Potentially an NPS score a year or two ago might’ve helped some data teams.
Chris: Wrong grain of feedback, you know. Don’t have a year grain or a decade grain.
Shane: No, but you know, it’s the same as people just unsubscribing from your product because they’re not finding it valuable.
That’s the same as a data team no longer having a job. There are a bunch of patterns out there. I think the core is, when you talk about DataOps, you’re talking about more than just taking technical DevOps principles and applying them to data teams.
You are talking about team design, the team’s process, the way they work, the discovery of what needs to be fixed as a problem in the organization with data, all the way through to building it, delivering it, deploying it, enhancing it, and then checking that it actually is solving those problems, or working out what else you need to do.
So you’re effectively picking up patterns from agile, patterns from product, patterns from lean, patterns from data, patterns from software, and then you’re mashing those up into this DataOps way of working.
Chris: I’m not talking about what database you use, or whether you use Airflow or Prefect, or what reporting tool you use, or whether you use Data Vault or One Big Table. Those are good decisions to make, but none of that stuff actually matters. All of the debates that go on in data and analytics aren’t really that important when you realize that 70, maybe 90 percent of the team’s time is waste, because they’re building things or doing things that aren’t delivering value. When you take that perspective, that we are just wasting a lot of time, you really start to rethink what you should be doing as a leader to remove that waste. And so the biggest thing about DataOps is you’re just wasting time. Two thirds or more of your team’s time is wasted.
Your time’s wasted. So what are you doing in life? Why do you even want to be in this career? You go to career day at your kid’s school and say, hey, two thirds of my time, I’m not doing crap. I’m working really hard, but two thirds of it is just going down the toilet. Are you going to tell your wife when you get home? Yeah, two thirds of my time, it’s just crap. I’m not doing anything. It erodes your soul, and it’s not a good feeling, and you can kind of deny it and not look at it. But I think, if anything, DataOps is just: look at the amount of waste your team has and the amount of stuff that’s not working. And the way that you evaluate waste is, is your customer actually getting value from it? It’s really clear how to do that. Not that it’s done, or that it’s shipped, or that it’s live. It’s that it’s used, and that they need it. Those are really hard metrics to get in your head. But if you back up from that, then everything else gets clear. All we really care about is usage, liking, and value.
Shane: I wonder if that’s a better definition of DataOps: it’s the focus on reduction of waste in data teams or data processes. And then I come back to your point that there are no DORA metrics for data teams.
That’s probably one of the problems: people don’t know what good looks like. And then I start thinking, what would I do?
One of the ones I’ve already talked about is I would instrument who uses the information products we’ve delivered, because there’s a metric of waste: if the team spent three months, or even two weeks, building this thing, and nobody uses it, that was waste. I know a number of people that complain about the number of meetings they have because they see them as waste.
Well, if they’re waste, stop doing them.
Like you’re sitting there going, I do this thing that is a complete waste of my time. Well, then don’t do it because you’ve identified it’s a waste.
Maybe it’s a DORA metric: number of hours spent in meetings. And then you’ve got to be careful about collaboration versus a meeting,
and how you define that. Cycle time, from request to deployment, but the entire cycle time, not just from when you’ve got the requirements and you’re starting the build: from the date the stakeholder said, I have a problem I need to solve with data, to the date you deliver some data that may solve that problem.
What is it? Because what I see is a lot of requirements gathering that then goes into Jira ticket hell as a backlog and then gets prioritized in a couple of months’ time. The data team are really busy when they get that request onto their plate. But if you’re the stakeholder, you’re counting from the day you said, hey, I’ve got a problem, and somebody engaged to understand that problem.
That’s when the time started ticking. That’s the beginning of your cycle time, not when the data team started working on it. So I think that could be your next goal, really: to create a set of example DORA metrics for DataOps waste. Hard. Yeah.
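The distinction Shane draws between the team's clock and the stakeholder's clock can be sketched in a few lines. The record fields and the dates are assumptions for illustration; the only point is that the two measures start from different timestamps.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Request:
    raised: date      # stakeholder first said "I have a problem to solve with data"
    started: date     # data team pulled the ticket off the backlog
    delivered: date   # first data put in front of the stakeholder


def build_time_days(r: Request) -> int:
    """What the data team tends to count: start of work to delivery."""
    return (r.delivered - r.started).days


def cycle_time_days(r: Request) -> int:
    """What the stakeholder experiences: first ask to delivery."""
    return (r.delivered - r.raised).days


# Hypothetical request that sat in the backlog for eight weeks.
req = Request(
    raised=date(2023, 1, 9),
    started=date(2023, 3, 6),
    delivered=date(2023, 3, 20),
)

print(build_time_days(req), cycle_time_days(req))
```

In this made-up case the team would report a two-week turnaround while the stakeholder waited ten weeks, which is the gap the conversation is pointing at.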
Chris: about the four metrics that matter, and I think error rate and cycle time are two of them. I also think there’s this idea of utilization and value, or maybe the cross product, because you can have something that’s poorly utilized but incredibly high value, like the CEO’s report.
He looks at it once a week, but that’s really important. It doesn’t have much utilization, but it’s got value. Or you’ve got something that has very little value, but a lot of people look at it because it’s a tracking metric for them. So it’s the multiplication of the two. And then it really comes down to productivity: how much time is on task? Maybe you subtract out meetings, you subtract out waiting. How much are your creative people on keyboard, coding, creating value? We used to have a CDO council, and people would look at it and they’d send out spreadsheets and they’d go, God, my teams have three hours a week, literally three hours a week, when they’re on keyboard creating value.
The rest of it is coordinating, collaborating, meetings, and fixing stuff that should have been working in the first place, answering customer questions, all this stuff. We are creative people who want to build. And so the time that you’re actually building important new things is a really important metric, and that’s the big driver. We don’t measure what productivity means for our team because I don’t think we want to know. All I know is, if your customers trust your data, that means you have very low errors. If you can respond to their requests quickly and get something in their hands that’s 70 percent right, that means your cycle time’s fast. If your team is not spending a lot of time in meetings, and they’ve got time to experiment and refactor, their hands are on the keyboard a lot.
That’s productivity. And then the last thing is really usage. You’ve got usage, and the combination of usage and value. And maybe we have a study, but it’s very rare that I see a team that tracks any level of those metrics. And I think there’s a whole industry around DORA metrics in software.
There are DORA software companies. From my standpoint, I’d love to build the DORA metrics for DataOps, because I think it’s really important. I’ve thought that maybe teams would pay to do it. I just don’t know. When I’ve looked at it, like when we did a consulting engagement with a big pharma company, all I wanted to do was find error rates and cycle time. No one had it. No one knew it. No one would fess up to it. It’s like, we don’t know. How do you not know this? This is one of the top pharma companies in the entire world. And they just didn’t know, or maybe someone knew and they didn’t know how to find out, or it was sort of hidden in an operations team somewhere in some other country.
So the barriers to finding this information are high. I think as a leader, it’s about running your team on metrics and identifying what those metrics are, and you do get what you measure.
The DORA metrics aren’t perfect, but you can change them and focus. That’s what you’re trying to get your business customers to do: drive their business on metrics. Yet you don’t have them yourself. I think that’s a bit of hypocrisy from leaders, and I would challenge them to start doing it, because you can do it really simply.
You can do it with just, “Email me how much time you spent coding last month and how much time you spent in Git.”
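The four measures Chris describes (error rate, cycle time, time on task, and usage multiplied by value) are simple enough to compute by hand. A minimal sketch, assuming hypothetical field names, a 1-to-10 value rating, and made-up sample numbers; none of these come from the conversation or from any DataKitchen tool:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Request:
    asked_on: date      # day the stakeholder first raised the problem
    delivered_on: date  # day usable data reached the stakeholder


def cycle_time_days(requests):
    """Mean days from first ask to delivery, not from when the build started."""
    return sum((r.delivered_on - r.asked_on).days for r in requests) / len(requests)


def error_rate(errors, deliveries):
    """Share of deliveries that had an error reported against them."""
    return errors / deliveries


def on_task_share(hours_building, hours_total):
    """Share of the working week spent building rather than meeting or waiting."""
    return hours_building / hours_total


def usage_value_score(views_per_month, value_rating):
    """Usage multiplied by value (value_rating is an assumed 1-10 scale)."""
    return views_per_month * value_rating


# Made-up sample data for illustration only.
requests = [
    Request(date(2024, 1, 1), date(2024, 3, 1)),
    Request(date(2024, 2, 1), date(2024, 2, 11)),
]
mean_cycle = cycle_time_days(requests)       # 35.0 days
monthly_error_rate = error_rate(3, 60)       # 0.05
keyboard_share = on_task_share(3, 40)        # three hours in a 40-hour week
ceo_report = usage_value_score(4, 10)        # rarely viewed, high value
tracking_board = usage_value_score(40, 1)    # widely viewed, low value
```

With these sample numbers the CEO report and the tracking dashboard get the same score, which is the point of multiplying usage by value rather than looking at either one alone.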
Shane: It’s about balance. We don’t want to go down to that horrible one where teams are encouraged to do nothing but code, because the collaboration and the planning are important. But it’s about balance. And yeah, I think you’re right. This idea that data teams aren’t data driven, it’s like that amazing chef who’s in a kitchen every night and then goes to McDonald’s every day to eat themselves. Wow, that’s kind of an oxymoron.
Chris: What’s wrong with you? I mean, you don’t even cook at all? Can’t you, you know, make some quick meals?
Shane: We should probably start doing the things we tell everybody else to do ourselves.
Chris: I go back and forth myself. I get negative and positive, right? Because I do see progress. I think the adoption of code and the adoption of observability and data quality are driving functions. I think the adoption of cloud is helpful, because a lot of those systems are built with CI/CD and Git built in.
With most of those tools that people use to orchestrate, a lot more people are doing things in code rather than in these sort of black box systems, where with your ETL tool it’s in their code, it’s in their format, you don’t understand it, and it’s sometimes binary or sometimes in tables or in XML. It becomes more public and readable.
And I think there are three or four companies now that are involved in sort of code-driven BI, or BI as code. You go to their website and it says BI as code, or headless, code-driven BI. GoodData’s on that. So I think there’s good. And then there are companies like Kabool that are basically doing environment management.
They’re handling all the DevOps: I want to script my environment up with my tools, I want to tear it down. And you can buy that now. You’ve got your five things in your data platform, and somebody has solved that problem for you. So if you look at the critical DataOps capabilities: there’s environment management with test data, there’s test automation.
Now people will write your data quality tests for you. My company does that. There are observability tools. We have two open source tools to do that. There’s version control that’s built into a lot of your tools. I think there are a lot of bits and pieces that people are putting together. The biggest thing I have not seen is an overall value stream metrics tool, a DORA metrics tool, for data teams. I don’t know. Part of me thinks there’s an entrepreneurship opportunity, but part of me also just says, could you just use Excel? Just track errors. Just ask some simple questions, do a net promoter score in an email.
You can do this really cheaply yourself as a leader, once a month or once a quarter. I think it would be really interesting for you to reflect on just those four things: how much time people spend on task, how many errors you have, are people really getting value? And then you can go to the next level: which report is actually being used or not, and maybe we should decommission it. Which table, which data set isn’t providing value? Where are all the errors in this? Maybe the locus is that provider who’s giving you bad tomatoes, and they’re the one who’s causing the meals to be bad, and you should replace them. Because sometimes teams don’t even know who the bad suppliers are, or who’s late. It is not reported on; it’s not visible. So another way to think of DataOps is making the invisible part visible: making a lot of those work processes owned, visible, iterable, and changeable.
Shane: The “go to a different supplier for tomatoes”, I have seen that. I’ve seen organizations that struggled to get good data out of their source system. The data team had been highlighting that, but so had the rest of the organization. The operations team, the frontline staff go, that system’s shit.
It doesn’t give me what I need. I can’t use it. And so they replace that source system with something better, because they’re actually solving a bunch of problems. And then I come back to the downturn. If I look at it, the problem we had was that instead of the leaders of an organization going to the data team and saying, “You’re valuable, how can you optimize your factory to reduce the cost of what you produce, or increase what you produce without increasing the cost?”, all they did was shut the factory down.
And that’s telling: there is no value in you as a factory. We’re just going to turn it off; we’re not even going to have an alternative. Well, the alternative is Excel. So I think we haven’t got there yet, but I’m with you.
We’re starting to see a lot of change, and that’s what we saw in the software engineering world. We saw all the same problems we have: low-quality code, no automation, products that consistently broke, high levels of waste, too much time spent up front getting requirements and then not delivering anything for three years.
I remember the software projects that were big and ugly like that, with all those problems, and our software brethren seem to have been ahead of the curve compared to us in bringing patterns to fix that. So I think we’ve started that journey. The question is, how fast is it going to go?
Chris: Yeah, I don’t know why. I really had thought it would be faster. I thought there’s something about software engineers: they’re arrogant and a little bit anal, and they don’t like waste. It bugs them. It really bugs a lot of people to do things twice, and to do things three or four times, because they know really deep down in their soul that it’s going to cause them problems. And data teams aren’t like that. They’re like, oh, I’m just going to copy this code and tweak it a bit. And I’m going to copy this code and tweak it a bit. They don’t feel the same way.
Maybe it’s because they’ve been beaten down more, and they don’t feel that they can be arrogant. I’m not sure what it is, but I would like data teams to be a little bit more empowered and prissy, saying, look, we have to fix this thing. We have to own our process. I refuse to work this way, and I’m going to quit.
I think that would be a good thing for some organizations. If the people who actually know where all the bodies lie, and who are vital to getting something done, themselves threaten to quit, that would be a good wake-up call for some organizations. Not that they quit because they’re overworked, but that they quit because they’re like, we’re not improving our process.
Shane: Yeah. And they go work for somebody that wants that value. I’ve thought about this a lot, and the thing I come back to, and it’s a small thing, but I think it is the thing, is that when you’re a software engineer, you’re in charge of the entire product. Especially with the DevOps community now, you’re in charge of the infrastructure, but more importantly, you’re in charge of how it’s used, what it captures, and what it delivers. The end-to-end cycle is you or your team. In data, we get given the data, and somehow we use that as an excuse that we’re no longer accountable for the final product.
And I come back to that as the core anti-pattern. That seems to be the problem. A lot of the successful data teams I’ve seen are successful because they pushed back and had the original data problems fixed, which made their life easier, but that doesn’t happen very often. So maybe things like data contracts will be the change for this problem that we saw dbt be for version control.
Chris: Shane, I really appreciate that, because I do think data quality is a version of the tragedy of the commons, and the source data system is often crap. So we built this engine that scans data and does a whole bunch of data quality checks automatically. We were originally thinking of it for data engineers: I want to put it in my process and auto-build tests, because it saves the data engineer time. Then we did some market research. We talked to about 35 data quality people this summer, and what we realized is there are people in the organization who really want to improve the data. However, for the people who own the source systems, the data is often good enough, and they don’t have a reason to improve it. Yet the organization wants the data that team owns to be improved. So the challenge is, how do you get data quality people to have influence when they have no power?
So what we’ve done is spend millions of dollars on our open source project. It has a nice UI. It’s complete. It scans your data. It builds a whole variety of data quality scores, not just data categories. But what we realized is that, like any influence position when you have no power, you’ve got to be very selective about what you try to influence. Don’t say all your 10 tables are crap. Don’t be wasteful like that. Pick the three columns that are critical data elements for your bank. Pick the 10 columns that an important model for your data scientists is using. Pick the five columns that really drive all the reports for your quarterly goals. And just work on those. Have a tool where you can create a scorecard from them that can monitor them and build a whole bunch of data quality rules automatically, and then give a package to the people who fix them so they can understand what the problem is. If you can do that, we’re hoping that’ll be the linchpin.
And I really like your idea, because if you can actually fix data quality, then people don’t have anyone to blame anymore. They can’t say it’s that person’s fault. Then it really is your fault, and you can feel that way, and you can start working on your process. So I like that idea. It’s my hope as well that those teams themselves can become agile. A lot of data quality projects are not agile. They’re very waterfall, requirements driven. If I just need to look at my data, I’m going to write that. And I don’t think you need to do that. I think you can scan your data.
Our tool, within an hour, gives you over 50 different data quality checks, automatically run, automatically sized. You can start there, turn some things off, add some things that are domain specific, and work from that, and give that list to people to fix, to push back. Be smart about it. Don’t do a DAMA-style data quality categorization of 10,000 tables and say, you’ve got to go fix your fax number, it isn’t right. Who cares about the fax number? Nobody uses the fax number. Pick the attributes that provide business leverage. Do that, show that they improve, pick some more, show that they improve. And lo and behold, within a year, you’ll have functionally much better data quality for your organization. Then you’re right, maybe data teams will start to say, look, I can’t blame source data quality anymore. It’s really my fault. And maybe that’ll be the linchpin to move forward.
That’s what we’re hoping: that data quality is the linchpin to start agility.
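The “pick the critical columns and score only those” approach can be illustrated with a small scorecard. This is a minimal sketch, not DataKitchen’s tool: the column names, the sample rows, the completeness-only check, and the 0.95 threshold are all assumptions made for illustration.

```python
def completeness_scorecard(rows, critical_columns):
    """Share of rows with a non-empty value, for the chosen columns only."""
    total = len(rows)
    scores = {}
    for col in critical_columns:
        filled = sum(1 for row in rows if row.get(col) not in (None, ""))
        scores[col] = filled / total if total else 0.0
    return scores


def needs_attention(scores, threshold=0.95):
    """Columns whose score falls below the (assumed) acceptable threshold."""
    return sorted(col for col, score in scores.items() if score < threshold)


# Hypothetical source rows: fax_number is empty everywhere, but nobody uses it,
# so it is deliberately left out of the critical list.
rows = [
    {"customer_id": "C1", "balance": 120.0, "fax_number": None},
    {"customer_id": "C2", "balance": None, "fax_number": None},
    {"customer_id": "C3", "balance": 0.0, "fax_number": ""},
    {"customer_id": "C4", "balance": 75.5, "fax_number": None},
]
critical = ["customer_id", "balance"]

scores = completeness_scorecard(rows, critical)
to_fix = needs_attention(scores)
```

Here only `balance` ends up on the list handed to the source system owners; the empty fax numbers never appear because that column was not chosen as critical.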
Shane: If I think back, I’m old enough to remember Lean, Six Sigma, Total Quality Management, those kinds of patterns that helped change organizations because they made quality one of their core focuses. And maybe that’s what the change needs to be: that quality of data is actually a focus, not just a bunch of governance stewards sitting there complaining about how bad it is and doing stocktakes.
Chris: Maybe it actually has to become one of the key drivers of the organization, because the data is good enough for the team that owns it, but it’s not good enough for the teams that use it. If you can make it clear to the people who own it what they need to do, they will fix it. And if their boss’s boss says, you’ve got to do this, they’ll fix it. It’s making what’s invisible, the use of my data by other people, the tragedy of the commons, visible through a very specific dashboard and a very specific set of actionable recommendations. That visibility, I think, will enable people to stop being, like you said, data quality people who are fun to talk to, interesting people, but kind of bitter because no one listens to them. We’re trying to give them a nice specific bat so they can knock some people on the heads about the attributes that matter. And we did talk to organizations who were very successful, and they were successful on specifics, like one bank that focused on the critical data elements used for reporting to the government. They didn’t fix all their data. They just fixed some of their data, because it had to be done: they could get busted as a bank if they didn’t do it. Not all banks do that, but that’s a really good reason. And lo and behold, those elements can then be used for lots of other things, because they’re of high quality now. And they said they’re starting to do more elements that aren’t related to compliance reports, because they’re related to revenue-generating activities.
Shane: I’m just going to close this one out now. So you do a lot of writing, create a lot of good content, and you have open sourced some of the cool products that you’ve been working on for a long time. So if people want to find you, read what you’re writing, find these tools, where do they go?
Chris: They go to datakitchen.io, and you can get a link to our open source. You should be able to try it out. We’ve got demos. You can try it right away, within an hour. We’ve got a great blog. We have two books, the DataOps Cookbook and Recipes for DataOps Success. We have two training programs: we’ve had about 4,000 people do our online DataOps training.
We’ve had about 1,000 people do our data quality and data observability certification that we developed in the last year. I’m trying to write once every week or two about these problems. I’m 61 now. I’m not going to be working forever. But I’m still hopeful. There are days I feel like taking my toys and going home because of some of the stuff people are doing. It’s like, I’m not going to play with you anymore. Today isn’t that day. I do feel like keeping my toys on the table, and I appreciate talking to you, Shane. You make me feel good about having my toys on the table and not going home and pouting.
Shane: I think there’s a bunch of us that have been frustrated for many decades, and we’ve tried many things to solve some of the problems, and some of them have been solved, and a lot of them haven’t, so it’s not easy, but keep fighting the good fight.
Chris: It’s better to be in the arena trying than on the sidelines with your toys to yourself.
Shane: I agree. Excellent. Hey, look, thank you for the time. That has been excellent, and I hope everybody has a simply magical day.