How To Bring Agile Practices To Your Data Projects

02 Dec 2022 | AgileData Way of Working, Blog

TL;DR

Late in 2022 I was lucky enough to talk to Tobias Macey on the Data Engineering podcast about combining agile patterns and practices with those from the data domain. Listen to the episode or read the transcript.

 

Shane Gibson - AgileData.io

Summary

Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can prove challenging due to the number of systems that need to be included to implement a complete feature. In this episode Shane Gibson shares practical advice and insights from his years of experience as a consultant and engineer working in data about how to adopt agile principles in your data work so that you can move faster and provide more value to the business, while building systems that are maintainable and adaptable.

 

PODCAST INTRO: Hello and welcome to the “Data Engineering” podcast, the show about modern data management.

Ad (Atlan, Linode)

 

Tobias Macey: Your host is Tobias Macey. And today I’m interviewing Shane Gibson about how to bring agile practices to your data management workflows. So Shane, can you start by introducing yourself?

Shane Gibson: Hi, I’m Shane Gibson, currently the co-founder and chief product officer of a startup called agiledata.io. My background has been, for a while, coaching data and analytics teams on how they bring in agile patterns and make them useful. I’d just like to say I’m a longtime listener, first time caller. So love the podcast, and great to be on.

Tobias Macey: Happy to have you here. And do you remember how you first got started working in data?

Shane Gibson: So it’s kind of embarrassing, I’ve been in the data world for almost 32 years, I think, as I thought about it. I started my career out working in a finance department doing accounts payable, so doing some of the invoice processing. And I didn’t enjoy it, I found the work quite mundane. And I was lucky enough to fall into a role around the thing we called systems accounting. At the time we had a timesheet system, and I picked up the technology side of that. And just to date myself, I think the first server we bought was a Compaq 386 SX/25. Those cost us about 50,000 New Zealand dollars back then, and I think it had 512k of RAM. So I really started enjoying working in the technology space, and I was lucky enough to move to another organization, where I was put in charge of replacing the financial system, moving off the old mainframe across to client server. And as part of that, we were experimenting with what were called executive information systems back then, EIS, playing around with toggles, forests and trees to take that financial information and make it available to our CFO. After we implemented that platform, I jumped across to the BI vendor world, working for some of the large US vendors, but based out of New Zealand. My first gig was doing enterprise resource planning in that financial software market. And again, it wasn’t my passion, but I was lucky enough at the time that the vendor had a really big data stack, so I jumped swiftly across into that team, and spent 10 to 15 years working for those large vendors, based out of New Zealand, in that data and analytics space. After that, I founded a consulting company, Optimal BI, and grew that into a usual type of data and analytics consulting company; I think when I was leading it, it got to around about 20 consultants. So we’d go into a customer, do their strategy work, figure out where they wanted to go, and then bring the rest of the team in to help deliver it. As part of that, I got frustrated with the standard consulting methodologies. I found that they didn’t work. And so I stumbled across this thing called “Agile”, which at the time for me was, I thought, a bit of an airy-fairy, weird religion, the whole kumbaya thing. We experimented with it internally with the team, and I was lucky enough to experiment with customers and their teams to see how it worked, and I spent the last eight years effectively being an agile data coach, working exclusively with data and analytics teams on how they can adopt agile practices. And then to finish it off, about three and a half years ago I co-founded agiledata.io with a good colleague of mine, Nigel, and I’ve been leading the product side of that company for the last three and a half years.

Tobias Macey: And so in terms of the overall practice of bringing some of these Agile principles into the Data Domain, I’m wondering if we can start by talking about some of the industry verticals and existing patterns that you’re coming into as you start to work with some of these businesses and say, this is what’s challenging for you now. These are some other ways that agile can solve that problem for you.

Shane Gibson: If I look at it, everybody has a data problem, whether you’re a small organization or a big organization, whether you’re in the financial services vertical, the insurance vertical, government, health; everybody has a bunch of data, and they want to leverage that data to make decisions. So from my point of view, I don’t tend to work in specific verticals when I’m coaching teams. But what I do see is basically two times that an organization will engage me as a coach. One, which is ideal, is when they’re starting their journey. They’ve got a way of working at the moment that’s not working for them, and that’s the key: what we’re doing now is not working for us, we want to make a change. And they’ve seen this agile way of working, or talked to people who have done it before, or read about it, and they want to try it to see if it makes the world a better place in terms of their organization and their teams. So we’ll start from scratch, the team will start their journey from that point, and I will coach and help them move forward. The other way it tends to happen is the team have been experimenting with agile for a while, normally 6-12 months, and they’ve either had no success against the goals that they wanted to achieve, or they started off and it was working for them, and then they kind of hit a wall, or got stuck iterating the process. And that’s the other time that I tend to get brought in to help teams change the way they work.

Tobias Macey: As far as the kind of paradigms that have been dominant in the data ecosystem, there have particularly been, in recent years, a number of attempts to bring software methodologies into the data domain with varying levels of success, with some of the notable standouts being things like DataOps, and version control both for the code and for data versioning, auditability, testing, kind of observability. And I’m wondering what you see as some of the aspects of agile that do map well onto some of the existing data paradigms, and some of the ways that it either falls short or introduces maybe brittleness, or kind of extra work that is just not really providing value?

Shane Gibson: So the way I think about it now is, when we talk about agile, it’s a form of mindset. And one of the downsides is a lot of people think agile equals Scrum, and that’s not true. There’s a bunch of patterns out of Lean, out of flow, out of XP, out of some of the other agile ways of working that have value to a data team. I think the second thing is people often focus on one perspective of their job. So they’ll focus on something like technical patterns, or, as you’ve talked about, CI/CD and version control, those kinds of things. The way I think about it now is there are four lenses we can use, and I talk about patterns. A pattern is something that has value in a certain situation: it’s something that has been used before, and if the context of the way it was used fits our context, and we apply it, we will potentially get some value out of it. And I break those patterns down into four groups. The first is team topology, organization patterns. This is the way the teams are structured. Do we have one single team? Do we have multiple teams? How do they interact? How do they fit into the organization? That whole pattern space around team topologies is important. The second thing I talk about is process or practice patterns. What are the things the team are going to do to get data from the beginning to the end? I’m a great fan of a concept called “How to Make Toast”; if you Google that and look at the YouTube video, it’s a great process to work with a team to ask, actually, what is the work they do every day, and how does it flow? The third one I think about is technical patterns. Those ideas of version control, of checking in code, of managing tests before you deploy, the idea of data modeling. And in the data world we have some unique things there, which we can talk about in a second. And then the last one I think about is way-of-working patterns: how do we take all those other things, put them together and create our own way of working? And with teams, I really encourage them not to adopt a methodology. There’s a big push in the world to adopt Scaled Agile, SAFe, and I am very negative in my view of SAFe. Agile is not a methodology. It’s a way of working which says: we’re going to iterate, we’re going to get value to the customer early, we’re going to get feedback, and then when something’s not working, we’re going to change it. And so we don’t want to pick up a methodology; we want to craft our own way of working, but by leveraging patterns that have value, so we don’t have to do it from scratch.

Tobias Macey: One of the interesting aspects of kind of the agile principles in the data world is that one of the predominant aspects of agile is that you want to focus on a fully connected end to end flow with a very narrow scope, where you say, I want to, for instance, add a new input form for somebody to be able to give me their email. So that means I have to have the UI, I have to implement that, I have to create the database model so that it can store the field, I have to make sure that the controller, middleware, and the web application are able to receive that input and write it to the database, and I need to have tests around all of that. In a web application workflow, that’s fairly straightforward; it’s well understood how to actually do that end to end flow with that narrow scope. Whereas in the data domain, it’s not always clear how to manage the appropriate chunking, because before you deliver all the information to the end user, you have to think about things like governance, data modeling, the lifecycle of the data, cleanliness. And so I’m curious how you think about how to approach that question of what does that “narrow” slice look like. And how do we reduce the scope from end to end without starting at the beginning and saying, now I actually have to do the entire horizontal layer of staging the raw data across the board before I can even go to whatever the next stages are. So you’re able to do that kind of deep integration instead of wide integration.

Shane Gibson: I think there’s two questions in there to unpack. There’s the question of why do the software engineering practices that are well established and really successful seem to struggle when we apply them to the data domain. And then the second question is, how do we thin slice? How do we take this big behemoth that we typically used to spend three years building out, and bring it down into weeks, and do that in a successful way? So if I go back to that first one, I’ve really struggled to understand why software engineering practices, and agile from a software engineering point of view, are difficult in the data domain, because in theory it’s very similar. The best I’ve got at the moment is, as you said, when you’re building a web application, you’re in control of the data: you control how that data is created, how it’s entered, how it’s landed, how it’s stored. In the data domain, we effectively get given that data as exhaust. So we have absolutely no control, and we get a massive amount of uncertainty. And that brings a lot of problems to the data world that we don’t have in the software engineering world. I think the second part is the tools that we have in the data world for adopting an agile way of working; we’re in the Stone Age. The tools we have are not fit for purpose. They’re based around big chunks of work happening. So we’ve got to find ways of fixing those two problems, and we’ve started to see that. We’re starting to see our tooling get more agile in terms of the way we work, and we’ve found techniques to break down and solve that problem around uncertainty. We’ll never solve the uncertainty unless we go full data mesh, where the software engineers are actually producing fit-for-purpose data. And that’s a dream we’ve had for 30 years. I don’t think we’re going to achieve it, but we might get close to it. And why don’t I think we’re going to achieve it? Because if I’m the product owner for an organization, and I need a new field on my form to go out and engage with my customer, I have to make a trade-off decision between that field turning up this week, or doing the data work so that field turns up next week. Unless I really am a data-driven organization, I’m going to make the trade-off decision to push that field out and get that customer value, then come back and do the data work, and potentially reprioritize some other work. So I think it’s an organization priority problem more than anything else. So: we get given this data we can’t control, it has a massive amount of uncertainty, and our tools aren’t fit for purpose; what techniques do we have to fix that? What we want to look at is thin slicing. Teams will either take one of two approaches: they’ll thin slice, where they try and break the work down into a small enough chunk that they can do end to end in three weeks, or they’ll pipeline it. They’ll break up their work to match the technology stack, and they’ll pipeline the work. So you’ll see them grab the data, collect it, land it in staging, lake, whatever you want to call it; then they’ll pick it up and move it into some form of other data repository and ideally model it; then they’ll go and create some metrics; and then they’ll create some visualization or last mile delivery. And each one of those will be separate work, with handoffs and milestones in between. I encourage teams to thin slice.

Our goal is to get a group of people to be able to go end to end with that data and add the value for the customer, the consumer, at the end of what they do. And ideally, we try and get the teams to do that within a three week iteration, which is hard. It’s hard to go end to end in three weeks as a small team.
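(For readers who want to see the difference concretely, here is a minimal sketch of a thin slice in Python. SQLite stands in for the warehouse, and the tables, columns and business question are all invented; the point is that one small unit of work goes collect, model, deliver for a single question, rather than building each horizontal layer in full first.)

```python
# A minimal sketch of a thin slice: one small unit of work that goes
# collect -> model -> deliver for a single business question, instead of
# building each horizontal layer in full first. SQLite stands in for the
# warehouse; tables, columns and the question are invented.
import sqlite3

def run_thin_slice(conn: sqlite3.Connection) -> int:
    """Answer one business question end to end: how many ordering customers?"""
    # Collect: land only the source rows this slice needs, not every table.
    conn.execute("CREATE TABLE stg_orders (order_id, customer_id, status)")
    conn.executemany(
        "INSERT INTO stg_orders VALUES (?, ?, ?)",
        [(1, "c1", "complete"), (2, "c2", "complete"), (3, "c1", "cancelled")],
    )
    # Model lightly: only the one concept ("customer orders product") in scope.
    conn.execute(
        "CREATE TABLE fct_orders AS "
        "SELECT order_id, customer_id FROM stg_orders WHERE status = 'complete'"
    )
    # Deliver: the single metric the stakeholder asked for.
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT customer_id) FROM fct_orders"
    ).fetchone()
    return count

print(run_thin_slice(sqlite3.connect(":memory:")))  # -> 2
```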

Tobias Macey: Another aspect of that kind of end to end is understanding what is the other end, where in the web application it’s very clear that the other end is some UI functionality or some way that the user is able to interact with the product. Whereas with data, there isn’t necessarily a kind of cohesive end step for any given piece of effort. In some cases it might be, I need to be able to add a new filter to this visualization in the BI dashboard, or I need to be able to populate this data source for an API so that I can then consume that data product in some other web application or data pipeline. So what do you see as some of the most common or most achievable kind of terminal nodes in that end to end graph, particularly for teams that are first starting on this effort of transforming into that agile workflow?

Shane Gibson: So this comes back to one of those process patterns, and it’s around understanding requirements, understanding the value we need to deliver. I’ve been working on a thing called the information product, and the information product canvas, with customers for the last 8-10 years, and have open sourced the canvas, so we can have a link in the show notes if anybody wants to download it. It’s a way of defining a boundary for the work to be done. So we go away and we talk to the stakeholders, and we want to understand the actions and the outcomes that will be taken with this piece of effort. If you get this information at the end, what are you going to do with it? What action are you going to take? And what’s the outcome of that? So: we’re going to get a flag for customers that are about to churn. What are you going to do with it? We’re going to have an outbound call team go and talk to them and give them an offer. And if that’s successful, what happens? We reduce our churn, which has some financial benefit. So what we’re doing there is getting both the action and an understanding of the value of that piece of work. And the reason we want the action is that we often get told the solution, not the problem we want to solve. As data specialists, we should be trying to understand the problem and the ways we can solve it. So by asking for the action, we can sometimes come up with a different solution. Say, well, actually, we don’t need a churn flag, we’ve already got this thing over here; if we just leverage that, we can get you something much quicker that will solve your problem and give you that action. The other thing we do is we often ask for the business questions. The reason we do that is we find that when we ask for the action and the outcome, some people struggle, they haven’t thought about it that way. But if we ask for the business questions they want to answer, how many, how much, how long, it comes straight out, they always have three to five off the top of their head. And from there, we can help them find the action and talk to them about it. As part of the canvas, we want to understand the data that we need. So we use a methodology from Lawrence Corr, BEAM, “who does what”, and we ask what core business processes are involved. So: customer orders product, customer returns product to store. And that enables us as a team to go, actually, customer orders product, we’ve already moved that data into the platform, we can get something out pretty quick; customer returns product to store, actually, that’s a whole new system, we haven’t collected that data. And that allows us to size the data we’ve got. Now we’ve got a data collection task, and we know those are hard. So the canvas gives us a bunch of boxes to gather that information, understand how big it is, and then go back to the stakeholders and have a trade-off conversation: well, we could do everything that you want, and we’re estimating three months; or we could do this bit first, that’ll be two weeks, and then we’ll do the next two weeks, and we’ll just incrementally build it up. And that’s what we want. We want to be able to break those requirements down into smaller and smaller chunks to get it out, show value and get feedback.
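(As an aside for the technically minded: the canvas itself is a facilitation tool, but its boundary-setting can be captured in a few fields. Here is a hypothetical sketch in Python; the field and process names are invented, and the real canvas linked in the show notes is richer.)

```python
# A hypothetical sketch of an information product canvas captured as data.
# The real canvas (linked in the show notes) is richer; these field names
# are invented purely to show how it bounds and sizes the work.
from dataclasses import dataclass, field

@dataclass
class BusinessProcess:
    name: str              # "who does what", e.g. "customer orders product"
    data_collected: bool   # is this source already landed in the platform?

@dataclass
class InformationProductCanvas:
    action: str            # what the stakeholder will do with the information
    outcome: str           # the value of taking that action
    business_questions: list = field(default_factory=list)
    processes: list = field(default_factory=list)

    def new_collection_tasks(self) -> list:
        """The hard, slow work: processes whose data is not yet collected."""
        return [p.name for p in self.processes if not p.data_collected]

canvas = InformationProductCanvas(
    action="outbound team calls customers flagged as likely to churn",
    outcome="reduced churn, with a measurable financial benefit",
    business_questions=["How many customers are about to churn?"],
    processes=[
        BusinessProcess("customer orders product", data_collected=True),
        BusinessProcess("customer returns product to store", data_collected=False),
    ],
)
print(canvas.new_collection_tasks())  # -> ['customer returns product to store']
```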

Tobias Macey: Particularly on that requirements gathering piece and helping to educate the stakeholders on what is easy, what is hard, what is impossible. With a web application, that can be challenging enough: somebody says, I want it to automatically know whether this is somebody that we’ve interacted with before, and it’s like, we can do that, but that’s going to take about two years. And then in the data domain, it’s, I want to be able to report on X, Y and Z, and you say, actually, we’re gonna have to incorporate information from five different sources and do entity resolution, and now we’re talking another two years; versus, I just want to be able to know, did they purchase, or did they have something in their cart? And you say, we can do that pretty quickly. And just understanding, both at the stakeholder level but also sometimes within the data team, what are those kind of relative levels of effort? And what are the things that are easy versus hard versus impossible?

Shane Gibson: So in the past, I’ve made many mistakes, we fail often in agile, and one of them was trying to explain the complexity to our stakeholders. They don’t care. They just want the job done. They’re not specialists in the data world, they have something they want, they don’t understand why it takes so long, and that’s reasonable, it’s not their domain. And often we have complexity that doesn’t even make sense to us, so how the hell do we justify to them why it takes so long? This is where the role of product manager or product owner comes in, in my view. That is a role that sits between the team and those stakeholders, and that role is around facilitation and communication with the stakeholders around trade-offs that need to be made, and then it’s also about making the trade-off decisions. So when there is something that is complex, and it is going to take some while, and there is no choice, then the conversation from the product owner to the stakeholders is: we can do it, and this is how long it’s going to take if you want to wait, or here’s some alternatives that will get us there in incremental ways, but you won’t get everything you want. And what we see is, when we have a good data team and a good product owner, there’s a measure of trust there. When the team are estimating, or guesstimating as I call it, because as humans we are crap at estimating, when the team estimate a number, there’s a trust thing where the product owner, who has been working with them for a while, goes, that sounds right for the little I know. And then they articulate to the stakeholders that there really is no alternative, that’s just what it’s going to take to get the job done.

Tobias Macey: The other interesting element of kind of doing that end to end flow is the question of data modeling: how much of the modeling and kind of entity design and schema design do you do upfront, versus how much of it is emergent? And when you take the path of saying, we’ll just do it, and then we’ll let the patterns become emergent, similar to how you might in a regular software project, write it a couple of times, figure out what the abstractions are, refactor; doing that in the data world isn’t always easy or possible, or it just becomes expensive, because you’re duplicating data, or the operations to refactor the data and kind of rebuild it from scratch can take quite a bit of time. And wondering what you see as some of the signals for when you need to bias towards one direction or the other: if we don’t do the data modeling right now, we’re going to regret it, because it’s going to take weeks or months of effort to do it afterwards; or, this is a small enough change, or a small enough problem space, that we can just do whatever makes sense now and then refactor it later?

Shane Gibson: So I’m highly opinionated on this one, and we’re highly opinionated in our AgileData product as well. I don’t believe there is ever a one-off question, an ad-hoc piece of work. If you look at what happens, you get asked by a stakeholder for a piece of data or a piece of information to answer a business question. That’s the first business question. How many customers have we got? As soon as you answer that, you’re going to get the next business questions. Where are they based? What do they buy? How much are they worth? How many are we losing? As soon as you answer those, you’re going to get the next level of complex questions. Why are they leaving? What can we sell them that’s going to give us more revenue? I want more customers that look like them, what do they look like? Then we start getting into the really complex optimization and flow ones over time: how many have left from this region versus that region? Why did they leave? And what could we have done to change it? So the first question we get asked is just the first question, and we know the next ones are coming. The other thing we know is, once we give that piece of information to a stakeholder, they’re going to ask for it again on a regular basis, because it has value for them. So this idea that we ad-hocly just go create a piece of code as a one-off, give them the answer, and never have to use it again, I think has been proven wrong time and time again. We’ve seen that in the market. We saw it with the self-service BI wave, the Tableaus, the Qliks: we gave self-service out, lots of people got really cool stuff, but we lost the practice of data. And then we kind of go through a wave where we move back, and we’re in that wave again with dbt. Again, I’m opinionated, and I don’t like it when words get used out of context, so I really don’t like the fact that dbt calls a chunk of code a model. Because in my head, after the last 30 years, that’s not a model, it’s a chunk of code. So how do we solve that problem? Well, what we do is we focus on how we model lightly, and how we have modeling that enables change as much as it can. If you look at people like Scott Ambler, he’s been talking about agile data modeling for a long, long time. And so with the teams I’ve worked with, there are techniques you can use to model early, model lightly, and enable change as much as possible. But the thing that’s true with every agile way of working: “change has a consequence”. We just try and reduce the consequence of that change as much as we can, but change is always going to need effort. So there’s a bunch of patterns and techniques that I’ve seen teams use to model early, model lightly, and enable change to the model. But they always model. And the other lens that I use is what we call “definition of done”. Definition of done is a set of statements from the team about what their professional practice is. How do they know they’ve done a good job for themselves? How do they know a piece of work somebody else in the team has done is done to the level they’ve agreed? As an example, I would expect the definition of done for a data team to include: the code has been tested and the data has been validated. Why would I expect that? Well, you go talk to a stakeholder and you say to them, I’m going to give you a count of customer, but actually, I’m never going to check that it’s right, it’s just a number, I can’t prove that’s how many customers we’ve got, I’m gonna leave that to you. Would that stakeholder believe that we’ve done our job?

No, but we do it all the time, and it’s just wrong. So there is a level of professional practice as data people that we need to have, and data modeling is one of those, and testing is one of those, and validating our data is one of those, and we should just do it. It shouldn’t be optional. The trick is, how do we make it accurate? How do we make it small? How do we not boil the ocean, as I say? How do we not go into that scenario where we have a single data modeler who sits in a small office scratching their chin for nine months to come out with this beautiful enterprise data model we’ll never implement? We’ve got to balance light modeling, but we still have to model.
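(Here is a sketch of what “the data has been validated” can mean as an executable check rather than a promise: reconcile the delivered metric against the source before calling the work done. The table names and the SQLite stand-in are hypothetical.)

```python
# A sketch of "the data has been validated" as an executable check, not a
# promise: reconcile the delivered metric against the source before calling
# the work done. Table names and the SQLite stand-in are hypothetical.
import sqlite3

def validate_customer_count(conn: sqlite3.Connection) -> None:
    """Fail loudly if the delivered count disagrees with the source data."""
    (source,) = conn.execute(
        "SELECT COUNT(DISTINCT customer_id) FROM stg_customers"
    ).fetchone()
    (delivered,) = conn.execute(
        "SELECT customer_count FROM rpt_customer_summary"
    ).fetchone()
    assert delivered == source, f"delivered {delivered}, source says {source}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customers (customer_id)")
conn.executemany("INSERT INTO stg_customers VALUES (?)", [("c1",), ("c2",)])
conn.execute(
    "CREATE TABLE rpt_customer_summary AS "
    "SELECT COUNT(DISTINCT customer_id) AS customer_count FROM stg_customers"
)
validate_customer_count(conn)  # silent on success, AssertionError on drift
```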

 

Ad 00:25:38 – 00:26:30

 

Tobias Macey: So in terms of the definition of done, and some of the challenges or kind of anti-patterns that are maybe becoming emergent with this newer set of tooling: what are some of the ways that you see teams, as they go through this exercise of, we’re going to embrace some of these agile practices, we’re going to do this iteratively, start to think about the tools that they use, and the ways that they’re using those tools, to be able to incorporate all of these concerns into a holistic kind of development flow without losing their minds?

Shane Gibson: Again, opinionated: what they should do is buy an off-the-shelf platform like agiledata.io, and not try and come up with their own. However, that’s not what the market is doing. We’re at the stage where we have what we call the modern data stack, or, as I call it, the Jenga stack: there’s 25 different tools, one for each category, and you’re cobbling it together. I kind of liken it back to the old ERP days, where we used to have to cobble together a payables module and a receivables module and a GL module from three different vendors, and we lived with that. But given that’s where we’re at, with the majority of teams building out their own technology platform using a combination of open source and closed source, we start by focusing on only building what’s valuable for now, while making sure that we understand the technical debt and what the cost of change will be in the future. So we look at creating a blueprint. What I mean by a blueprint is we draw, on a piece of paper, a whiteboard, a Miro board, a bunch of very large boxes that have chunks of stuff we need, and then we figure out what we’re going to build first. So we’ll typically look at it and go, we need some way of collecting data. What are we going to use to collect data? Are we going to build something, use a software-as-a-service product, use an open source thing? And that’s driven, ideally, by what’s the obvious data source. I’ll give you an example. One team I worked with had a theory that the first piece of value they had to deliver was grabbing data out of SAP. And we know that grabbing data out of SAP is incredibly hard; there’s a whole lot of reasons that collecting data from SAP is just a nightmare. So we looked at that, and the team estimated it’s probably three to six months to get a fully automated collection process out of the SAP platform. But when we went and asked the stakeholders what the first information product we were going to deliver was, it was actually based around their call center software, because that’s where they had an emerging problem. So if the team had gone and built out that whole collection capability out of SAP for six months first, and then the first information product, when they were ready, was actually the contact center software, they’d have invested in the wrong place. So that’s why we work out what we’re going to build first. So collection, normally, and then the second thing is where we’re going to store it, and there’s a bunch of patterns out there now. Are we data lake centric, are we data warehouse centric, are we Snowflake, are we Databricks, are we BigQuery, are we Firebolt, are we SingleStore? There’s a bunch of patterns that are reusable, so we should just pick one, implement it as quickly as we can, test it, and then think about what’s next. And so from an agile data point of view, when teams are working, they should be constantly reviewing how they make toast, their processes. They should be figuring out where the next major problem is, or the next piece of value, and they should iterate the way they work. So we’re constantly building out our technology platform, and we’re constantly changing the way we work to solve problems as they arrive. But we still need that blueprint. We still need that big picture of where we think we might go, and that’s important.

But that blueprint is not 6 to 12 months of data architecture, or a big 500-page document that pretends to know what we’re going to build over the next 12 months. Because everything changes, we know that.

Tobias Macey: The other aspect of kind of allowing for refactoring of the data model, as you explore more of the problem space, is: how do you bring in the abstraction layers, so that you can restructure some of the foundational layers, so that you can bring in data model reuse, code reuse, concept reuse, without breaking some of those existing end user assets? And doing it in a way that doesn’t add a whole bunch of extra work on behalf of the developer, so that it becomes a maintenance nightmare, where you end up in the situation where, if you want to replace the foundation of a house, you first have to lift the entire house and move it, then dig out the whole foundation, and then hope that it doesn’t fall over in the process, and then re-lay the foundation and put the house back down.

Shane Gibson: And that’s a really important concept, this idea of the architecture of a house. Or I often use food analogies, for some weird reason, I kind of blame DataKitchen for that: the idea of ingredients and recipes and stores and frontline service. We’ve got to understand the blast radius. We have to understand what our architecture looks like, what our layers are, the ones we think we’re going to add later, the ones we think become semi-immutable, where the technical debt for changing them is massive, so that we understand the consequence of that change. And we understand where we’re making bets that are dangerous, where those bets are embedded as foundational pieces, and to change them later is a high cost and a high consequence, versus ones that are relatively disposable. So what’s an example? If I were going to bring in a testing framework, I could probably throw Great Expectations on the side of what I’m doing, and it’s relatively replaceable; I could bring in one of those other tools and effectively change that testing paradigm, because it’s not embedded, it’s kind of sitting to the side. Whereas if I was going to replace my cloud data analytics database, that may or may not be completely replaceable; it’s going to change my modeling technique. If I’m going to go from Data Vault to dimensional, or Data Vault to activity schema, that’s probably a breaking change, that’s a massive change. So we need to understand the bets we’re making, the ones we can change easily, the ones we can’t, and how we know when we need to change. And I’ll give you an example from our product. Our entire product is based on Google Cloud. One of the bets we made really early: we’re not multi-cloud, we decided to go with a single cloud provider. We picked Google for a whole lot of reasons. It’s actually one of the best bets we’ve made; it’s one of the things I’d look back on in hindsight and say, that was a good guess. And our product is based around configuration, so we’re metadata driven. When you create a transformation, you actually create the rules via natural language, and that’s stored in a database which holds that configuration. And when the transformation runs, effectively we have code that calls that config, asks, what does this look like, compiles the transformation code, runs it, and then disposes of it. So we call it a manifest pattern. Now, when we started out, we knew that this config is core to us, and my co-founder, Nigel, and I come from more of a relational background rather than a NoSQL background, because we’re that old. So we had a relational model for that config; we had stuff we’d done before for customers when we were consulting. So we looked at it and went, we need a relational storage mechanism, and we want to keep it as low cost and low complexity as possible; we’re using BigQuery as the way of storing our customer data, so there’s no reason we can’t store that config in BigQuery. But we knew that was not immutable. We knew that at some stage, when we got big enough and had enough customers, BigQuery wouldn’t be able to handle the concurrency of hitting that config database, because it’s not designed for that, it’s not a transactional system. So we knew we were going to have to change it, and every time we did a design on the config, we did it with the idea that we were going to move it. Now, interestingly enough, we decided to become one of the cool kids, and we made a change and moved to Google Datastore.

We moved it to a NoSQL database, because when we looked at the market, everybody was doing NoSQL. That was an epic failure. Now, we learned a lot, but we lost a lot of time doing it. Why? Well, because it wasn’t our natural pattern. Some of the features we got from a relational type database gave us engineering for free, whereas in NoSQL we had to go build them. And so what we ended up doing was jumping from Google Datastore to Google Spanner, which is a massively scalable relational-style database. And with that change, we effectively lost all the work we did on the NoSQL database; we ended up going from the BigQuery pattern straight to Spanner. But now that is, for us, immutable. The cost of changing away from Spanner would be massive for us, but that’s okay. We get so much benefit out of that piece of technology, that pattern, that we don’t want to change it. And so for us, we had a plan. We knew we had to make a change, we had a guess at what we thought it was going to be, but we always enabled ourselves to paint ourselves out of the corner. And that’s what teams should be doing. They should be thinking about what happens if we need to change this piece of technology. What happens if our massive cloud analytics database vendor, who has a massive loss every year, doesn’t survive, or gets bought out by a big CRM company? What’s it going to do to us? If we had to change that database, what would we move to? And how much work would it be? So we have to keep those things in mind.
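(A rough sketch of that manifest pattern, under stated assumptions: the config schema, names and SQLite stand-in below are invented, and the real AgileData implementation is not public here. The shape it shows is: read config from a store, compile code from it at run time, run it, dispose of it.)

```python
# A rough sketch of the manifest pattern: transformation rules live as
# config in a database; code compiles SQL from that config at run time,
# runs it, and disposes of it. The config schema, names and SQLite stand-in
# are invented; the real implementation is not public here.
import sqlite3

def compile_transformation(config: dict) -> str:
    """Compile stored config into executable SQL on demand."""
    return (
        f"CREATE TABLE {config['target']} AS "
        f"SELECT {', '.join(config['columns'])} "
        f"FROM {config['source']} WHERE {config['rule']}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id, status)")
conn.execute("INSERT INTO stg_orders VALUES (1, 'complete'), (2, 'cancelled')")

# In the story above this config lived in BigQuery, then Spanner; a dict
# stands in for that relational config store.
config = {
    "target": "fct_orders",
    "source": "stg_orders",
    "columns": ["order_id"],
    "rule": "status = 'complete'",
}
sql = compile_transformation(config)  # compiled fresh from config,
conn.execute(sql)                     # run,
del sql                               # and disposed of, never stored as code
print(conn.execute("SELECT COUNT(*) FROM fct_orders").fetchone())  # -> (1,)
```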

Tobias Macey: I think another problem that manifests particularly later in the lifecycle of a data product, but that should be considered early on, is the question of access control and governance: who should be able to view what data, do what with it, and export it where. And it’s another one of those things that’s easiest to do early on, or at least to start incorporating early on in the process, rather than trying to retrofit afterwards. And in this agile way of working of saying, I want to deliver a narrow slice, this can very easily be something that gets left on the cutting room floor: we don’t need that right now, so we’ll do it later. And I’m wondering how you either encourage teams to kind of push back against that habit, or ways to simplify the process of incorporating some of those governance concerns, or understanding what are the appropriate role boundaries for data as it traverses its different stages?

Shane Gibson: So there’s two really good examples in there, which are our security policies for data and our governance way of working. I think I’ve got successful patterns for the security. I’ve got great ideas for governance, but I’ve never been able to experiment with a team and organization to see if they work; well, we are experimenting a little bit within the product we’re building. So if I take the security one, what do we see? We see a natural pull towards complexity. We see a natural pattern of, on day one, we want to be able to secure at a certain level, we want to be able to mask certain bits of data for certain users. So it’s a high level of complexity, and if we look at the effort, it could take 6 to 12 months to build that out in a way that works. So we have to ask: do we really have to build that right now, or can we chunk it down? Can we start off with an environment where only a small number of trusted users can come in and use the data, with a whole lot of belts and braces around policies and procedures and accountability for those people, so that we don’t need to build a complex security model yet? Then can we chunk it down by role or group and say, we’re just going to segment our finance data versus HR data to different groups, and have a really simple security model? We might get something like, “No, we’ve got PII data, and we really have to secure the social security number or driver’s license”. But we’re quite lucky in that there are some technology patterns out there now: most of the cloud vendors and most of the good products have data loss protection features, which will go through, use machine learning to identify the columns that hold that data, and mask it for us. So we could probably put that in as a low cost, low effort component that we can replace later if we need to, and just start building up that security capability over time as the need arises. Now, the downside of that is we have to refactor, we have to change. So if we’re using Power BI, and Power BI is coming into the data using a service account, and it’s not passing through the credentials of the user that’s running the visualization, and we want to bring in some fine-grained security, we’ve got a problem: we now have to do the work so that the identity gets passed to the database, if the database is applying the security. But we can solve those problems, and we only solve them at the right time. If the organization says no, we need belts and braces for whatever reason, maybe we’re in a highly regulated industry, then we estimate it and say it’s 6 to 12 months to build out that capability, that’s the cost. So it’s a trade-off decision: would you like us to incrementally build it out over time, or would you like us to make a big investment? And if we’re making a big investment, I still think teams should break it down. They should figure out ways of decomposing that security work into smaller moving parts, which they can test and validate at each step, and just build it up like a house: add a room, add a layer. So that’s the security one, and I think there’s lots of ways I’ve seen teams do that well.
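(A minimal sketch of that chunked-down, day-one security model: coarse segmentation by group plus masking of known PII columns. The roles, datasets, columns and masking rule are all hypothetical, and a real platform would push this into warehouse policies or DLP tooling rather than application code.)

```python
# A sketch only: coarse group segmentation plus PII masking as a day-one
# security model. Roles, datasets, columns and masking rules are all
# hypothetical; a real platform would use warehouse policies or DLP tooling.
PII_COLUMNS = {"social_security_number", "drivers_license"}
DOMAIN_ACCESS = {"finance": {"finance_data"}, "hr": {"hr_data"}}

def read_row(role: str, dataset: str, row: dict) -> dict:
    """Segment coarsely by group, then mask known PII columns for everyone."""
    if dataset not in DOMAIN_ACCESS.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {dataset!r}")
    return {k: ("****" if k in PII_COLUMNS else v) for k, v in row.items()}

row = {"employee": "Kia", "salary": 90000, "social_security_number": "123-45-6789"}
print(read_row("hr", "hr_data", row))
# -> {'employee': 'Kia', 'salary': 90000, 'social_security_number': '****'}
```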
The governance one is one I still struggle with. The best I’ve got at the moment is that we have to break the anti-pattern of governance as a bunch of committees that take something that’s been done and have an opinion on it, which requires more work to be done; they sit at the end of the process. What we need is governance at the front of the process. On the podcast I host, we had a great guest who gave me a piece of terminology that I loved: they talked about exoskeletons and internal skeletons. And the way I think about it now is in terms of our governance groups. By governance groups, I mean it might be a data governance group, it might be an architecture group; it’s a bunch of people outside the team who have the right to set rules, or change the work that’s been done. So let’s look at our architecture group or our data governance group. They should be setting principles that are immutable, and those are exoskeletons: those are the things you have to do. If you are going to break those, if you’re going to go outside the boundary of that exoskeleton, then you need to have a conversation with those groups before you start any work. You need to trade off with them the work you’re going to do, and get agreement that you can either bypass those rules, deliver them as a partial capability or partial compliance, or be completely outside them for whatever reason, with other belts and braces put on. So what’s an example of an exoskeleton? How about: whenever we store data, that data will be encrypted on disk and in transit. That’s probably an exoskeleton from your architecture or security team. And if we’re going to store data and transmit it, and it’s not encrypted, we need to go have a really big conversation up front, and we’ll probably get told no. Then we talk about internal skeletons, which are patterns we can use if they have value. One might be: we have a preference to model data for analytical purposes using Data Vault. It’s an internal pattern. So if you can do that, do it, because it makes sense: there’s a bunch of expertise in the organization around it, it’s a well described pattern, and we know that if multiple teams create hubs, we can probably conform them together if the keys match. So there’s value in reusing those patterns. But if for whatever reason you needed to go use activity schema, because you’re using event streaming and it just fits your use case, then that’s okay; it just becomes a new internal pattern. So that’s my view on governance now: we need rules that are immutable, which are our principles, and patterns which have value and which we should reuse. And then, ideally, we should be moving to governance as code: how do we create code that defines a policy, that we can apply against what we’re doing, and that can tell us whether we pass or fail? That will make our lives so much easier, being able to do that without humans having to check it. So that’s where I’m at with complex things like security: build it up step by step if you can. And governance: turn it into rules we can’t break, things we should use, and code that will actually test it for us.
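(A small sketch of what governance as code could look like for the encryption exoskeleton above. The rule names and the shape of the proposed design are invented for illustration.)

```python
# A sketch of "governance as code": the encryption exoskeleton expressed as
# a check a team can run before starting work. Rule names and the shape of
# the proposed design are invented for illustration.
EXOSKELETON_RULES = {
    "encrypted_at_rest": "data stores must encrypt data on disk",
    "encrypted_in_transit": "data movement must use encrypted transport",
}

def check_exoskeleton(design: dict) -> list:
    """Return the immutable rules a proposed design breaks (empty = pass)."""
    return [msg for rule, msg in EXOSKELETON_RULES.items() if not design.get(rule)]

proposed = {"encrypted_at_rest": True, "encrypted_in_transit": False}
print(check_exoskeleton(proposed))
# -> ['data movement must use encrypted transport']
# A non-empty list means: have the conversation with the governance group
# before any work starts, rather than after the work is done.
```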

Tobias Macey: Another aspect of kind of the agile data way of working is the question of how you incorporate collaboration across the different capabilities in the team, where some aspects of data engineering and data management can require substantial knowledge and understanding of the systems that you’re working with, the data that you’re working with, what the acceptable mutations are. And if you’re trying to cross-train across members of the team, either who don’t have expertise in the domain of the data source that you’re working with, or don’t have expertise in kind of data engineering writ large, but are quite adept at software: what are some of the ways that you think about being able to bring everybody up to a kind of shared level of understanding and capability and capacity?

Shane Gibson: So we use a pattern called T-shaped skills, which has been around for a long, long time. When we start off with a new team, and they’re experimenting with this way of working, there’s a bunch of foundational patterns we want to put in place: things like teaming agreements, definition of done, definition of ready, things we just know need to be in place first for the team to have a chance of success. And one of those is mapping out the T-skills. T-skills is based on the concept of the letter ‘T’: across the top of the T is effectively breadth, and the vertical bar is depth. For breadth, what we want to do is map out the core skills a team needs to deliver data or information in an end-to-end process. So it’ll be things like facilitation skills, requirements gathering, development of code, testing, documentation, release management, data modeling, those kinds of things. We want to understand the core skills the team should have. And then we want to understand the depth of capability within the team. So I have a pattern where I talk about novice, practitioner, expert and coach. A novice is somebody who’s done a little bit of it but doesn’t do it day to day; a practitioner is, it’s my day job and I’m good at it; an expert is, hey, I can actually teach you how to do it or show you how to do it; and a coach is, actually, I can teach and coach and mentor other people to do it the way I do. And there’s quite a big jump in my head between an expert and a coach, because sometimes experts don’t want to coach, they’re just bloody good at what they do, and that’s okay. So we put that down the page, and then we get everybody to map out their skills. From a documentation point of view, am I a practitioner? Am I an expert? Am I a coach? And once everybody’s done their T-skills, we overlay them as a team, as a group. And when we talk about a team, I subscribe to the two-pizza philosophy: between four and nine people is an optimal size for a team, for a whole bunch of reasons. So we take those people, we overlay their skills, and we see where we’ve got gaps. Look, we don’t have any testers. We don’t want to outsource testing to another team, we want to have testing skills in our team. So what are we going to do? Some people who are interested in that are going to upskill, or we’re going to bring another team member in; we want to fill that gap. We also want to look at where we have duplication, which is good: where we’ve got two people strong at data modeling, two people strong at testing, or two people strong at development, because now we’ve got redundancy in the team. So that’s what we want to do: understand that, and make sure the team becomes self-sufficient over time. And the good thing about that, actually, is it helps the team have a conversation about where they want to go in their careers as well. One of the things that agile does to us, which is not good, is it gets us into this factory mode. It’s funny, when we talk about Scrum, we talk about sprints; it’s not a sprint, it’s a marathon. Once a team’s rocking it, they just deliver day in and day out. It’s monotonous. And so we want some way for people to grow, to be able to get more skills, and the skills matrix, that T-skills thing, lets somebody say, look, I’m actually really interested in jumping over into that facilitation space. Hey, I’m a novice, but how do I get to practitioner? (A small sketch of overlaying T-skills follows below.)

So that’s what we do when we have a team of four to nine, and then we start to scale. And that’s where we hit another problem, because now we go from one squad to two. The most successful practice I’ve seen is we split our squad in half, create two squads, bring new people in and cross-skill them. Now what happens is you lose what we call velocity: the two squads no longer deliver as much as the one squad used to, but that’s okay; over time they’ll build up, and we’ll get the one plus one equals three behavior. As we build that up, we’re now going to see specialized skills where we don’t have enough of those people in the organization to have one person per squad. And that’s where we start bringing in specialized squads and seeing how they work, where they help those other squads by either upskilling them or by doing the work for them. Now, there’s a bunch of patterns out there that help us do that. Spotify published one, it got called the Spotify model, and we did bad things to it as a community; they won’t share what they’re doing with us anymore, which is a real loss. My favorite one at the moment is one called unFIX, at unfix.work. The reason I like it is the way it’s described: if you go to unfix.work, there are nice pretty pictures with colored boxes, and a really good description of what each one is. And it talks about teams that actually do the work and teams that coach other people to do the work. When we start scaling in the data world, we will typically see a platform team come out; definitely in the data mesh world, that’s the new hot thing. One of the things I talk about with teams, when we start to get to that level, is that the platform team is now building a platform or a product for somebody else, not for themselves. So they’ve got to bring in a whole lot of product thinking, a whole lot of new skills that are different to the skills they had when they were embedded in a squad building out their own technology, but that’s okay. We just map out the T-skills: what do they need as a platform team to build a product? Which is effectively what we’re doing with AgileData, right: we’re building a product that other data teams can use to do the work. And I’ve seen lots of teams internally build that capability.
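(The promised sketch: overlaying individual T-skill maps to surface gaps and redundancy. The novice/practitioner/expert/coach scale is from the discussion above; the people and their ratings are made up.)

```python
# A small sketch of overlaying individual T-skill maps to surface gaps and
# redundancy. The novice/practitioner/expert/coach scale is from the
# discussion above; the people and their ratings are made up.
LEVELS = {"novice": 1, "practitioner": 2, "expert": 3, "coach": 4}
CORE_SKILLS = ["facilitation", "data modelling", "development", "testing"]

team = {
    "Aroha": {"data modelling": "expert", "development": "practitioner"},
    "Sam": {"development": "practitioner", "testing": "novice"},
    "Mei": {"data modelling": "practitioner", "facilitation": "novice"},
}

for skill in CORE_SKILLS:
    # "Strong" here means practitioner or better, i.e. does it day to day.
    strong = [p for p, s in team.items() if LEVELS.get(s.get(skill, ""), 0) >= 2]
    if not strong:
        print(f"GAP: nobody does {skill} day to day")      # upskill or hire
    elif len(strong) >= 2:
        print(f"REDUNDANCY: {skill} covered by {strong}")  # resilience, good
```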

Tobias Macey: As you do bring in that split between the data platform and the kind of data engineers or data products, what are some of the useful interfaces for defining what the boundaries are? Where, if you’re somebody who’s working at the platform layer, and you start to try and get involved in discussions about how should you model your data, or what is the right level of granularity for a data product, you can say, actually, “No, that’s not your concern. You don’t need to worry about that, this is the thing that we actually need from you.”

Shane Gibson: So that’s the key: the platform team are now building a capability for another customer. And they’ve got to decide where they sit in that governance cycle. Are they setting the exoskeletons, the rules that are immutable? So: our platform will only accept modeling using Data Vault. Or are they building something with internal patterns: our platform enables dimensional modeling (star schema), Data Vault, activity schema, third normal form. They have to be really clear about how opinionated the platform they’re building is. The second thing is they need to understand how they’re going to innovate their platform, what’s their way of working, how do they make toast. An example I’ve seen: with one customer, each of the squads actually did the first cut of a platform capability. What happened was the teams were out building data and information products, and they needed a new capability on the platform. And the problem they had was a bottleneck: whenever they needed a new capability, they’d have to telegraph it to the platform team way early, and the platform team would have to build it in time for just when the squad needed it, for the squad then to build out what they needed for their stakeholder. That timing became a nightmare, because often the information product the squad was going to work on got changed, and therefore the platform team were building out features that weren’t needed just yet. So one of the techniques they used was, since they had good technical people in the squads, the squads would do the first cut of the technology patterns and use them, and then they would basically be picked up by the platform squad, who would harden them and make them available to the rest of the squads. Because it’s not just about building the technology, it’s about all the supporting things around it: documentation, training, how do you use it, testing, all those kinds of things. So that was prototype and iterate in the squads, harden and productionize in the platform team. I still have a preference, personally, for the platform team to have a roadmap and be able to move fast enough that they can move features in and out of the roadmap in time for those squads, but that orchestration is often hard. So the key thing is: this is your way of working. You have a theory about how you’re going to make toast, how you think it’s going to work. Give it a go. When something fails, your retro is: let’s look at that process and ask, why did it fail? What are we going to experiment with to change the way we work? And how do we know whether it made things better or worse? If it made them better, lock it in. If it made them worse, stop doing it and experiment with something else. So there is no methodology, but there are a bunch of patterns other teams have used that may be useful.

 

Ad 00:52:14 – 00:53:40

 

Tobias Macey: The question of accepting failure is an interesting one, because it’s not always clear whether you have failed, or how to decide when to give up, particularly when you start getting into the sunk cost fallacy of, we’ve already put so much work into it, if we just push it a little bit further then it will work. And because of the inherent complexity in the data space, what are some of the useful heuristics that you’ve found for being able to help somebody understand: it’s not going to work, no matter how much more work you put into it; versus, you actually just haven’t put enough work into it yet, and that’s why it’s not working.

Shane Gibson: So let’s start with cost. There is a massive problem there, and it seems to be worse in the data space than in the application space, and I don’t know why for sure. But I think part of it is, once those numbers go out, people rely on them. And if we do anything and those numbers change, now we’ve got a real problem. We had a million customers, and now we’ve only got 500,000. Why was that? Was that because we changed the rule on what the definition of a customer was? Was it because our code was wrong? Was it because the source system did something that affected the number, but the number’s now actually correct, and we’ve been reporting an incorrect number for a while? So there seems to be a bigger blast impact when we fail with data, because effectively the decisions, the consequences of the information we’ve used, seem to be bigger. But if I go back to the core of the question: the team know when they need to change, when they need to iterate. They just need to be given permission to do it. Sometimes on the team you’ll see one person, because it’s their baby, keep wanting to sunk-cost it. Keep going and investing, because you’re only 10% away from being ready, they’re 90% done. Well, sometimes that last 10% is the hard part, that’s where all the effort is, or you’re just not going to get there. But team behavior is really interesting. As a team, they will have a culture where they know, and the team will find ways of stopping that behavior over time, either in a polite way or in a not so polite way. That’s one of the benefits of teams working on a problem, not individuals. And again, what we’re seeing in the market at the moment is we’ve gone to hyper-specialization. We’ve gone to people really, really specializing in one small moving part, or we’ve gone to a single person end to end, where one person picks it up and gets all the work done without any colleagues. And both of those patterns, for me, are extremes. They can work, but you have more chance of success with a small team of people working together. And let’s be honest, it’s more fun. For me anyway, a group of people working on solving problems, leveraging each other’s skills and being there for the journey, that’s more fun than working on your own, but maybe that’s just me.

Tobias Macey: As a higher-level question, from your perspective as somebody who has been embracing these agile practices and working with teams in the data space, do you see the overall trend of the available tooling, systems, and infrastructure capabilities as bending towards being conducive to agile practices? Or do you see that they are, in some cases, actively harmful to those approaches?

Shane Gibson: I wouldn’t say they’re harmful, because I think we can make any technology work, and I think they’ve got better. If we look at the data space for transforming data via code, we’ve adopted good practices out of the software engineering space, with things like version control and CI/CD. But if we look at some of the front-end, last-mile tools, our visualization tools, very few of them allow us to check in our code and version it. We’re still back in the dark ages of going into that dashboard, changing it, and hoping you didn’t break it, or making a copy of a copy of a copy just to make sure I’ve got one I can regress to. I think we’re still at table stakes, and I don’t see our tooling enabling us to use the machine to do the work for us. We’re still human-centric, we’re still bashing the problem as individuals, and we’re doing it relatively manually. So I don’t think we’re well served by technology in the data space to adopt agile techniques. In the data domain, for some reason, we love complexity at the moment. We love solving the complexity problem of the technology stack, not solving the complexity problem of getting information to our customers as early as possible to add value and get their feedback. So I think we’re focused on the wrong problem; as technologists, we love to solve those technology things. But I think it will change. We see waves, and I think we’ll see a move back to less complexity: less effort engineering the technology and more effort engineering out the data problems. But time will tell.
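
As an illustration of the gap described above, here is a minimal sketch of treating a dashboard’s exported JSON definition as version-controlled code, assuming the BI tool can export such a definition at all (many cannot, which is the point). The export step, file layout, and function names are hypothetical.

```python
import json
import pathlib
import subprocess

# Minimal sketch, assuming the BI tool can export a dashboard definition as
# JSON. The export step, file layout, and function names are hypothetical.

def check_in_dashboard(exported_json: str, repo_dir: str, name: str) -> None:
    """Normalize a dashboard definition and commit it to git like code."""
    definition = json.loads(exported_json)
    path = pathlib.Path(repo_dir) / "dashboards" / f"{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stable serialization (sorted keys) so diffs show real changes,
    # not key-order churn from the export.
    path.write_text(json.dumps(definition, indent=2, sort_keys=True) + "\n")
    subprocess.run(["git", "-C", repo_dir, "add", str(path)], check=True)
    # Only commit when the normalized definition actually changed.
    staged = subprocess.run(["git", "-C", repo_dir, "diff", "--cached", "--quiet"])
    if staged.returncode != 0:
        subprocess.run(
            ["git", "-C", repo_dir, "commit", "-m", f"Update dashboard: {name}"],
            check=True,
        )

# Usage (repo_dir must be an existing git repository; bi_tool_export is a
# hypothetical stand-in for whatever export mechanism your tool offers):
# check_in_dashboard(bi_tool_export("revenue"), "/path/to/repo", "revenue")
```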

Tobias Macey: And in your experience of working with clients, what are some of the, I guess perennial questions or points of confusion that you’ve had to work through to help people understand how to think about agile approaches, how to think about proper scoping, how to think about the kind of useful integration and flow and how to structure the work so that you can do these kinds of fully vertical end to end implementations?

Shane Gibson: So what we see is confusion of terminology. What we’re doing is taking a whole lot of agile terms, practices, and patterns, and a whole lot of data terms, practices, and patterns, and now we’re bringing in a whole lot of product thinking, with its own terms, practices, and patterns, and that causes confusion: the difference between a product owner and a product manager, for example. And I see some repeatable things that people struggle with. I see the data modeling problem: the idea that we can just go and create information and push it out to a consumer or stakeholder without modeling the data, because we think being ad hoc is the same as being agile. And it’s not. We don’t want to be ad hoc; we just want to chunk the work down into smaller iterations that we can do faster, but still repeatably, still safely. I see a whole problem around build versus buy. I see a problem around organizations that are structured around projects and programs; that’s how they fund work to be done, and that’s how they structure everything. They want the teams to be agile, and yet they’re putting on these brakes, these constraints, that aren’t making them safer, just slower. One of the cooler ones I see is this concept of a heat shield, which is a person in the organization that’s sponsoring the team to adopt that agile way of working and fail as they do it. That heat shield typically sits above the team; when something goes wrong, they take most of the heat and keep the team safe. If you don’t have that role, if you don’t have that sponsoring person in your organization, then the team are exposed, and the blame gets put upon them when things don’t go well. Next, I see teams try to scale too fast, rather than starting off with one small squad, proving their way of working, and then trying to scale. They try to do it with 4 or 10 or however many squads at once, and that’s hard. Adopting an agile way of working with data is hard anyway; give yourself some chance of success by removing that complexity and uncertainty. Same with boiling the ocean: the idea that we’re going to build out a platform for 12 months before we add any value to our customers is just madness. Another anti-pattern is somebody other than the team making the promises. We have a project manager who goes and tells the stakeholders how long it’s going to take, when they’ve never done the work before. The team are the only ones who can estimate, or guesstimate, with any sense of accuracy, and even then they’ll be wrong. But they should make the promises about what they’re going to deliver and when. My last one, which is a personal one: project managers who become scrum coaches. As a project manager, your skills and the things you’re taught to do are very different to being a scrum coach. I find that business analysts often make the best scrum coaches. So those are some of the areas of confusion or risk that I see a lot of teams hit as they start their journey.

Tobias Macey: In your own experience of working in this space and working with teams and building a product to help people adopt and adapt to these Agile principles. What are some of the most interesting or innovative or unexpected ways that you’ve seen people try to incorporate these ideas into the way that they work with data?

Shane Gibson: So every team that I work with is innovative. I learned early in my coaching journey that I’d made a mistake. The mistake was: I’d work with a team, and they’d get to a level of maturity. And when I went in to help the next team, for some reason I started off at that level of maturity, when that team needed to go back to the beginning and grow to that level of maturity themselves. After that, I made another mistake, which was assuming the patterns of the first team would apply to the second team and be successful. And that wasn’t true: different context, different organization, different data, and different platform. So the way each team can iterate and build their own way of working is amazing to watch; that was unexpected when I first started. The other thing is that team culture is really important, and we have to be really cognizant of that. An example: if we talk about Scrum and we do retros, the standard pattern is, either virtually or physically, we’re putting stickies up on the board about what went well, what didn’t go well, and what we should improve. It’s a conversation. It’s very verbal, it’s very visual, there’s a buzz in the room, and that’s great. One of the teams I worked with was really introverted. They didn’t enjoy that type of process; they saw the value of reviewing the work they were doing and the way they were doing it and iterating on it, but they just didn’t enjoy that standard pattern for doing that work. So what they did was, they basically used Azure DevOps, and they would have quiet time. They would sit in the room together, because this was pre-COVID, bring up their laptops and the board, and type their notes, their stickies, into the board, with absolutely no conversation. It was time-boxed, and then they’d review: they would go in and drag the notes around, make comments on the notes, and dot-score them, again with absolutely no verbal conversation. And then, having scored the things they needed to focus on next and what they were going to do about them, they’d go and create the next set of notes about the work to be done, all typing over the top of each other. It was like being in a library, and it did my head in, because I was like, this is not what I’m used to, where’s the buzz? But for that team, it was the right fit to achieve the goal we wanted them to achieve, which is: look at the work you’re doing, figure out what’s not working and where you’re going to iterate, and iterate on it. That’s a core principle, that’s a core pattern. So every time I work with a team, I see something that’s amazing, and that’s cool. But we have to empower teams to build their own way of working and encourage them to leverage patterns that have had success, or may have success, and experiment with them.

Tobias Macey: Absolutely. And in your work with teams and living in this space, what are some of the most interesting or unexpected or challenging lessons that you’ve learned in the process?

Shane Gibson: There is no methodology. There is no out-of-the-box way of working. Every team is different. Teams are amazing; just empower them and enable them to get the work done, and typically they will. That’s the learning I constantly get.

Tobias Macey: For people who are kind of revisiting the ways that they work or looking at how they can incorporate new capabilities or new practices into their overall workflow for delivering analytics and data assets, what are the cases where an agile approach is the wrong choice for a data project?

Shane Gibson: So I don’t think there ever is; I’m a bit biased on that these days. I think all the alternatives are not as good as an agile way of working. However, there are some big warning signs that you’re going to struggle. The first is if there’s no uncertainty: if what you’re doing is repeatable, then an agile way of working may not be the best fit for you. But I’ve never seen it; I’ve never seen teams working with data where there’s not a high level of uncertainty. The second is if there’s no heat shield: if you’re starting this journey and you don’t have a senior person who can hold that heat when it hits from the rest of the organization, then you’re going to struggle, and there are going to be some real problems coming. And the last one is when the organization is command-and-control, where it’s hierarchical, where there’s a culture of blame and people need to hide from that blame or serious consequences happen to them, like getting fired. Then your organization is not going to support an agile way of working; it’s not the culture of the organization, and your team are going to be exposed. So have a really good think about whether you want to start down that journey, and whether you want to expose the team to that. Often I work with teams that are in a hierarchical organization, but we have that heat shield, so the team are empowered for this new way of working. And what do we see? We see them be successful, and we see the rest of the organization go, how are you doing that? And they start to watch and learn and, ideally, adopt it themselves; that’s what success looks like. But those are probably the three warning signs to me that you’re taking on a high level of risk by adopting an agile way of working.

Tobias Macey: Are there any other useful references or resources or practice exercises that you recommend for teams who want to dig in and understand more about how to apply agile practices to their data work?

Shane Gibson: I’m trying to spend as much time as I can publishing the ones that I’ve used with teams, in a way that anybody else can pick them up for free. So if you go to agiledata.io, that’s the site we’ve created where I brain-dump those patterns and templates as much as I can. There’s a bunch of books out there that I’ve read over the years, and approaches that have value. I’m a great fan of Lawrence Corr and his BEAM methodology; I think that’s a great way of gathering data requirements. I’m a great fan of Scrum, and Kanban, and Lean, and some of those practices, and there are lots of good books about how you can pick those patterns up. Personally, I’m a great fan of Data Vault as a modeling technique; I find it’s the most flexible at the moment. It’s got some problems, but it’s certainly the one that copes with change the best, in my experience. Those are probably the core ones. unFIX, at unfix.work, is probably the best team-topology explanation that I can find out there at the moment. And then the Information Product Canvas, which I’ve published; for me, it’s my go-to whenever I work with a customer to try and understand what information they want delivered. So that’s probably my shortlist of go-to stuff at the moment.

Tobias Macey: Are there any other aspects of your experience of working with data teams, or how to apply agile methodologies to data work, or how to think about the technical and team structures to support that, which we didn’t discuss yet and that you’d like to cover before we close out the show?

Shane Gibson: I think the main thing for me is a call-out to everybody: when you find something that’s worked, a pattern that works for your team, sharing is caring. Take the time to write it up in a simple way and publish it, so the rest of the world can see it and experiment with that pattern as well. Sometimes we hold those things internally. But if we look at agile, everything we do is iterating on other people’s work; that’s what it’s all about. We should try to pay it back and push the patterns we’ve had success with back into the community, so we can help our fellow data practitioners where possible.

Tobias Macey: For anybody who wants to get in touch with you and follow along with the work that you’re doing, I’ll have you add your preferred contact information to the show notes. And as the final question, I’d like to get your perspective on what you see as being the biggest gap in the tooling or technology that’s available for data management today.

Shane Gibson: The obvious answer for me is reducing the complexity, and it’s what we’re focused on with agiledata.io: how do we remove a lot of the engineering processes, practices, and effort that are required, and automate them? I think that’s the problem today, and where I think the opportunity in the market is over the next couple of years. Right now, everything we do is based on human effort; we don’t use the machines to do the effort for us. And I think over the next 2, 5, maybe 10 years we’ll start using algorithms and machines to recommend and do the work for us. I’ll give you an example. When we collect a piece of data and look at it, if it’s coming from a relational database, the foreign or primary key has been flagged, so we know that’s the unique key for a customer. But in this whole event-streaming world, 80% of the poorly designed systems for capturing data don’t have keys on them for us. We have to go and look at that data and figure out what the key is. So using machine learning to identify a candidate key by looking at the data, and giving us a hint of what that key is, would save us so much time. And it’s not a simple process, because it might be a concatenated key: it might be three or four columns that we need to combine to say this is actually a unique record. But that’s the kind of area where we can actually use the machine to reduce the work we do. At the moment, we’re not focused on that; we’re focused on grabbing all the parts to actually build a machine that can run. So we’ve got to get over that and move on to the next phase. That’s where I think the major opportunity is: using the machines to automate as much of our work as we can, and that’s not easy.
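
As a rough sketch of the candidate-key idea (an illustration only, not AgileData’s implementation; the column names and the brute-force search are assumptions), profiling a sample of rows for single or concatenated keys might look like this:

```python
from itertools import combinations

import pandas as pd

# Minimal sketch: brute-force search for candidate (possibly concatenated)
# keys on a sample of rows, by checking which column combinations are
# unique across every row.

def candidate_keys(df: pd.DataFrame, max_columns: int = 3) -> list[tuple[str, ...]]:
    """Return minimal column combinations that uniquely identify every row."""
    found: list[tuple[str, ...]] = []
    for size in range(1, max_columns + 1):
        for cols in combinations(df.columns, size):
            # Skip supersets of keys already found (keep candidates minimal).
            if any(set(k) <= set(cols) for k in found):
                continue
            if not df.duplicated(subset=list(cols)).any():
                found.append(cols)
    return found

# Hypothetical event data with no declared keys.
events = pd.DataFrame({
    "device_id": ["a", "a", "b", "b"],
    "event_ts":  [1, 2, 1, 2],
    "payload":   ["x", "x", "x", "x"],
})
print(candidate_keys(events))  # [('device_id', 'event_ts')]
```

A real implementation would sample large tables, prune the search space, and score near-unique combinations statistically rather than checking uniqueness exhaustively, but the hint it produces is exactly the kind of machine assistance described above.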

Tobias Macey: Alright. Well, thank you very much for taking the time today to join me and share your experiences and thoughts on how we as data practitioners can adopt and embrace some of these agile ways of working. I definitely appreciate all the time and energy you’ve put into the work you’ve been doing with data teams and into encapsulating that into your product. So thank you again for all of that, and I hope you enjoy the rest of your day.

Shane Gibson: Thank you for having me on the show. It’s been great.

PODCAST OUTRO: Thank you for listening. Don’t forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you’ve learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcasts.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

AgileData reduces the complexity of managing data in a simply magical way.

We do this by combining a SaaS platform and proven agile data ways of working.

We love to share both of these. The AgileData Product costs a little coin, but information on our AgileData WoW is free. After all, sharing is caring.

AgileData.io

Keep making data simply magical