TL;DR

Early in 2022 I was lucky enough to talk to the awesome Catalog and Cocktails podcast crew about agile in the data domain. Watch or listen to the episode.

Shane Gibson - AgileData.io

Tim Gasper: Hello everyone, live from Austin, Texas, it's “Catalog & Cocktails”. It's an honest, no-BS, non-salesy conversation about enterprise data management with a tasty beverage in hand. I'm Tim Gasper, longtime data nerd and product guy at data.world, joined by Juan.

Juan Sequeda: Hey, Tim. I am Juan Sequeda, principal scientist here at data.world. It's Wednesday, the middle of the week, towards the end of the day, and it's time to take that break and talk about data. Before I go into our topic and our guest today, I just want to remind people that next week is Data Council Austin. We are super excited, because there are finally more conferences again, and this one is going to be here in Austin. We're actually organizing an exclusive get-together with past and future guests of “Catalog & Cocktails”, and we want to reach out to all the listeners. So if you're listening right now and you're coming to Austin for Data Council, we want you to join us for a little get-together to say thank you. We're getting together the day before the conference, on Tuesday, March 22 at 7pm. Tim and I will be there. Patricia Thane will be there, Sarah Catanzaro, our future guest Chad Sanderson, and more folks. So please shoot me an email at Juan@data.world, or find us on Twitter and LinkedIn, and we'll tell you where our special get-together is. We want to keep it private at first, for the guests and our listeners, and then we'll open it up. So I'm really excited to see you there.

Tim Gasper: Well, with that, I want to talk about our guest. Today's guest is a truly special guest, because this is somebody who has been a true follower of “Catalog & Cocktails” from the beginning. I don't actually know when he started following us, but I would say it's almost from day one. Shane Gibson is the co-founder of AgileData.io, and he's somebody I've enjoyed interacting with over and over through the years. People may remember that when we originally did “Catalog & Cocktails”, it was a Zoom call. We would only do it for 30 minutes, then we would stop the recording, and whoever was on the call would keep the discussion going. Shane was always there, always having phenomenal discussions, and we've kept these conversations going on Twitter and LinkedIn. Shane, it's such a pleasure and an honor to finally have you here. So how are you doing, Shane?

Shane Gibson: I'm doing well. Thank you. Longtime listener, first time caller. So thanks for having me on the show.

Juan Sequeda: Shane, you're the first guest who's joining us from the future. Is it St. Patrick's Day already where you are? How's everything going in New Zealand? Which leads us to: what are we toasting, and what are we drinking? You can kick it off first.

Shane Gibson: Yes, so it's 10am on Thursday for me. While, being a good Kiwi, having a beer at this time of the morning is not unusual, I've got a bit of work to do this afternoon. So I've got a little drink made up of a short espresso, a bit of ice and tonic, which gives you a non-alcoholic, sweet, coffee-flavored refreshment. Give it a go. And what am I toasting? I think one of the upsides of the chaos of the last couple of years, for people at the bottom of the world like me, is the ability to have remote sessions like this. Three or four years ago, we had to travel for 24 hours to go to a conference. Now we're lucky enough to be able to hop on a Zoom call or attend something like this and actually connect with people like us around the world in the data space. So for me, that's been one of the few upsides of what's happened. So I toast to that.

Juan Sequeda: Cheers to that. It has truly changed how we perceive this. Tim, how about you? What are you drinking? What are you toasting?

Tim Gasper: I am drinking a Paloma today, just keeping things simple: grapefruit with tequila. And I'll toast to that as well, Shane. It's really awesome that you joined our “Catalog & Cocktails” community so early with such great ideas. Despite all the crazy things going on in the world and everything we've had to deal with over the last couple of years with the pandemic, we've also found a way to stay connected and to deepen our connections with each other, regardless of where we are on the globe. So I'm really excited to have you as part of this community, Shane, and excited to have you on the show today.

Juan Sequeda: And I'm having some caple carb with some agave syrup and some lime sparkling water. That's my drink today. I want to cheer to the community that we've created on the podcast. I have to say, this week is South by Southwest, and I've been attending several sessions, mostly focused on podcasting, because I wanted to learn what's going on out there. People have been really impressed by what we've been doing here. I've been telling them we do these live recordings, and people are like, that's ninja stuff, I would not want to do that. We don't edit, and as we were telling Shane, we do this all on the fly. Even the lightning round questions aren't scripted; we come up with them on the fly. So cheers to being able to connect here through the podcast. Shane, I really appreciate that. Cheers. So, our quick warm-up. Funny question today: Shane, as a longtime fan of the podcast, who's your favorite host? And why isn't it Tim?

Shane Gibson: So it's an interesting question. We have to look at it from a data point of view; we're all data geeks. I didn't have enough time to actually go and get the data points, but what I would do is think about who makes the best one-liners: don't boil the ocean; the brakes on the car are there to make you go faster, not slow you down. I don't actually have the factual data, but I have a feeling that, Juan, you tend to use them more often now. I don't know whether Tim actually invented them and you're just using them, though. That's how I would judge it. I love those one-liners because they are actually real. Don't boil the ocean: don't spend three years to learn something; do it quick, do it fast, learn from it. So until I've got the real data, it's got to be you.

Juan Sequeda: I can't wait for my shirt that says don't boil the ocean. We've been talking about getting shirts and a lot of merchandise out, and I've already seen some sneak peeks, so I'm really excited about that. So come up with more one-liners.

Tim Gasper: I've got to come up with more one-liners. We were having a funny conversation internally at data.world. Shane, let's do it. What does Agile mean in the context of data?

Shane Gibson: So when you ask people what Agile means, you'll get 101 answers. You'll get Scrum, you'll get lean, you'll get flow, you'll get Kanban, you'll get speed and depth, you'll get iterations, you'll get time boxes. For me, with the teams I've worked with, I come back to: it's a mindset. It's a mindset of looking at the way you're working, figuring out what's not working for you, identifying some patterns and practices that may make that way of working better, and then experimenting with them. If they do fix the problem you have, adopt them. If they don't, find something else. If we look at the Agile Manifesto, individuals and interactions over processes and tools, we know that's important. Most teams that are successful talk to each other and work together; they have processes and tools in the background, but that's not their focus. Following the sprint is not what they do. Then there's working software over comprehensive documentation. In the data world, we don't talk about software, we talk about valuable data. And we should all agree that documentation is important. We have to do some documentation; it's not that we don't. But it's not our focus. Writing things down is something we do when it has value, not the thing we do at the beginning of every process. Then customer collaboration over contract negotiation. We should be talking to our end users, our customers, the people who are going to get value from the data, about what they want, not what we think they want. And if they change their mind, that's okay. There's a cost consequence, but that is okay. And the last one is responding to change over following a plan. So for me, it is a mindset. It's a feel. When I go in and work with a new team, I'll tend to observe them for a while, because I want to see how they are currently working, and then help them change the things that aren't working for them.
In the Agile world, there are lots of practices and patterns we can adopt that help us do that quicker, faster and better. So that's my view at the moment. Ask me in a year and I will have inspected and adapted it a little bit, hopefully.

Juan Sequeda: So let me repeat this, because this is a really nice way of presenting it. It's more of a mindset. You want to identify the patterns that work for you, and these things evolve, so we want to experiment. You may have a sprint, but following that sprint may not be the crucial thing to do. Yes, documentation is important and provides value, but it shouldn't be the focus. You may realize something isn't working because you lacked documentation, so you iterate on that, and in the next iteration you add more documentation. At the end of the day, you really need to be providing value to the end users. And it's fine if they change their mind about things: okay, let's go change it, but acknowledge that it's going to have some consequences, deal with them, and keep iterating. So that's my summary, in my own words, of what you said. Anything to add there? Are we on the same page? I like this definition.

Shane Gibson: For me, the challenge is always to find the end user or customer who doesn't change their mind. And half the time, it's not their fault. Things have changed: the organization has changed, the markets have changed, COVID has changed things for them. So how can we pretend that locking something in and not letting them change, when everything else changes, is valuable? That we're saving them by locking them into that box and not letting them adapt when they need to, because of us? So for me, it's that idea of adaptation. Change is constant.

Juan Sequeda: So let's get more concrete, because so far we've been talking about Agile in general. What does it mean to be Agile in the context of data?

Shane Gibson: So Agile came out of the software industry. That's where its genesis is, with XP; we have a whole lot of practices around software engineering. A lot of those practices and patterns are applicable in data, but data is kind of weird. It's taken me a while to figure out some of the things that make data different when we're adopting Agile ways of working, and I still don't have the answer. I can't give you the one line of "data is different because...", but here are some examples. People tend to find it easier to decompose features in an application than to decompose data. Think about a shopping cart. We have the ability to add something to the shopping cart, the ability to view the shopping cart, the ability to check out. We can think of those as unique features, so when we're iterating our work, we can break those things down and describe them, and the team knows what they're working on. When we think of data, we tend to think of a big amorphous blob: we've got all our sales data. So teams struggle with how to decompose it into a smaller piece of work that still has value, that they can deliver early and iterate on. I think that's one of the challenges in data. The other one is that we don't control the data. If we're building software, we control how things are entered and we control the user experience. In the data world, typically, we are given a pile of poo. We're given stuff that doesn't fit the core business processes; it's dirty, it's messy, it's not structured the way we want, and it has things in it that are wrong. So we're not in control of the end-to-end process, because we collect the data from somebody else. That's a real challenge. We have to do extra work that our software engineering brethren don't.

Tim Gasper: That makes sense. So there are obviously some differences in the challenges data teams face when they're trying to take an Agile approach. But it sounds like you're saying that doesn't change the fact that Agile can be really valuable for these data teams and should really be a central part of how they operate. Do you feel like a lot of data teams are Agile? Or do you feel like there's quite a big gap?

Shane Gibson: I'm seeing more and more of it. I think data teams are years behind software engineering teams, but we're trying to catch up. Most teams I work with now, and most people I talk to, are adopting some Agile practices and patterns. Look at the old days of waterfall: we would write a requirements document for six months. I don't see a lot of that anymore. I see a lot of ad hoc, which is a different pattern. But I definitely see Agile under different names. We'll talk about it later, but DataOps, data mesh: there's a lot of Agile mindset in those, and they describe patterns where I can look at them and say, the teams I've described as being incredibly Agile behave that way. So we see different names for it, but lots of the patterns and practices of Agile.

Tim Gasper: Phrases like DataOps definitely seem to be becoming much more popular recently to capture some of these patterns, as you call them. Maybe we can go into these patterns a little more deeply. You mentioned waterfall and you mentioned ad hoc. Obviously things like Scrum and Kanban are mentioned a lot in the context of Agile. Could you walk through these and explain really quickly what they are? And which of these are best for data? Is there a particular pattern that you see being more effective?

Shane Gibson: You'll remember the days in data when we used to argue about Kimball. As technologists, we love to have religious arguments over the thing we think works best, and we do it a lot. I do it a lot on Twitter and LinkedIn, and enjoy it. So in the Agile world, there are lots of frameworks and methodologies out there, and I think of them as patterns. Scrum is probably the one everyone knows. If you look at Scrum from a pattern point of view, what's it about? It's about batching up your work: taking a bunch of work, breaking it down into smaller batches, and having an iteration, a period of time that you focus on, taking something from the beginning to the end and delivering that value to your customer.

And we're effectively putting in artificial constraints: we say we're going to time-box it at two to three weeks, and you need to have pushed that value out to your end customer. Doing that forces us to change our behavior, because now we're constrained; we have to change the way we work. I tend to see most teams start off with Scrum, because it's more well known. It is well described and there are lots of courses on it, so education is accessible. But if we actually look at the way a data team works, it's more what I call flow-based. It's more like a factory: there are a bunch of stages or stations. We collect the data, we combine and clean the data, and we present the data and give it to people in a way that can be consumed. We look at reference data, we look at master data. There's a whole lot of things we do.

Shane Gibson: So if you think of it as a factory, there's a bunch of stations: we pass the work over to the next station, they do their little bit of work, and they pass it on. But what I find is that if a team starts off by trying to implement a true flow-based model, using some of the Kanban and lean processes, the way of working doesn't change a lot, and therefore their adaptation to that change doesn't seem to be as great as if they start off with Scrum. So when I work with a team, I tend to encourage them to start with Scrum, and then, as they optimize the way they work, move to more of a flow. Now, that is quite a prescriptive approach for an Agile coach, and a lot of people in the coaching world strongly disagree with me on it. We debate whether we should provide patterns and encourage people into a way of working, or just let them find their own way. Should a coach have been on the field? To be an Agile coach in data, should you have actually done data work, or can you just be a good coach? So there's a whole lot of "it depends".

Tim Gasper: That's interesting. I totally get your point that some folks, especially in the Agile coaching space, may really want to take a more dogmatic approach to these processes, to encourage good patterns and sticking to them, versus allowing folks to find a pattern that works for them. Companies obviously do that over time, but especially when you're building the muscles and getting people through the pattern and learning it, it can be a challenge. The few times I've been in companies where we've had an Agile coach, they've definitely come in very strong: you're going to do Scrum, this is exactly what you're going to do, on Tuesday mornings we're going to do this, on Thursday mornings we're going to do that. Obviously the world of data is a little bit different, but a lot of these practices apply, as you're mentioning. Do you see that companies get a lot of benefit if they start with Scrum and get used to it? How long do they need to really get used to it before they can start finding their own way? Is there a method to finding your own unique approach to the patterns?

Shane Gibson: So, two things on that. For me, Agile is about the team evolving their way of working. They should be working in a way that's more fun, where they feel more sense of achievement and are in control of their destiny. They're self-organizing; they're controlling the work that they do. And as a result, the organization gets the benefit. I'm not a great fan of the current McDonald's behavior, where certain large consultancies are rolling out the Spotify model and saying it will reduce the staff in your organization by 25%. That for me is not what it's about. It's about changing the way you work, because if you don't, your organization won't survive; we enable our teams to change the way they work, and we get the benefit of that. If we take a team that's starting the journey and apply an iteration process to it, let's call it Scrum, we'll tend to see somewhere around three to six iterations before they start rocking it, because effectively we're disrupting the way they work. That's really key for the stakeholders to understand. One of the important things a team needs when starting the journey is an umbrella: a senior person who's going to hold the umbrella above the team and stop all the brown stuff hitting them for a while. Because what we do is break the cycle, break their way of working, and tell them to completely change what they do, so things don't go so well at the beginning. Then after three to six iterations, they tend to have formed their own new way of working, and it seems to gel. If after three to six iterations they're not there, then we've got a fundamental problem to work through, within either the team or the organization, because they haven't reached that stage, and in my experience they typically would have.

Juan Sequeda: So I really like these two patterns you're presenting. With Scrum, you break the work into smaller pieces and add these constraints, and that's something you would start with. Then you have the flow-based approach, where the factory has stages. What's going through my head is: with these different stages or stations, wouldn't the people at each station also be doing Scrum within their own stage? So it's like a team within a team. Isn't there some hybrid that can be done? I can imagine that if there's a team in charge of cleaning data, or doing data quality, or some other station, they're going to get a bunch of work that needs to get done, and they also need to organize it: I'm going to deliver this to you this week, and then the rest the following week. Wouldn't that be a mix of the two?

Shane Gibson: So I'm not a great fan of hybrids on day one, because hybrids are complex. I'm not a great fan of matrix liberals on day one either.

Juan Sequeda: Fair point. Just to go back to your original point: you want to start with something, see if it works, then iterate and change. So I would agree with you that you wouldn't start with a hybrid from day one. You would choose one, and it would eventually morph into something else.

Shane Gibson: We want to help the team be successful. If we think about it, we're disrupting everything they do, pulling the rug out from under them, so we want to give them as much safety as we can. Let's give them things that are well described, known patterns that work, get them successful with that, and then let them generate their own ways of working. One of the major problems we have is scale. It's not just a problem for Agile; it's a scale problem globally. Our teams are two-pizza teams; we talk about between three and nine people. We know they can collaborate well together, and when we look at the lines of communication, we know that works. So let's start off with that, get that way of working going, and then scale. We may decide to scale using a flow-based model, and if we do, we focus on different things. Now we focus on how we hand work off to another team, another squad, another pod: how we articulate what we've done and what we'd like them to do, how they accept that work, and how they know the work they're accepting is fit for purpose, that it's done, or done done. We have a thing called a definition of ready, where we write down the things we want to tick off before we believe that work is ready to be done. We do that within Scrum, and we should do it with flow too. Within flow, we focus on cycle times, and we start using data: how fast do things move through the system? Where are the blockers? How do we unblock them? When there's a stoppage or an outage in Kanban, how do we all swarm, fix that problem, and get the system flowing again? So each of the patterns comes with different focuses, different things we should look at, and you just have to be really clear which one you're doing. If you have adopted a Scrum pattern, be very clear that it's a Scrum pattern.
So one of the challenges is: how do you take a piece of data work from an idea to a user consuming it for value within a time box of two to three weeks? That's really hard. So you'll see teams use a technique called pipelining, where they do one iteration that's more discovery and prototyping, a second two-to-three-week iteration that's more about building the core code, and then a third iteration around visualization. Now, I'm not a great fan of that. I tend to want teams to work end to end in a cycle and decompose the work to be done in a different way. But if pipelining works for them, great. The last difference for me between iteration-based and flow-based is that with flow-based, we start to think about hyper-specialization. With Scrum or iterations, we talk about cross-skilled teams, where everybody in the team can work together to get the work done. And I think what we're seeing in the data world right now, with the whole idea of analytics engineers, is that we're moving back to hyper-specialization. Personally, I tend to prefer cross-skilled teams: it's more fun, and I find it more successful.
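The flow-based measures Shane mentions (cycle time, and where work gets stuck) come down to simple arithmetic over timestamps. The sketch below is purely illustrative and assumes each work item records when it entered and left each station; the `StationVisit` record, the station names and the times are hypothetical, not taken from AgileData.io or any particular Kanban tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class StationVisit:
    """One work item's stay at one station (hypothetical record shape)."""
    item: str
    station: str
    entered: datetime
    left: datetime


def cycle_times(visits):
    """Cycle time per item, in hours: first entry into any station to last exit."""
    first, last = {}, {}
    for v in visits:
        first[v.item] = min(first.get(v.item, v.entered), v.entered)
        last[v.item] = max(last.get(v.item, v.left), v.left)
    return {item: (last[item] - first[item]).total_seconds() / 3600 for item in first}


def slowest_station(visits):
    """Station with the highest mean dwell time: a first candidate blocker to look at."""
    dwell = {}
    for v in visits:
        hours = (v.left - v.entered).total_seconds() / 3600
        dwell.setdefault(v.station, []).append(hours)
    return max(dwell, key=lambda s: mean(dwell[s]))


# Illustrative data only: two items moving through three stations on one day.
def t(hour):
    return datetime(2022, 3, 1, hour)


visits = [
    StationVisit("A", "collect", t(9), t(11)),
    StationVisit("A", "combine and clean", t(11), t(17)),
    StationVisit("A", "present", t(17), t(19)),
    StationVisit("B", "collect", t(9), t(10)),
    StationVisit("B", "combine and clean", t(10), t(20)),
    StationVisit("B", "present", t(20), t(21)),
]

print(cycle_times(visits))      # {'A': 10.0, 'B': 12.0}
print(slowest_station(visits))  # combine and clean
```

Mean dwell time is only one simple signal; a team might equally look at the longest single wait, or at how much work is queued between stations.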

Tim Gasper: Interesting. This is some good advice on how to get more iterative and get more value out of the data work you're doing, and on different approaches you can take. I think a lot of our listeners are really appreciating the specificity here. One thing that came up in our conversations leading up to the podcast today, before we move on to another topic: you mentioned that if you're taking a sprint-based approach, sprints should be three weeks instead of two. Can you talk a little bit about why you said that?

Shane Gibson: Well, in Scrum and the Agile world, we talk about two-week iterations. After we had a chat, I went and Googled it and did some research on where two weeks came from, and I can't find the genesis of it. There are 101 articles on why two weeks is great: four weeks is too long, one week is too short. One day is incredibly hilarious. Try getting your teams to do a one-day sprint. It's so funny; we've done it, and it's hilarious. If you think about it, there's no reason why you can't decompose the work down into a batch of one day.

Try doing that with six people. It's great fun, but I've never seen it be successful. So, sprint planning: from a Scrum point of view, we tend to time-box it based on the length of the sprint. If you've got a two-week sprint, your sprint planning is typically a certain size. If you do four weeks, it's longer; if you do one week, it's shorter, because we're trying to break the work down into smaller batches. What I find is that, for some reason, when I start with a new data and analytics team, starting off with three weeks is more natural for them. The only way I can explain it is that I see a pattern where they spend the first week exploring and prototyping the data, and the second week dealing with the model and the code.

Shane Gibson: And then the third week is more the last mile, where they make it consumable. That becomes the natural flow of the team, and they get successful. And don't get me wrong: getting a team to go from an idea to a consumable information product in three weeks is incredibly difficult, but I have worked with some teams that have been gifted and brilliant enough to do it. So for me, I say start off with three. Now, what typically happens with a non-data Agile coach, one who hasn't been on the field before (I had it the other day, working with an organization): the Agile coach turns up, sees they're all working in three-week iterations, and says that's ridiculous, we've got to move to two. And I'm like, why? Let's observe them first and see if three is working for them. It's their way of working. And again, I can't stress it enough: it's their way of working. We are there to help them, based on our experience and the practices and patterns we've seen be successful before.

Juan Sequeda: I love that you're being very pragmatic about this. You don't have to follow the Bible exactly as it is; it's whatever is working for your team, and you keep observing and improving around that. And if they eventually get down to two weeks, great. It really depends on the team and how they work together. Which leads me to think more about roles. What are the roles you're seeing within data teams that are being Agile? What are the different patterns you're seeing in people and roles?

Shane Gibson: So if you think about data as a supply chain, it is a factory. Data comes in, we do some stuff to it, and data gets consumed. For people out there, there's a really great TED talk called How to make toast. I take all the teams I help through it right at the beginning, on day one, and it talks about nodes and links. Juan, you'll love it, because it's based on graph theory. It talks about a thing to be done, and then a link to another thing to be done. So I actually get teams, on a big wall or on a Miro board if they're remote, to document the way they work now. A node is the thing that you do, a task; then what's the next task? And how do we want to work? So let's get the team starting off with that, figure out where the nodes aren't working for them, and then they focus on how to fix that problem. And if we think about that, then we should look at all the roles. So we talk about T-skills. It goes like this: we have a bunch of skills that we know are needed in the data world. We have facilitation skills: how do we gather the requirements out of our customers' heads about what they want? We have a way of modeling the data. We have a way of writing code to change the data, to do things to it so it's easier to use. We have people who are really good at visualizations, people who are good at machine learning models or writing statistical code, people who do documentation, and people who do QA and testing. We need all those skills in the team if we're using a batch, an iteration. So typically I identify all those skills and get the team to talk about their strong T's, where they're really strong; their secondary T's, where they're quite good; and then the things they hate, because those are the things they never want to do, so they're bad at them.
Then we overlay that and ask: where are the gaps in the T? You can look at it and go, we've got very little testing skill. So what do you want to do? Do you want to upskill the team, or bring in another team member who has those skills to cross-pollinate the team? What we're looking for is a self-organizing team with interwoven skills. And the reason for that, and this is the key, is that the team is no longer dependent on anybody else. They can be in control of the work to get done, and amongst themselves they can decide how to remove a blocker. As soon as they're dependent on somebody else, we have a natural blocker in the flow. They have to stop and wait for somebody else, and for that person or that team, the work being asked for may not be their priority. So now that whole batch, that whole window, is gone: it might take three days, but the team doesn't have three days to wait. That's why we talk about cross-skilling, and I'm a great fan of it. The other thing I like about it is that it's fun; the team learns new stuff. They're not handing it over to the data modeler who sits in the cupboard for six months building a beautiful canonical data model that nobody will ever use. They're in there modeling the data and asking, how's this working? They learn more, and it's just more enjoyable.
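The T-skills overlay described above is, in effect, a coverage check over a skills matrix. Below is a minimal sketch, assuming a three-level self-rating (strong, secondary, hate) and treating any skill where nobody rates themselves strong as a gap. The skill list, the `Level` scale, the `skill_gaps` helper and the team members are hypothetical illustrations, not part of any AgileData.io tooling.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical self-rating scale for one person's T-skills profile."""
    HATE = 0       # things they never want to do
    SECONDARY = 1  # quite good at it
    STRONG = 2     # really strong


# Skills mentioned in the discussion; the exact list is an assumption.
SKILLS = ["facilitation", "data modeling", "transformation code",
          "visualization", "machine learning", "documentation", "testing"]


def skill_gaps(team: dict[str, dict[str, Level]], min_strong: int = 1) -> list[str]:
    """Return skills where fewer than `min_strong` members rate themselves STRONG.

    A skill a member did not rate at all is treated as HATE (no coverage).
    """
    gaps = []
    for skill in SKILLS:
        strong = sum(1 for profile in team.values()
                     if profile.get(skill, Level.HATE) == Level.STRONG)
        if strong < min_strong:
            gaps.append(skill)
    return gaps


# Illustrative team only.
team = {
    "Aroha": {"facilitation": Level.STRONG, "data modeling": Level.SECONDARY,
              "testing": Level.HATE},
    "Ben": {"transformation code": Level.STRONG, "data modeling": Level.STRONG,
            "documentation": Level.SECONDARY},
    "Mei": {"visualization": Level.STRONG, "machine learning": Level.SECONDARY,
            "testing": Level.SECONDARY},
}

print(skill_gaps(team))  # ['machine learning', 'documentation', 'testing']
```

The code only surfaces where the gaps are. Whether a gap like testing is closed by upskilling or by bringing in another team member is still the team's conversation to have.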

Tim Gasper: I love that you're bringing up cross-skilling. Do you feel like, in the current state of things, data teams are actually a little over-specialized, and there needs to be a little bit more meshing of skills?

Shane Gibson: Yes, we are hyper-specializing roles. We see this wave every six to seven years: we go to hyper-specialization, we see vendor washing of the market to bring out tools that are very specific to each specialization, and then we watch it all collapse and go back to interweaving tools and cross-skilled teams. I'm a great fan of cross-skilled teams, so I'm just waiting for the wave to hit again. I could rant on hyper-specialization.

Tim Gasper: I feel you on that one. One of the reasons I'm excited about analytics engineering as an emerging role is not that it's yet another skill set, although that can also be one interpretation of it: like dbt, it's yet another thing. I'm hoping that analysts and engineers actually start to come together a little bit more, and that we see more of this type of data person, whether you call it full stack or whatever you want to call it, who can operate in a broader way. I think this is going to help sprint teams accomplish more and be more dynamic.

Juan Sequeda: So we've been talking a lot here about the data work, but I want to talk about what the actual deliverable is, which brings up the topic of data products and stuff like that. We've had this conversation before, and I've seen you comment on LinkedIn: one thing is a data product, but another thing is an information product. And are we going to have wisdom products, or knowledge products? I saw you post that one the other day. Everybody's talking about data products, but you had a different take on it, with this whole notion of an information product. So I'd love it if you could provide some insights on that.

Shane Gibson: So with an organization I was working with a while ago, probably eight or nine years ago, we tried to figure out how to decompose the work to understand the data requirements in a way that's quick, where we're not boiling the ocean. We said, okay, what are a couple of things we need to produce first, and how do we put some boundaries around them? And that team came up with the term "information product". So we worked on: if we're going to work with some customers and want to understand what they want to do, what questions could we ask them? How can we box it? We talked about a bunch of patterns that are already out there. There's the concept of a vision statement, which came from Crossing the Chasm by Geoffrey Moore. That's a way of having a natural language sentence, "I want to do this"; it's like a user story, but with a little bit more in it. Then we looked at what business questions we want to answer: how many, how much, how long? How many customers have we got? How long does it take to acquire them? How much money are we making out of them? And we looked at core business processes using a pattern called BEAM from Lawrence Corr. So, who does what: customer orders product. And we found that with those things we could actually have a natural language business conversation with somebody and then write it down in a short form. At that time, we had a document that used a really short template; since then, with a couple of the other teams I've worked with, we've iterated on that to make it a canvas, based on the business canvas. So for me, the focus was always on information. Because what does the person want to consume? Typically, what they want to consume is an answer to a question, so they can make a decision, and we call that information. Now, (inaudible 00:36:15) because everybody's calling it data products, and so I've struggled with that a little bit. And then we came out with data as a product.
And so I'm starting to try and get clarity around the patterns. The way I describe it is: with data as a product, our customer is another data person, or a system that needs data. We're delivering data as a product, and that has a whole lot of patterns: it should be discoverable, it should be self-describing, it should have a contract for how it's going to be delivered, and we should know its quality, because that's the data that we're buying. But if you go and ask a consumer, an end-user stakeholder, nine times out of ten they don't want a piece of data; they want some information that helps them make a decision. That may be a dashboard or a report, or it may be using a reverse ETL tool to pump some data into Salesforce, so when somebody's calling in, I can see the last problem they logged. That's information that we're going to use. Now, I'm not going to go into wisdom products and knowledge products. The only way I can ever describe those things is using the example of a tomato, and that works. But for me, I want to focus on the information that our customer wants, or the data that the data user or the system requires. So for me, that's the difference.

Juan Sequeda: So this is a very important distinction, because we were always talking about data. Now, with the whole data mesh conversation, data product is something we're always talking about, and I think people still struggle with what this data product is. Tim and I have been working on a document we're going to share with everybody next week. We've been coming up with this ABCDE framework of what a data product is: a data product needs to have accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge. But this is aligned to somebody actually consuming the bits, and those bits can be manifested depending on how the consumer of those bits wants them: I want it as an API, I want it in tabular form, I want it as a SQL interface, I want it as a graph interface. But it is more of a technical consumption layer. What I like about the distinction is that the information product goes even further. I would actually say that a dashboard, like a Tableau dashboard, that actually provides the answer is a type of information product. I've been following some of the conversations on LinkedIn with Joseph Hilleary, an analyst at Eckerson Group, and he's been writing about this too, saying dashboards are data products. So this is a call-out for Joseph Hilleary, who is doing some great writing; I've really liked the stuff he's doing. But I don't agree with him that a dashboard is a type of data product. The way you're framing it right now, Shane, I do think we should change the name; words matter here. An information product is that dashboard that's being consumed. It's that connection that goes through some reverse ETL, or whatever, and shows up in Salesforce. The end consumer goes to Salesforce, and they're actually getting the product that they wanted.
A data product was probably involved in that process. But that's not what they're consuming. What do you think? I'm seeing you looking in different directions.

Shane Gibson: So I'm just thinking it through. When we think about data and information products, one of the key things is it's not just the dashboard; it's the code and the data behind it as well. So there's a boundary from the beginning to the end. The way I tend to describe it is to think about an app on your phone. You go in there, and you've got Twitter. It's an app, it has a boundary, there's a bunch of data, there's an audience, and there's an outcome. And then when I go in and play Hay Day, that's an app on my phone, and it has some different data. I'm still the audience, but I'm using it for different reasons; I'm taking a different outcome or action from it. One is to rant about vendor washing of technology, and one is some downtime feeding my animals. That's how I think about an information product. It may be a dashboard, or it may be a report. But the key thing is we ask our customers what they want. If they want a piece of data, we give them a piece of data. If they want a dashboard that's pretty so they can make a decision, that's what we give them. So the mechanism we deliver it through should be aligned with the way our customer wants to consume it; we should make it easy for them. But that product has to contain all that end-to-end stuff, and then we get into scaling problems. How do we make sure the data is reusable, that when we define customer, we define and share it in the same way, so we don't have 16 definitions of active customer in each data product? That's where we get into scaling problems. But as technologists, it's our problem to solve. That's what we should be good at. We should just give the customer what they want.

Juan Sequeda: So now I'm thinking about what you're calling the information product. I'm seeing other people starting to talk about data apps. And the connection I'm making here is that what you're calling an information product is really what others seem to be calling a data app, which talks to a data product. The data app is the actual application that is providing the answer to the question. And therefore you can then argue, well, is Tableau a data app, then? Well...

Tim Gasper: Sometimes the data is the product, but sometimes other things are. Are we getting into all the semantics around that?

Juan Sequeda: How much do we need to get hung up on this?

Shane Gibson: So again, when we help teams work in an Agile way, language is incredibly important. When we're talking about practices and patterns, when we call it a tomato, we call it a tomato. We don't call it an orange. Because if you put tomatoes in your fruit salad, bad things will happen. If you put oranges in, all good. Unless, of course, you like tomatoes in your fruit salad. So we've got to be really clear on language, and that's why the whole data product versus information product thing confuses me. I'm happy with a pattern and language that says data as a product and information product, because I can differentiate them. I can tell you how data as a product is different to an information product. And I can also describe how an information product may consume data as a product. Because if I could go and buy the data from a third party without having to write all the ETL, I would: you're reducing the friction for me, you're taking away some of the work, so I no longer need to do it. Why would I not do that? If that doesn't happen, then I've got to build my own piece of data to serve the customer.

Juan Sequeda: This is a really good discussion. This is what people are thinking about right now, and we're hungry to have this really concrete discussion. Another aspect I wanted to touch on, which we've also been having conversations about on Twitter, is knowledge. The other day, you were talking about customer IDs, and without them, how are we going to start integrating things? And Kent Graziano, formerly of Snowflake, chimed in on that. By the way, Kent, if you're listening, we really want you on the show; I've already reached out to you, and you've been called out as a guest. What we really want is some agreement on the semantic meaning of what a customer is, and all these keys. We've been talking about data around this. Where does knowledge fit into all this?

Shane Gibson: So for me, it comes back to shared language. Let me give you an example. Working with a large financial company many years ago, we wanted to describe customer, and we wanted to describe what an active customer was. So we ran a workshop. We had the risk team, we had the finance team, we had the marketing team. It was a cluster; they were just arguing about the definition of active customer, and the problem was they were all correct. The way risk defined an active customer for regulatory reporting was different to the way finance recognized it from a revenue recognition point of view, which was different to the way marketing did. Marketing pretty much said, if you weren't dead and you'd contacted us once, you're an active marketing customer. Finance said if you're dead or your accounts are closed, then actually you're not active. And risk had a really constrained version of active customer from a regulatory point of view. So what we ended up doing was saying, okay, we're actually going to produce three different numbers, and they're going to have those words on them: we're going to talk about finance active customer, marketing active customer, and risk active customer. And we will never use the term "active customer" until the organization agrees on its definition, and then that's mandated. A federated governance model, we would call it now. So for me, that works as a practice, as a process, as a pattern. But how do we then implement that? If the data's in five different systems that each hold a customer ID, again, it's a technical problem. We need ways of working that solve that problem for us, both technology and practice.

Juan Sequeda: This is why we say words matter. It's important to start thinking about the language, and like you said, if we're calling something a customer, we can agree on, or at least understand, that it's a customer for the domain of marketing, and so forth. That's the first step we should take. And then let's figure out where that friction is. Shane, we could keep talking, but we've got to jump into our lightning round section now. All right, we're going to move to the lightning round, which is presented by data.world, the enterprise data catalog for the modern data stack. Again, we're very lucky we get to do these things at data.world. So I'll kick it off. Is the bigger issue in agile data going from an ad hoc process to an Agile process, or going from, like, Scrum to a flow process?

Shane Gibson: It's the team changing the way they work, and change is hard. Ad hoc to Agile, Scrum to flow, flow to Scrum, XP to SAFe: you're changing what you're doing, and change is hard.

Tim Gasper: So next question. What if data teams are very specialized? For example, let's say it's a data engineering team that's focused on a very specific part of the stack or a specific area of the data. Can Agile still work? Or do you need to refactor your teams?

Shane Gibson: Yes, Agile can work. The team can organize the way they work to make themselves more efficient, have more fun, and deliver more value within what they control, regardless of the organizational structure. Some organizational structures just make it easier.

Juan Sequeda: Is the data product and data as a product paradigm going to make data teams' lives easier or harder?

Shane Gibson: No. Harder. Using different words for the same thing and not describing them well causes chaos and arguments.

Juan Sequeda: So I guess the lack of knowledge and semantics here is what's going to make it hard. So we need to move to a knowledge-first world to make this easier.

Tim Gasper: Are agile data teams and agile software teams going to start merging together?

Shane Gibson: If data mesh is successful, yes. I don't think it will happen.

Juan Sequeda: We're going to do a brand new section right now. This came out of a discussion with Shane before. It's called the Mesh Minute. So Shane, I've got my clock here. You've got one minute to rant about whatever you want about data mesh. I'm going to stop you at the minute, because we could go on. Ready, set, go.

Shane Gibson: So I've been quite vocal on data mesh. What I want to call out is the vendors that have been vendor washing data mesh. Zhamak has just finished the book with her thoughts on what data mesh is. How can you go and pretend that your legacy technologies, which you wrote 10 years ago, are mesh-enabled? Seriously, it's bollocks. So please stop doing it. If I look at the principles of mesh: domain-oriented, decentralized data. We talked about subject areas years ago, and it's a good pattern. We should try and achieve that. Data as a product: we should try and achieve it. Self-service data platforms: we should try and achieve it. Federated governance: take it out of the teams and push it back into the vendors to help them do it. We should try to do it. But we're at the beginning of that journey; it's a new way of working. Don't pretend you've done it. And when you actually find something that's useful, share the pattern. That's what I ask. That's what I hope for.

Juan Sequeda: Awesome, perfect timing. And I have to say, I fully agree with you; I thought you were going to be a bit more controversial. At the end of the day, there's no data mesh vendor. You can't go buy data mesh, and if anybody's telling you they sell a data mesh, please run away as fast as you can. And all these pillars, individually, didn't just come up. Were they invented recently? No, they've always existed. I think it's about identifying which of these four we want to put together, and how to put them together. That's the interesting part. So I think we're super aligned on that. So after the MM, we now go to the TT. Tim, take us away with your takeaways.

Tim Gasper: Let's do takeaways. Well, my first takeaway is that I love the mesh minute and I can't wait to keep doing it. I think that's going to be a great little addition to the segment here. But on to the actual topic at hand. I loved that, Shane, you talked about how Agile means a lot of things, and what's right for a company is going to be different depending on its goals, how it operates, and the work it's producing. You said that at its core, Agile for data, and Agile in general, really is a mindset, and that you should identify the patterns that work for you, experiment with them, and figure out the right pattern that makes sense for you. Following the script is not crucial, but obviously you can get a lot of benefit from getting experience and training in certain patterns. Understand the value for the end users: if they change their mind and find that there's something else they need, then that's great. Acknowledge the consequences; changes happen. So embrace change, be dynamic. Documentation is important. And you talked about some of these different patterns. First of all, you talked about trying to avoid waterfall in this particular case. There's ad hoc, which is a pattern that a lot of companies, I would say, probably still have. So you've got a few different approaches: waterfall, which we're trying to move away from; ad hoc, which a lot of teams are doing; Scrum, which is a good way to really get into the habit of doing agile data, whether you're doing two- or three-week sprints or something different; and then, ultimately, data work is very flow-based, so there's a flow-based approach that can work here, whether that's Kanban or some other derivative of it. So I think this is really great in terms of thinking about these different frameworks. And Juan, what about you? What were your takeaways?

Juan Sequeda: Well, I've got a couple here. I really like thinking about data as a supply chain. When you're starting, it's about actually documenting the work that we know is going on, and I love how you're saying: the tasks that occur are the nodes, and how they're related, those are the edges. Let's start with that and figure out what's not working; that's really going to help us understand the whole process and where to start. And when it comes to the teams, we really want cross-skilling: understand what the primary skills, the secondary skills, and the skills you don't want to use are, find those overlaps, and then, amongst themselves, the team can self-organize. Then we had that discussion about how there's a lot of specialization that we need to be careful about; cross-skilling is important. And then this final discussion we had about data products. Really, what does the end user want to consume? They really want to consume an answer to a question; this is information that is going to help them make a decision. And when we think about data as a product, that's where the customer wants the data.

Therefore, the data needs to be discoverable and self-describing, and have quality and contracts. But with this notion of an information product, the consumer just wants the answer. So really understand exactly what the customer wants. If they want data, provide data as a product. And an information product may be a consumer of data as a product here. I'm also starting to make these connections with the data app, so this is something I really want to go dig into. And we had a quick discussion there on semantics and knowledge, and words matter; this is really the piece that's lacking. Let's get this right from the beginning. Actually, last week we were talking with Patricia Thaine about privacy, about how you have to design with privacy first. That's something people are starting to think about, and we should start designing with knowledge first, because that's going to help us avoid so many pitfalls going forward. Shane, how did we do on our takeaways? Anything else?

Shane Gibson: The one thing I forgot to mention is that there are lots of patterns from the software engineering world that are useful to us. One that's really useful is the use of personas. We take a persona approach to who our customer is: we will define personas that are data literate, and we will define personas who aren't, and that often drives the type of product we deliver to them. So persona mapping and definition, those practices with a data lens, I find incredibly valuable as a pattern.

Juan Sequeda: And again, when you start to treat data as a product, you're bringing in product thinking, so you must be thinking about what the personas are. All right, Shane, very quickly, back to you. One, what's your advice? And second, who should we invite next?

Shane Gibson: My advice: if you're not having fun, you're doing it wrong. So either change what you're doing or how you're doing it. That can apply in your personal life as well as in the data world. So that's my advice. Who should you have on? When I started my journey, there were a couple of books I read, one written by Ralph Hughes and one by Ken Collier, which were the first books I could find that mashed up data and Agile. But that was a long time ago, and I'd be really interested to see if they're still playing in the space and whether they've iterated on the work. But the other person I'd really like to see on the show is Lawrence Corr. Lawrence wrote a book on BEAM that I've used from day one. It's all about understanding a core business process using who does what, and mapping it from a data point of view. It's a pattern that I've used for the last eight years, and I encourage every team I work with to use it. And Lawrence is awesome. So Lawrence is probably the one I'd reach out to, because I know he's still active in the space.

Juan Sequeda: Awesome. What's the name of the book?

Shane Gibson: ‘Business Event Analysis and Modeling’, BEAM. Now, Lawrence and I disagree about modeling technique: he's very dimensionally focused, and I'm very data focused. So only read the first part of the book, because the second part just talks about dimensional modeling all the time. He's so wrong, but the first part is good.

Juan Sequeda: We'll post it in the comments. Because Data Council is in Austin next week, we're actually taking a break, so we will not have a show next week. You can take a break from listening to us. Listen to another podcast, but come back, and rate and review and all that stuff. But remember, we're going to be in Austin, so if you're coming to Austin for Data Council, please let us know. We're going to have a special get-together on Tuesday, March 22 at 7pm. Find us on Twitter or LinkedIn, or send me an email: I'm Juan@data.world. The following week, March 30, we have Bob Muglia, the former CEO of Snowflake, an awesome individual who's just a leader, and we're going to be talking about why the future of data is knowledge. And Shane, thank you so much. Fantastic discussion today. Thanks to data.world for always supporting “Catalog & Cocktails”. Shane, have a great rest of your Thursday; you're in the future. And enjoy St. Patrick's Day.

Shane Gibson: Thank you and remember, as you said, all roads lead to Austin.

AgileData reduces the complexity of managing data in a simply magical way.

We do this by combining a SaaS platform and proven agile data ways of working.

We love to share both of these: the AgileData Product costs a little coin, but information on our AgileData WoW is free. After all, sharing is caring.

AgileData.io

Keep making data simply magical