Shane Gibson – Making Data Modeling Accessible
TL;DR
Early in 2023 I was lucky enough to talk to Joe Reis on the Joe Reis Show to discuss how to make data modeling more accessible, why the world’s moved past traditional data modeling and more.
Listen to the episode or read the transcript.
Read
Joe Reis: Shane, what’s up man?
Shane Gibson: Hey, thank you for having me.
Joe Reis: Anytime. Where you dialing in from?
Shane Gibson: I’m from a little place called Paekakariki, which is in New Zealand. So, for the audience out there that can find a map that has New Zealand on it, we’re at the bottom of the world: two islands, a top island called the North Island, funnily enough, and a bottom island called the South Island. I’m at the bottom of the North Island on the left, on the coast there.
Joe Reis: That is a very New Zealand sounding town as well.
Shane Gibson: Yes we have some cool names.
Joe Reis: I can’t pronounce any of them. They sound cool.
Shane Gibson: The ones that are really interesting are the ones that start with WH, because we get people coming to visit and you say we have a town called Whakatane, and the WH is always pronounced as an F. So it’s quite good.
Joe Reis: That’s hilarious. I guess I’ll have to make it there just so I can go on a tour of those strangely named towns. So it’s awesome. And you are, I guess, as they say, at the ass-end of the world, which I think gives you a unique perspective, right? You’re not in the thick of things, as we are in America. So it gives you maybe a different vantage point to view the world, right?
Shane Gibson: Yes. I think we have some benefits and we have some downsides. One of the benefits is that we typically get to pick and choose the waves, or the buzz washing, that turn up in New Zealand, because we adopt patterns from the US as much as we do from Europe. The second thing that I think is to our benefit is that we’re a really small country, and because of that we have to be quite innovative. So we’ll tend to pick and choose and test things probably a lot more than we see overseas. The downside is just our critical mass. For me to go to a conference and actually talk to people, it’s typically somewhere between 24 and 36 hours of traveling, so it’s a little bit onerous. So we don’t tend to meet and greet people as much as the rest of the world. However, COVID-19 was a blessing for us in terms of everybody going remote. Remote conferences have been a big change for me, and a way of interacting with people like yourself.
Joe Reis: Yes, it’s super cool. I don’t think we would have met if it weren’t for Slack and social media and COVID. So thank goodness for that. But it’s interesting too, because I see you on the Data Quality Camp Slack often and we chat, we chime in on a lot of the same threads, and we have a lot of similar viewpoints, which is pretty cool and also an echo chamber, but it is what it is. That’s what happens when you’re super cool, I guess. But one thing I think we wanted to talk about too is data modeling. That’s a topic you have a lot of strong thoughts on.
Shane Gibson: I’ve been in the data domain for just over 30 years now, and I can break my career up into a bunch of decades. And I’m not technical. The way I talk about it: I can’t code, won’t code, don’t code. And data modeling has always been one of those things that’s annoyed me. It annoyed me because it became a dark art. Whenever we were doing a project back in my consulting days, sometimes the modeling process just worked, and it was awesome. And sometimes it was an epic failure. Sometimes it was done quickly. Sometimes it was done slowly. And it seemed like there was this priesthood of data modelers that would go and sit on their own in a dark room for six months with a whiteboard and come out with this enterprise canonical model. And even that word, canonical. I mean, how unfriendly is that for the whole world? We have a canonical model. And so I spent quite a bit of my career focusing on what patterns are out there to make data modeling accessible to people who aren’t data modelers. That’s always been one of my focuses: how do you take something that’s complex, and then how do you help people do it, but do it with rigor, do it with patterns? So I tend to get opinionated every now and again. Take this whole idea of an analytics engineer. We’re moving to decentralization and self-service, which is great. That’s what we want: more people being able to do that work. But we’ve gone to the Wild West again, which is basically, just go write some code. And we know what happens. We know in the first six months, that’s fine. You just write some code, you’re fast, you’re agile. Two years later, you’ve got 3,000 blobs of code, and chaos reigns. And as part of that, I’ve become really fixated with language. So when dbt talks about a model, which is effectively just a blob of code, that actually makes me quite angry, because we’re degrading the practice that is modeling. And modeling is important.
So it’s about balance for me. It’s not about one person spending six months to a year building this enterprise canonical model that nobody will ever implement. But it’s also not about five people hacking code with no thought for the design of that data upfront. And so balance is hard, but balance is important.
Joe Reis: It’s super hard. Yes, I’m writing a lot about that too, the spectrum of formal and strict versus relaxed and fast, maybe. But I could also make an argument that if you’re formal and strict and you ingrain those patterns, you actually move faster at a certain point too, if you can get over the initial hump of, like you say, the canonical model and all the ceremony that goes in behind that stuff. I think there are definitely tradeoffs, for sure.
Shane Gibson: Yes, and I’m interested in your view, because often we see a data model described as a thing of a thing of a thing of a thing, which is the most flexible data model in the world. You can actually put any data into those four things. But to understand it as a business user is a nightmare. Even as a developer or an engineer, understanding the relationship between the typing of things, and then the subtyping of thing of thing, is incredibly complex. So then we’re saying, well, as a human, you have to have this massive amount of cognition to be able to look at that pattern, and then bring it back into a business pattern and describe it. So I’ll be reading too, when you publish, to see how you describe formal versus informal, because that’s a really interesting categorization for modeling.
Joe Reis: Yes, I’m looking at data modeling right now as a way of organizing and standardizing data to facilitate useful and believable information for both humans and machines, because it’s not just humans that are using the outputs of these models anymore. It’s also machines. But the way I’m describing it, it’s supposed to be cross-sectional. So it’s not just for analytics. It’s also for software engineers, and machine learning engineers, and so forth. When you try and think of a model from that perspective too, and we can talk about this in a bit, you definitely want to separate the higher-level concerns of conceptual and logical modeling from the almost infinitude of physical modeling now. Modeling at the physical layer used to be just for databases, but data scientists might be using a notebook, for example, and they might be hacking on files, they might be hacking on some other thing, or a machine learning model may be made from text right now. So if you’re using generative AI and transformers, it’s like, well, that’s not really a database, so what is it? But the data still has to be useful and clean, and so does the output of that model too. So it’s a lot more complicated, I think, than it used to be.
Shane Gibson: Yes, I think the use cases are far more complicated. And before, we used to ignore them. We used to think of the modeling we care about as a dimensional model, a star schema in a warehouse, because we want to hook up MicroStrategy. And then we would do a snowflake, because MicroStrategy preferred snowflake schemas over star schemas. So for me, it’s really interesting around that physical modeling, because in the past, it was always based around constraints. We would model based on the thing that was going to consume it: for MicroStrategy we would do snowflakes; with Power BI, there’s this whole argument around direct query versus indirect query, and that Power BI only likes dimensional models. I think it’s bullshit, but I actually haven’t tested it, so I can’t say whether I’m right or wrong. But we tended to model around those constraints. We would have on-premise databases that were really expensive, the Oracles of the world. So we would model in a way that reduced the amount of data we stored, because otherwise we’d gobble the CPU. And a lot of those constraints have gone. So what we now need to be modeling around is usage. And your example of an ML engineer is really interesting, because we think about what they typically want: they typically want a big, wide, denormalized table with a bunch of columns and feature flags, ones or zeros. In the past, it used to be, well, here’s our star schema, you go and create yet another model on top of that which gives you the format you want. I think we’re at the stage now where, from a physical modeling point of view, we can start to be lazy, because we have the benefit of scale at a cost that we haven’t had before. So we should be able to deploy a star schema, a Data Vault model, an activity schema and a feature store as physical representations of the same data, with one click. That’s where we should be.
And then the question comes back to not the physical structure of the data, but the language. What language do you speak? If you’re a machine learning engineer: I speak a language of wide tables, where we have lots of rows of raw data that have feature flags. And so for me, data modeling is actually going to become more and more about shared language than the technical implementation that we’ve focused on for the last couple of generations.
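The idea of deriving several physical shapes from the same data can be sketched in a few lines. This is only an illustration under stated assumptions, not a tool mentioned in the conversation: the sample orders, the function names `to_star_schema` and `to_feature_table`, and every column name are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw order records; names and values are made up for illustration.
ORDERS = [
    {"order_id": 1, "customer": "Ana", "product": "Tea", "amount": 12.0},
    {"order_id": 2, "customer": "Ben", "product": "Coffee", "amount": 8.5},
    {"order_id": 3, "customer": "Ana", "product": "Coffee", "amount": 8.5},
]


def to_star_schema(orders):
    """Split the records into two dimensions and one fact table with surrogate keys."""
    customer_names = sorted({o["customer"] for o in orders})
    product_names = sorted({o["product"] for o in orders})
    customers = {name: key for key, name in enumerate(customer_names, start=1)}
    products = {name: key for key, name in enumerate(product_names, start=1)}
    fact = [
        {
            "order_id": o["order_id"],
            "customer_key": customers[o["customer"]],
            "product_key": products[o["product"]],
            "amount": o["amount"],
        }
        for o in orders
    ]
    return {"dim_customer": customers, "dim_product": products, "fact_order": fact}


def to_feature_table(orders):
    """Build one wide row per customer: a spend total plus a 1/0 flag per product."""
    product_names = sorted({o["product"] for o in orders})

    def empty_row():
        row = {"total_spend": 0.0}
        for name in product_names:
            row["bought_" + name.lower()] = 0
        return row

    rows = defaultdict(empty_row)
    for o in orders:
        row = rows[o["customer"]]
        row["total_spend"] += o["amount"]
        row["bought_" + o["product"].lower()] = 1
    return dict(rows)
```

Both functions read the same `ORDERS` list, so the star schema and the wide feature table are two views of one source rather than two separately maintained models.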
Joe Reis: That’s a really interesting point about the shared language. I hadn’t thought about it like that, actually. If you zoom out, this is a thought experiment I don’t think a lot of people have really done, though maybe you have: thinking about data cross-sectionally, across the data lifecycle, and the different ways that it’s modeled. The shared language is really the only glue that holds all that together at the end of the day. So if you have a notion of a customer at the application layer, and then in a report, and in a machine learning model, hopefully these are the same notion of, say, a customer that you’re trying to do something with, whether it’s reporting or predicting on it, or they’re using the app, for example.
Shane Gibson: Yes. So in 33 years, I have never seen just a customer in data. I’ve always seen a word in front of it: marketing customer, risk customer, finance customer, active customer, inactive customer, active marketing customer. There’s always a word in front of it, because customer is too broad a definition. And what I always encourage people to do is pick one of those terms, say active marketing customer, and then say the alias for that is customer. We all agree that that definition is the definition we’ll use whenever we use the word customer on its own. And for me, again, it comes back to that language. If your language is not clear, it’s just this big bucket of customer. What about a supplier that also buys off you? Are they a supplier and a customer, or rather a customer? Are they some quasi customer-supplier hybrid? So that shared language is really, really important. And without that, how do you model? How do you put together a conceptual or visual representation, or even a physical representation, of that data when you’re using different language at the beginning?
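The aliasing practice described here, where the bare word resolves to exactly one qualified term, could look something like the sketch below. It is a hypothetical illustration: the glossary entries, their wording, and the `define` helper are assumptions, not definitions given in the conversation.

```python
# Hypothetical glossary: every qualified term carries its own agreed definition.
GLOSSARY = {
    "active marketing customer": "Opted in to marketing and purchased in the last 12 months.",
    "finance customer": "A party with at least one issued invoice.",
}

# The bare word is an alias for exactly one qualified term, chosen by the team.
ALIASES = {"customer": "active marketing customer"}


def define(term):
    """Resolve a term through the alias table, then return (qualified term, definition)."""
    key = term.strip().lower()
    key = ALIASES.get(key, key)
    if key not in GLOSSARY:
        # An undefined term is surfaced loudly rather than silently guessed at.
        raise KeyError("No agreed definition for " + repr(term))
    return key, GLOSSARY[key]
```

With this in place, `define("Customer")` resolves to the active marketing customer entry, while a term nobody has agreed on, such as a supplier who also buys, raises an error instead of falling into the big bucket.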
Joe Reis: It’s really hard, especially when you take into account that data is now starting to be shared not just within an organization, but across different organizations. And when you have the name customer, what do you mean by that exactly? Especially between organizations. It’s hard enough to use the word customer very intentionally within one, for the reasons you described, where there are just a million different ways of describing it, and then you start looking at it through the lens of several different organizations. And what do we mean by that? Or what do we mean by sales? I don’t know, right? So it becomes a very interesting exercise. But modeling is just one of these things where it happens if you’re not intentional about it, and it happens if you’re intentional about it. So what kind of model do you want? That shared vocabulary is definitely a very, very strong notion, for sure.
Shane Gibson: In the data world, we love three-letter acronyms. Now there’s a language that is designed to make it a priesthood. If we work in data, we typically know what they mean. It’d be really interesting to do a survey to see whether the new wave of analytics engineers actually understand what SCD Type 2 is and why it was there, and also to question, do we need it anymore? It came from star schemas. It was about breaking changed records out from the fact, because joining all those tables together in one go was so big and expensive that the databases at the time couldn’t handle it. So we compartmentalized. We said, well, if we break things out into dimensions, then we can sub-query. We can say these three things, customer, order and product, at a point in time had a relationship. That’s the fact. And therefore we don’t need to worry about location or store or channel at that moment for that query, and therefore the query was smaller and ran faster on our databases. That’s not true anymore. We’ve got a customer where we have a table that is just hitting 300 million rows. It’s completely denormalized. It’s got 300 columns. It’s got a bunch of text in there. But because it’s in a columnar database, whenever you’re hitting it, you’re only hitting five to 10 columns, and the database takes care of that complexity. For me, I don’t have to worry about that physical modeling anymore as much as I used to, right? But I still have to care, especially when you get the bill. So a lot of our modeling techniques have been designed around constraints that we had 20 or 30 years ago. And I’m not sure we’ve iterated them, which is interesting.
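For readers who have not met the acronym, a Type 2 slowly changing dimension (SCD Type 2) keeps history by closing the current row and adding a new one whenever an attribute changes. The sketch below is a minimal in-memory illustration; the row layout (`key`, `attrs`, `valid_from`, `valid_to`) and the `apply_scd2` helper are hypothetical, and a real warehouse would typically do this with a merge statement or a tool feature instead.

```python
from datetime import date


def apply_scd2(history, key, new_attrs, as_of):
    """Return a new history list with a Type 2 change applied for `key`.

    The current row (valid_to is None) is closed at `as_of` and a new
    current row is appended. If the attributes are unchanged, no row is added.
    """
    result = []
    needs_new_row = True
    for row in history:
        is_current = row["key"] == key and row["valid_to"] is None
        if is_current and row["attrs"] == new_attrs:
            # Nothing changed: keep the current row open.
            needs_new_row = False
            result.append(row)
        elif is_current:
            # Close the old version instead of overwriting it.
            result.append({**row, "valid_to": as_of})
        else:
            result.append(row)
    if needs_new_row:
        result.append(
            {"key": key, "attrs": new_attrs, "valid_from": as_of, "valid_to": None}
        )
    return result
```

Calling it with a changed attribute leaves two rows for the same key, one closed and one open, which is what lets a fact be joined to the version of the dimension that was valid at a point in time.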
Joe Reis: Back to your bet on whether an analytics engineer these days would understand the different types of SCDs: I would wager that they are not familiar with them. And this is based upon talking to analytics engineers and seeing how they’re learning about data these days. It’s all anecdotal, there’s no survey yet, but some friends of mine gave a talk on data modeling at Coalesce, the dbt conference, and other friends of mine were standing in the back of that conference, watching people type, like, what is dimensional modeling, into the Google search bar. And think about that for a second. This is dbt: wonderful product, great company, and so forth. And I think they’ve done a remarkable job getting the mindshare with what they’re doing. But this is also data transformation for analytics. So, historically, that would be Kimball or some approach like that. And now it’s a foreign concept, perhaps. I could be wrong. But I think so.
Shane Gibson: I think it’s also regional as well. So, in Europe, we see a lot more Data Vault. I was around in the Kimball versus Inmon days, and it was interesting: people say that Kimball and Inmon didn’t actually have a war between themselves. And that was probably true. I didn’t know them at the time, so I’m not sure. But for us, at least in New Zealand, and I think in the rest of the world, we were warring. You were Team Kimball or you were Team Inmon, and we were scrapping it out on every project. It was like, what’s our architectural design for data? And then we’ve seen that to a degree between dimensional modeling and Data Vault. And what’s really interesting is, it becomes a religion, it becomes a passion. And even within Data Vault, there are actually two teams: Team Dan and Team Hans. And within the teams, we get this passion around these things that we’ve got good at doing. And then, rather than trying to encourage people, we try and force them. We become derogatory. Like, you idiot. What are you doing? You’re not doing Type 2 on your dimensions? What, you don’t even know what that three-letter acronym stands for? And so for me, the failure of adoption of data modeling, or what I call data design, in the market right now is not the fault of people like dbt, who haven’t encouraged it. It’s actually the fault of us as data modelers, not being good at teaching other people why the art and craft that we do is valuable. And so if you watch the vitriol that tends to happen on social media, we tend to go quickly into: you’re an idiot, you don’t know what you’re doing. And I did a presentation at a virtual conference last year. And my presentation was, how do we help analysts data model? That was the focus of it, about three presentations. And somebody else did another presentation, which, how would I describe it? To be polite:
Here’s 10 examples of 101 mistakes other idiots have made when they’re data modeling, so they shouldn’t. And the one that got me was when the presenter brought up a slide, and it had a person in a bar playing a guitar. And he said, everybody can play a guitar, but they can’t play it well. So if you’re not a trained classical guitarist, you shouldn’t play guitar. And for me, that was just the complete opposite of what I said in mine, which is that people can model, and there are things they need to learn, and it is risky. And there are times where you want an expert to come in and deal with a level of complexity that somebody who’s not really, really experienced can’t. But actually, we should teach everybody to model in 90% of the cases. And so for me, I think that’s part of the problem: there’s a whole lot of complexity in there. And maybe we’ll come back to this idea of the data value stream, as I call it, the layers and the way we move data and the different modeling techniques and why there are different ones, which introduces complexity. But I think as a data modeling community, we need to look at ourselves, at the way we engage and make people feel, and then ask that question: why would somebody want to talk to us when we basically don’t help them understand what we do?
Joe Reis: I completely agree with that. That’s one of the reasons I’m working on the book I am. It’s more taking the approach of Bruce Lee with Jeet Kune Do or something like that, where it’s very multidisciplinary. As you say, the value stream, I think, is a really good example of that. There are different techniques. I’m not here to tell you what to do. I think you’re capable of making an evaluation of your tradeoffs based on your situation. But just at least be aware that these are techniques you could use if you wanted. That’s the approach I take. I do agree with you and your assessment that there are certain factions in the data modeling world that are very dogmatic, and I would say borderline toxic in terms of how they treat people. I see a lot of the same discussions you’re talking about online. And, frankly, then you wonder why nobody wants to talk to you about data modeling. Because why would anyone want to even have a simple discussion with you about anything? All you’re going to do is criticize them and put them down and call them a loser. Nobody wants to deal with that.
Shane Gibson: And yes, some of those people I’d call friends. And the way I look at it is they’re not doing it to be assholes, most of them. It’s not done on purpose. I remember, back in the days when we had on-premise databases, we always used to have a DBA, a database administrator. There was always somebody running the Oracle or SQL Server or Informix database. And they would typically, and sorry to all the DBAs out there, you’re lovely, but the ones I dealt with defined the word toxic, because they were real experts, and they were always busy, and they were always grumpy. And you’d go to them, and again, I don’t code, won’t code, but one of the team I was working with would have queries running slow, so you’d go to the DBA and say, my query is running slow. So basically an open-ended statement: I have a problem. And typically the answer you’d get back was, well, compared to what? And then it was like, well, I ran it three times last week, because you learn, and we got a one-minute response time those three times, and now I’m getting five minutes. So there’s obviously something wrong with the database. So now we’ve set up this friction: it’s your database. And then the answer that tended to come back was, well, it’s not my database, it’s your code. And then, rather than go into the database and tune it, the DBA, because most DBAs came from a development background, that was the path, they were great developers and they became really great DBAs, would go into the code and rewrite the developer’s code so it ran faster. And again, it’s not because they were assholes on purpose. It was just the way they worked. They were detailed people, they wanted to solve the problem. And for me, that’s what I see in the data modeling community. People are trying to be helpful. It’s just that the way they engage is out of alignment, I think, with the way the people on the other side want to be engaged with, if that makes sense.
Joe Reis: Yes. Makes a lot of sense. I’m looking at one of my books right now. It’s Fifty Years of Relational, and Other Database Writings by Chris Date. Smart dude, good book. But every page is also full of vitriol: if you’re not following the relational model, then you’re just not a great person. On every page I read in the book, I feel that way. And it’s a very interesting take. But I personally haven’t seen any business fail, for example, because they didn’t adhere to very strict relational modeling practices. But if you read through the book, it’s like, well, if you’re not doing this, you’re building your entire infrastructure on shaky foundations. And, well, I don’t think I’ve ever seen a business fail because of that. I think I’ve seen them fail because of really poor business decisions. But you’ll read what you want into the world. The lens you have is the lens you have.
Shane Gibson: I think for me, there’s a bunch of core foundational patterns and you should apply them when you can, because they just make sense. And when you decide not to, it needs to be a conscious decision, and you need to be cognizant of the fact you’re cheating. So we’ve had this breed of NoSQL databases turn up, and it’s now the cool thing to go and build an application with, and there’s been a massive benefit from an agility and speed-to-market point of view in being able to do that. But you don’t get primary keys. So that’s fine. You get speed and agility to build that application. That’s great. But when you try and use that data, you’re just moving the cost somewhere else. So now somebody’s going to look at the data and go, hey, where’s my primary key? Because it used to be a great hint to say, this thing’s a thing, these two things are related. So you move that work a little bit. And so I agree with you: I can’t think of an example, apart from data breaches, where I can say that organization failed because of the data modeling. But I think I could point to lots of organizations where the costs and the time to market were way higher than they should have been because of bad data modeling practices. Or, to put it a different way, tradeoffs were made without a conscious decision that you’re making the tradeoff. That’s probably a better way of putting it. So if we go back to that layer problem, we have a habit in the data world right now, and we’ve had it for 30 years, and we haven’t changed it: as we move data and add value to it, right, in that value stream, we tend to want to adopt a different data modeling technique at each layer. And maybe we have to, right? I still haven’t thought about it enough to say whether we have to, because there is not one data modeling approach to rule them all. But we don’t tend to have reuse as we move across the layers, for some reason.
And if we think about application development, it took me ages to figure out what the hell this thing my UX designer kept talking about was, which was a design system. What the hell is that? And really, the way I boil it down now is: it’s reuse. When we do a UX piece of design in our app, that design is reused time and time again, where we can, where it makes sense, because it reduces our cost and it reduces the cognition the user needs to understand what’s on the screen. In the data world, we don’t tend to do that. We don’t tend to design something and then reuse it in many different places. It tends to be one and done, for some reason.
Joe Reis: Yes, whereas front end is all about reusability, because it’s just different components. If you’ve ever written a React app, for example, literally the notion of it is reusable components. And I think it’s just a matter of having empathy, or at least understanding what people on the left and the right of you are doing, you being in data, for example. So upstream, with your application developers, how are they using data? They have a very different use case for it than, say, the analytics person or the ML person. My take on this is that, depending on where you are in the lifecycle or the value chain, you’re probably going to have to think about at least a physical implementation for that use case there. And that’s about it, really. There’s not a one size fits all. It’s not like you’re going to tell them, you need to put everything into a dimensional model for your front end. They would stare at you very strangely and then walk away, or at least I would, like, you don’t know what you’re doing, and I’ll see you soon. But same thing, right?
Shane Gibson: Yes, isn’t that the definition of data mesh, which is to take the complexity of data modeling and the consumption of the data and push it back onto that poor software engineer, who is kind of busy anyway, right?
Joe Reis: Yes, we can do that, too. I actually had to give a talk a couple of months ago at a Starburst conference. It was a debate on whether data mesh would do away with data engineering, and I had to take the side that it actually would, which was partly trolling, for sure. Zhamak’s a really good friend of mine. And so I asked her, why do you think this is right? Because it’s in her book. She comes from a software background, so she has a different perspective. She’s not a data person, at least like you and I are. And so I think her perspective was intriguing, and she feels that at the end of the day, maybe it is all software all the way down.
Shane Gibson: I agreed with her. When she first started writing, I was 100% there. I didn’t think we could achieve it, but I thought it was a great vision. What I’ve seen is a massive softening of that stance. So for me, it’s not about taking a data engineer or a data modeler and moving them as a separate role back into that team. It’s about taking those skills that we have as data modelers and enabling the software engineer to do the work, because we know they’re so busy. We can’t teach them the skill and expect them to do it, because they just don’t have the time and they don’t have the interest. What we need, I still think, is this idea of, as we do in software development, a set of libraries that take care of that complexity for us. And I think that’s when data mesh, decentralization back to the software team, will become real. And in theory it’s what we should do: effectively, they’re creating data for the organization, so why don’t they just finish the job? Why do we have to have another whole team that does the second half of the job? But the reason is that the business prioritizes getting that software out the door fast. And the argument I always use is, let’s go through the scenario. The software development team have got a new feature. It’s the new cash cow feature that’s going to up the revenue of the company, and the company is struggling, so they need this new feature to survive. So the software engineering team, they’re rocking it, they’ve done this development, it’s taken a couple of weeks, and they’re ready to push that feature into production. And then they go to the product owner: hey, I’ve got the feature done, we’re all good, ready to go to production, just need another two weeks to do the dimensional model on top of it, so it’s usable for the rest of the organization. Now, the business, the product owner, is going to go, screw that, push it, because I want that business value.
And I’ll wear that technical debt, or wear that cost, later, because they don’t actually wear it themselves. That’s the problem. Unless we automate it, no matter how we do it, we’re just putting more load, cognition and effort on the software developers, and the business will make the tradeoff decision they always make, which is: I have to get value, or we have some risk and cost of doing it.
Joe Reis: That’s really interesting. Have you ever thought of any ways to incentivize a business to take data more into account at the software level?
Shane Gibson: I’ve spent many years trying many different techniques. I’ve used carrots, I’ve used taxes. I do spend a lot of my time now, apart from our startup, coaching data and analytics teams on agile and product practices and how we blend them together. So we’ve done things like technical debt registers, we’ve done a whole lot of stuff. We did one that we started but didn’t finish, which was really interesting: what if we treat data as an asset? And what I mean by that is we actually put it on the balance sheet and we actually depreciate it, and that depreciation goes back into the data teams to fund them to do the work. Would that change the way the organization behaves? Doug Laney has done a lot of work around that idea of monetizing data internally for the benefit of the organization. But in my experience, it doesn’t matter what you do. For some reason, business owners make a tradeoff decision, and data for consumption internally comes last. However, I’ve never worked in a FAANG company, the big Facebooks, Alphabets and Apples of the world, and they seem to have data at the core of the organization. So I do a podcast with a guy out of Australia, and we have product experts and agile experts come on. And what’s really interesting is that over the last 78 episodes, there’s a bunch of things that have come through, because I sit back and I’m looking for patterns. And one of the ones that’s really interesting, that’s been described a lot, is that organizations started after 2000 or 2010 are inherently agile by default. They don’t do Scrum. They just work in an agile way. They have a growth mindset. And the ones before that, they’re hierarchical. They have organizational structures, and Conway’s Law and all that stuff. And so for me, I can almost infer, and it’s an inference, I’m guessing here, that the post-2000, post-2010 companies are also data driven. It’s just what they do. Every decision they make is typically made based on data.
For everybody before that, decisions are made on hierarchy and opinion, the HiPPO in the room. And so I think maybe that’s where the market’s lying, as we’re trying to change these behemoth organizations, to try and change the culture. And we won’t. I was talking to somebody the other day, and they said big corporates are trying to invent data mesh, and it’s like, well, they’re never going to. You need a greenfield environment, or at least to be completely decentralized on day one, for it to work. So we’ll see, can we turn those big behemoth ships around? I haven’t managed to yet. I’ve done small changes. Every now and again, the red unlock some one step left, and we do some good practices, but it’s hard.
Joe Reis: It is hard. And I know some people who say they’ve done data mesh at larger companies. But it seemed like it was really difficult. And the way they accomplished data mesh, it seems like a lot of it was data-sharing type stuff, so it’s not exactly, I’d say, the purest form of mesh as it’s described in the book. But it’s baby steps to get there. I guess I don’t know if we’ll ever hit the ideal of true data mesh as Zhamak describes it. I hope her company is really successful in making it happen. She’s a good friend of mine, and I’m rooting for her. So, TBD, I guess. It’s interesting, because you’d almost think that it would be the same revolution as the microservices revolution, and how that happened. But I think you’re hitting on it: data is just that much harder. And that’s the crux.
Shane Gibson: And that’s the universal question I’ve yet to answer. The best I’ve got so far is that the data team working with the data is not in control of the data. And I like it because I’m really upset on data mesh and crux because, I love the whole analogy of recipes and food with data. And then he got out of the way and gone down the whole data kitchen routes as like, you’ve got the fisher. So I keep going back to it, because it just resonates with me. So if you think about a data team, what’s happening? We’ve got a bunch of really talented chefs in the kitchen. They’re really experienced, they make amazing, awesome meals, and we’ve got a bunch of customers out the front of the shop who want those meals. They want the data, the data stuff. And what happens is the ingredients that we need get shipped to us and put in our storeroom, there on the loading dock. And we go out there and half of it’s rotten. We’re going to make this beautiful tomato terrazzo pasta, and I go out there and my tomatoes are rotten. What am I going to do about it? How do I make a meal out of that? And that’s the data problem. When you’re a software engineer, you’re in control of the way that data is captured. And I think one of the changes with the whole DevOps thing, and why it was so successful, is we enabled the software engineer to be in charge of their own kitchen equipment. The thing that was out of their control at the time, which was the servers and that kind of stuff, we containerized it, we did microservices, we gave them the ability to do it themselves. And that removed the reliance on other people, so the time to market was much faster, but it also allowed them to bake in good quality. From a data point of view, we’re still stuck with somebody shipping us data. Even if we go and collect it ourselves, they’re shipping us data that’s crap. And yet our customers want a beautiful meal. And that’s why we have data teams.
And that’s why it’s so hard. And then there’s the complexity that the rottenness of that data is hidden. I look, and the data looks so good. But actually, when you get the context, I now have Paekakariki in the South Island of New Zealand, which is just wrong. So rather than a rotten tomato, what I’ve got is a little bug that’s sitting right in the corner, and it’s not until I’ve chopped that puppy open that I go, there that bug is.
Joe Reis: That’s an interesting take, I hadn’t actually thought about that, like you just said, that lack of control over the outcome, but it’s definitely true. Having worked as a data person, I can’t count how many times I’ve felt like I wasn’t in control of the data. You’re on the receiving end of it most of the time. And you’re working in a kitchen too, or at a restaurant, where the customers’ tastes change. They might want pizza, and they might want chicken wings or something. And I don’t know if you serve both, maybe you do. They change their request every second. It’s like, well, actually, I want Chinese food right now. So you have that.
Shane Gibson: But we’re getting there. So the idea of data domains, the idea of having teams that are bound by domains, is effectively solving that. I remember I was in Dallas, pre-COVID, and I went to a barbecue over there. Oh, my God, I loved it. And when we were in Vegas, we had one of the most amazing Chinese meals I’ve ever had, in Vegas, where typically what we say is the worst food in the world. So this idea of domains: I know when I go to a Chinese restaurant, I want Chinese. I know when I go to a barbecue, I’m going to have barbecue. There’s a very clear indication of what you’re going to get. That’s what data domains give us. I think agile ways of working give us the ability to change. We should no longer do six months of requirements up front, because we know we’re investing in a recipe that somebody might not eat. Well, they might go, too much paprika, and I’ve spent six months doing that recipe. So we get some techniques from there. I’m a great fan of Lawrence Corr’s work around BEAM. I’ve used it all the time for the last 10 years. So we have ways of doing small requirements work up front in a language that business users understand. We can describe the difference between chicken and beef and get clarity, or vegan or not vegan: it comes from an animal, or it doesn’t. When we manufacture animal protein from DNA, which we’re doing now, that’s going to be a really interesting conversation. However, before we get to that complexity, if we look at the front end, there are actually lots of patterns out there now that we can adopt. And then we look at the friction, so we can take Lawrence’s BEAM methodology or pattern, and I can convert that to a dimensional model or Data Vault model with no friction. Because the language that’s used matches the language in terms of instructions and sets. And then we look at the quality problem, and we look at the stuff that Chad’s doing around data contracts. So what does a restaurant do to stop the problem of rotten tomatoes? Right?
They have an agreement that they’re going to get good-quality produce from the supplier. And they have somebody sitting there going, these tomatoes suck, take them back. So that’s the idea of a data contract. And not just a contract on schema, but a contract on frequency and freshness and the context of the data. That makes sense. Again, we’re starting to get those techniques, we’re starting to get those patterns. But nothing in the data modeling space. Really, there’s no innovation or iteration happening. Activity Schema was an attempt, and stamps.
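The data contract idea described here (an agreement on schema plus freshness, with someone at the loading dock rejecting bad produce) could be sketched roughly as follows. This is only an illustration: the `DataContract` class, the `check_delivery` helper, the field names and the freshness window are all assumptions made for the example, not anything named in the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: what the supplier agreed to deliver."""
    required_fields: dict  # field name -> expected Python type
    max_age: timedelta     # agreed freshness window


def check_delivery(contract, rows, delivered_at, now):
    """Return a list of contract violations; an empty list means accept."""
    problems = []
    # Freshness: was the delivery made within the agreed window?
    if now - delivered_at > contract.max_age:
        problems.append("stale delivery: older than the agreed freshness window")
    # Schema: every row carries every agreed field with the agreed type.
    for i, row in enumerate(rows):
        for name, expected in contract.required_fields.items():
            if name not in row:
                problems.append(f"row {i}: missing field {name}")
            elif not isinstance(row[name], expected):
                problems.append(f"row {i}: {name} is not {expected.__name__}")
    return problems
```

The point of the sketch is that the check sits at the boundary, before the data reaches the kitchen, so a rejected delivery goes back to the supplier instead of into a report.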
Joe Reis: I feel like a lot of the onus is going to be on software engineers. The book I’m writing is actually geared more towards software engineers; it’s not for data people, per se. Because if you want to address the root cause of a lot of these issues, you’re going to have to start at the source. And it’s at least making engineers aware of these. Again, there’s no sense in reinventing the wheel. Relational modeling works fine, if you use it. Understanding how to properly use a document database and model data works fine, if you know how to use it. Again, what I see with software engineers, and I interact with many, is that they do what they can to hit their sprint. And in a lot of cases, they’re coding in environments where there’s an ORM, an object-relational mapper, too. I’m not sure if you’ve ever used one, but it is astoundingly easy to introduce lots of fields into your database, lots of duplication or more, at the drop of a hat. I need to add a field so this form can work? I’ll just do that. And the downstream effects of this are, well, kind of what you see right now. So my take is that how you attack the data modeling thing upstream is simply by working with software engineers, to at least reintroduce them to a lot of these practices, make them aware of the tradeoffs, and make them understand the feedback loop between what they’re doing and data, and data coming back into the app. That’s becoming more of a real thing, too. You can choose to code in a silo, but there will be impacts to you. Maybe not directly, or at least not immediately, but there will be, as every enterprise becomes more data driven.
Shane Gibson: Yes. I’d be interested in whether we actually get an automated semantic layer on this stuff to say that both sides.
Joe Reis: That’s right. I always thought the semantic layer should have started at the application end. I’m a process junkie, too. If you value stream map it out, where all the waste occurs, where it starts, you want to address the root cause. I completely agree. The semantic layer is in the most ass-backwards place possible, to be frank, it’s not forever reports but doesn’t have problem.
Shane Gibson: If we take the comment you just made, a key takeaway is: when I go and work with a new data and analytics team, one of the first things I get them to do is actually describe the value stream. Who does what? Where does the data flow? What tasks are done? Who does that work? And we visualize it. We do nodes and links on the screen as a graph. And then we figure out where the wastage is, where the constraints are. And you’d be amazed how many teams don’t understand the flow of work. They don’t think of it as a factory. They don’t understand if you say to them, you’re a factory. Every time a piece of data moves and a person touches it, it’s a station in a factory. Map it out. Go read The Phoenix Project, map it out. Where’s the wastage? Where are the problems? Optimize it. You make three changes and you make yourself far more efficient in a week, because you know where it is, you just haven’t visualized it and agreed you’re going to change it. So some of the simple techniques are there. And to your point, the software engineers are incented to make changes as fast as possible to get business value and need to rule that really affects about that. But they don’t understand the downstream impact that has on the other team. An example is a customer of ours, building out their own app. They’re a startup, NoSQL database, all good. They give us a combination of snapshot files and change data; they push the data to us. So they give us some snapshots, and they give us some change data. And one day they came back and said, the numbers are wrong. That’s the thing a data person always loves to hear, because you go straight into that DBA mode: compared to what? OK, describe the problem. OK, we have a bunch of customers in the dashboards that aren’t in our application. All right, excellent. And so we go, but you’re not sending us change deletes. You’re sending us snapshots.
So we don’t know that those customers have disappeared. And they’re like, so if we just send you the deleted records when we delete them, then it’ll be right? And it’s like, yes. But they didn’t think about it that way, because they’re not data people. Why would they? For them, it’s gone from the app, problem solved. Another one, the one that really gets you: because it’s a NoSQL database, there are no keys. So we had a key field, it was keyed off a GUID, and then they needed to make some changes. So they changed, effectively, that structure. They changed the grain of the data. They retyped the GUID and changed the grain of it, so now it became a parent-child. And for them, it was fast. It was like, “Yes, we need to make this change, this is the best way of doing it for us, bang, the whole NoSQL database really optimized itself.” And now we’re starting to get this feed with a key that is no longer single grain, but is a multi-grain key. And again, it does bad things to you. So it’s like, go back to them, going, “Look, it’s fine, you need to do that, you need to be agile and flexible and able to do that, but here’s the downstream impact.” We need to work on those ones with them, and we can’t automate that yet. So a lot of it’s that conversation, that language, and helping people understand what we do, and why what they do affects it. And then what can we do to reduce the blast radius of that change? Because that change needs to happen. We can’t say, you can’t add a column to your database, when you secure your database.
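Both problems in this story can be pictured with a small sketch: customers that vanish between snapshots without a delete record, and a key whose grain quietly changes. It assumes each snapshot is a full extract held as a list of dictionaries. The function names and the `customer_id` column are hypothetical, chosen only for the illustration.

```python
from collections import Counter


def infer_deletes(previous_snapshot, current_snapshot, key="customer_id"):
    """Keys present in the previous full snapshot but absent from the current one.

    Only valid if both snapshots are complete extracts; with partial
    extracts a missing key proves nothing.
    """
    previous_keys = {row[key] for row in previous_snapshot}
    current_keys = {row[key] for row in current_snapshot}
    return sorted(previous_keys - current_keys)


def duplicate_keys(rows, key="customer_id"):
    """Key values that appear more than once.

    A non-empty result suggests the grain of the feed has changed,
    for example from one row per key to parent and child rows.
    """
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)
```

Running checks like these on every delivery is one way to shrink the blast radius: the upstream team stays free to change their structure, and the data team finds out on the day it happens instead of when the dashboard numbers are questioned.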
Joe Reis: Well, that’s why I think value stream mapping is incredibly helpful. And I wish more companies and more teams would use it. Because if you focus on the end product of what a customer wants, an external customer, the end customer, not some internal-facing one, then you start seeing cross-sectionally. If you can follow how data moves, how information moves, which is one of the things you do in value stream mapping, you get all the stakeholders: software engineers, data people, business, etc. I think that accomplishes a lot: that shared vocabulary you’re talking about, as well as empathy for the end product at the end of the day. I don’t think anyone’s doing this stuff maliciously. It’s just, you’re incentivized to focus on what you’re paid to focus on. And if I’m not paid to care about the data team’s needs, then why would I? I have enough things to do right now. And that’s perfectly understandable.
Shane Gibson: Yes, product owners up the bums, they will move faster. And really good at iterating the way of work to reduce some of that manual stuff. So there’s lots of patterns in the software engineering world, lots of patterns in the agile world, and lots of patterns in the product world that we can adopt. So for example, personas. We should be using personas for our data users, and understand the different types of users we have. There’s the machine learning engineer, and they want raw data. Again, you look at the medallion architecture from Databricks, which is just a copy of a three-tier architecture we had 20 years ago, but it’s good that it’s getting regenerated: that raw, combined or cleaned, and consumable standard architecture. And we used to have this argument 20 years ago, which was nobody’s allowed in raw, nobody can be trusted in there. And then you’d get a data miner, and they’re like, but I need the raw data to do behavioral modeling, because I need some flags to say this person bought five products versus this one who didn’t, and what they bought. And so we see those data problems turn up. And we’ve lost the art of data modeling, because we’re not teaching it to the new generation that are self-serving. And given the software engineers are now using NoSQL databases that don’t need a relational data model, what are they using? When somebody’s designing an application as a software engineer, are they still using a Miro board, a whiteboard, whatever, just to draw out a structure of the data, a conceptual model? Or are they just ripping in and writing code and refactoring it as they go now?
Joe Reis: I think even if they do the whiteboarding up front, it’s maybe 50/50 when they’re starting out. It’s so easy to add new fields in after the fact that you’re just like, “Okay, do I need to have a formal committee meeting on this, or do I just add a new field?” If you look at a lot of code, I was at a dev shop the other week, and you could just tell what was happening: alter the database, not a big deal, add it and it’s done. So I think it’s because it’s easy. Again, you just need to make the app work, and that’s it. So even if there’s formal modeling done up front, NoSQL makes it really different. I would say there’s not an ERD, for example, that works with a NoSQL database. Lord knows, there’s like a million different NoSQL databases to select from, so how would you approach it? Same with nested data: maybe you’ll draw a parent and children and so forth, but it just ends up being a bunch of squiggly lines with a bunch of key-value pairs, and that’s about it. So it’s the notion of modeling. I’m looking at my table over there, I’m looking at Codd’s book. That thing is all about set theory and algebra, it’s very formal. And then you might translate that into an ERD. Maybe that’s a tool that you want to use to design your database. I just don’t see a lot of people doing that these days, or at least not as much.
Shane Gibson: Yes, so let’s look at how the software engineers have reflected on their practice, their way of working. They’ve got some new technology that made them faster. And then they’ve altered the practice to say, there were some things we used to do that don’t have as much value now, so we’re not going to do them. There are some consequences of that behavior. But they’re not going to model heavily up front, because of the rate of change, the ability to change and the speed they can make that change, compared to the old days when we had a spec, which took six months. And we had a design that was signed off by the architect, and somebody would go and test that the application we built, the data model and the application, actually matched the spec, not the user outcome. And the practice was changed to say, customers are king. It doesn’t matter whether the spec matches what was built. It matters whether the value was delivered to the customer. And sometimes we get it wrong, and we just iterate it. And in the data world, what do we say? I mean, I’m amazed that we now have a whole category of graphical ERD modeling tools turned up again. It’s like, why? Like, I’m kidding. We should be data modeling, but we shouldn’t be doing ER diagrams as our technique for data modeling anymore. That’s 20 years ago. And yes, they’re prettier and easier to use than EA Sparx and whatever the hell those crap tools were that we used to use for data modeling. But we haven’t reflected on that practice. Yet our brethren in other domains have, and I’m intrigued, because we’ve seen massive changes in the data space. But have we really? Have we seen a change in the core patterns or not?
Joe Reis: That’s a good question. I don’t know that we have. We’re still asking the same questions. I rant about this a lot on LinkedIn; it feels like we’re just in Groundhog Day with a lot of the same questions we ask. One of them is, what value do we add? It’s like, well, if you have to ask that question, I highly doubt you’re adding anything. So there’s that sort of inverse correlation between those. But as I tell data people, if you want to know where the field’s going, I would say just look at software. There’s at least borrowing of a lot of ideas. I’m not saying it’s a direct one-to-one mapping, data is different, but DataOps, data reliability engineering, and data observability are literally just software practices applied to data. As far as modeling goes, though, I feel like maybe the equivalent for analytics engineers is really the dbt model, and how they’re doing it is just one big table, typically. It’s easier to do that, and that’s what people are defaulting to.
Shane Gibson: Yes, so I’m going to just review that. So we think about the value stream, the data value stream. I think one big table at the end, consumable one big tables, is what we should be going for; the technology can handle it now. But what we end up with if we do that, in the middle layer and the design of our data, is thousands of jobs and thousands of tables that have the same data, and there’s no way of optimizing that. As a user, I don’t care about technology. As a user, I care which column holds customer, and that’s a data design problem. So what I think we’re going to see, because most of our changes happen via technology, is technology helping us solve this problem and change our practices, in one of two ways. Especially with ChatGPT now. If we think about taking those 3,000 blocks of code you’re writing in dbt, and having the machine create the model for us, and then re-optimize, rewrite the code to conform to that model, then we now get designed data. We’re designing as chaos theory, everybody’s just writing stuff, but the machine is actually coming up with a design as a recommendation and then refactoring it. Or we get a semantic layer. In the old days, the semantic layer was in the BI tool layer, because that’s where the value was, and things are starting to be pushed back. But actually the semantic layer is the gluing of those 1,000 big tables, to say, when you talk about active marketing customer, here’s a thing you heard, and we find where that field is for you, you don’t need to care. So that’s where I think technology will change our practices.
Joe Reis: It could. I’ll push back and say, as a longtime Looker user, for example, when I look at old code, there’s some code that has discipline to it, in the sense that there’s no duplication. And in some cases, I’ve seen LookML where it’s bananas, I don’t quite understand what just happened. LookML is a semantic layer that’s attached to the BI tool, like you’ve described, but it did get unwieldy. I was like, I don’t know how you got into this mess, your LookML file is almost as big as all your dbt code. That defeats the whole purpose of it.
Shane Gibson: Yes, that’s like a practice.
Joe Reis: It’s like a practice. I’m not disagreeing with you. We’re actually in violent agreement, I would say, on most of this stuff. The same practices that happened with NoSQL, where it’s let it rip, the same thing is happening in analytics engineering. One big table isn’t at fault; 1,000 dbt models is more of a symptom of a lack of practice, because again, it’s so easy just to make a model. It’s like, screw it, I’ll just do that. Why make it coherent?
Shane Gibson: Yes. And so we come back to data modeling. What’s the problem? It’s not easy to make a data model. It’s so easy to create a block of code that creates data that’s valuable. It’s so easy to create one table that performs now and people can use. It’s still bloody hard to design your data. And so that’s the problem. If we make it easy, if we make it a no-brainer where it just happens and it’s low effort, we’ll do it. It’s because it requires high cognition, a separate set of tools, a massive amount of experience and training, depending on the data you’re looking at. Hit the data, how do you model that? Party entity, how do you model that? Manager-employee, how do you model that? You look at Shopify, and you go, 15 tables, it’s really easy. Now we’ve got an order where there’s a transaction, and the transaction has a return or refund. Refund only turns up in the data when a refund exists. So you start getting the schema of the data you’re given changing. And one of our customers, they had a bunch of products on their order. And then the payment, they use gift cards, the payment, basically, was allocated against the order. But what they needed was to understand, based on the different payment types, how much of each product line was credit card versus gift card. And that was complex, because Shopify doesn’t do it for you. So you have to basically allocate the costs based on a simple algorithm, but again, there’s complexity in what you think is just Shopify: customer orders product, customer pays for order. And that’s always my first question when we go through some training: is it customer pays for order, or customer pays for product? As a store, is it customer returns order, or customer returns product? That business question influences your design more than anything else, more than the technology, star schema, Data Vault, whatever.
For me, it’s still complex, it’s still high cognition, and until we find a way to remove that, most people won’t model.
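The gift card example can be made concrete with a small sketch. The conversation only says a "simple algorithm" was used to allocate payments across product lines. The proportional rule below, the function name and the use of integer cents are assumptions for illustration, not the method that customer actually used.

```python
def allocate_payments(line_totals, payments):
    """Spread each payment type across product lines in proportion to line value.

    line_totals: {product_line: amount_in_cents}
    payments:    {payment_type: amount_in_cents}
    Returns {payment_type: {product_line: amount_in_cents}}.
    """
    order_total = sum(line_totals.values())
    if order_total <= 0:
        raise ValueError("order total must be positive")
    largest_line = max(line_totals, key=line_totals.get)
    allocation = {}
    for payment_type, paid in payments.items():
        # Floor division keeps everything in whole cents.
        shares = {
            line: paid * amount // order_total
            for line, amount in line_totals.items()
        }
        # Any cents lost to rounding go to the largest line, so each
        # payment type still sums to what was actually paid.
        shares[largest_line] += paid - sum(shares.values())
        allocation[payment_type] = shares
    return allocation
```

Even this toy version forces the design question from the conversation: the payment is recorded against the order, but the business wants it reported against the product, so some rule has to bridge the two grains.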
Joe Reis: Yes, and I guess the consequences of that are, as we talked about earlier, tradeoffs. There is a cost you incur by not modeling. And what that means is you just have a lower understanding of your business, for one. At the highest level, data modeling is not necessarily meant to come up with a physical implementation of data in your database. It’s meant to come up with an agreement on concepts, and then business rules and how data relates to that. So if you don’t want to put in that work, and again, this is what I’m writing about, I don’t really care what you do with your time, but do understand that if you don’t take this exercise seriously, the consequence is that you simply don’t understand your own business and the data that you’re reporting on or working with.
Shane Gibson: I haven’t thought about this before, but you’ve made me think about it. So I’m going to answer it this way, and this is not well formed, because I’m basically just pulling it out of my bum right now, but based on experience: if you don’t design your data, you’ve got a $5 million problem. There are two reasons an organization rebuilds, greenfields, its data platform. If you think about our domain, the data work we do, let’s look at it as our restaurant. We build a restaurant, it costs us $5 million. We run the restaurant for three to five years, and then we go, that didn’t work, we burn it to the ground, and we go and build another restaurant in exactly the same place. Different ovens, different people, different patterns maybe, different recipes; we go from barbecue to Chinese. So there are two triggers. The first trigger is technology. We see a technology wave happen. The technology we’re already using works, it’s got some problems, it has value, but we like shiny. So somebody comes in, they want to make a name for themselves, it’s like, cloud data warehouses, and they are much better than what we had, but we’re going to rebuild it from scratch because of that first trigger point. The second trigger point is mad person’s knitting. Somebody comes in and goes, it’s been five years, there’s 3,000 blobs of code, whether it’s Informatica code or dbt code, or whatever the next shiny thing is, there’s 3,000 things, and it’s chaos, and we can’t understand it, and we can’t find our data anymore, we need to rebuild it. And that second one, that $5 million bill, is because we didn’t design the data up front. Lightly design the data up front. So there you go, that’s my takeaway. If you’re not willing to change your technology every five years, design your data, and save your $5 million. You come from a consulting background, right?
Joe Reis: Yes, I’ve done a bit of that.
Shane Gibson: And so have I, and what I say to organizations I work with now is, when you engage with an external party, with a vendor, the first question you should ask them is, what’s their business model? How do they make money? In the past, as consultants, we made money by doing data design, because we got paid to spend six months in a room doing it. It was a hard task, and there were very few of us that could do it, and therefore it was expensive, and we got paid lots. Now we don’t do it. We have teams of people that write lots of code, and we know, because we’ve seen it before, that in five years’ time we’ll be back, or somebody else will be, and we’ll pick up the one that they did. So for me, the incentive for consulting companies to optimize the way a customer works is not there. The business model was around bums on seats and hourly rate. But I don’t know about you, maybe that’s just an ass-end of the world New Zealand thing. I don’t think it is.
Joe Reis: My business model at Ternary was always different, where it was a flat rate, fixed price, fixed scope, so I had every reason in the world to finish quick and get out. And the other incentive I had too: when Matt and I started the company, it was just the two of us, and now it’s two of us again. We never really liked doing services, hands-on-keyboard, butt-in-chair work. I felt like I had a job, and if I wanted that, I would choose not to work for myself and go get a job. It’s a lot easier that way. But the approach that we would take actually is coaching and showing data teams how to do what we do, really sitting alongside them, pair programming, and helping them level up their skills. People love this because they feel like they’re empowered, they feel like they’re in the driver’s seat. There’s always a constant tension if you’re a services company and you come into a client. Most of the time, they’re probably going to hate your guts on some level, because you’re a higher-paid version of what they’re doing, and they feel threatened by you. I felt like the jujitsu move really was, well, what if we just helped you become a better version of what you’re doing, and we have no incentive to stick around longer than we have to? People tend to like that approach. That’s how we made money, almost doing an anti-consultant type of thing in a way, but it worked for us. I can’t say that it’s a good idea for a lot of people. It’s probably a really terrible idea.
Shane Gibson: I think it’s a good idea for people, and it’s a good idea for customers. Effectively it’s a coaching model, the same one I do when I need to do a side hustle. The problem is it’s not a good model for a consulting organization that has 50 people that can code and need to be off the bench. So go back to the business model. If you ask your consulting friends, partners, how do they make money, and they go, well, we coach your team to be as good as us, and then we exit until you need us to coach on something new, that’s the partner you want. If they go, we’ve got the experts in the market, it’s like, how do you charge for them? Hourly rate, daily rate, even fixed price? Because fixed price is still time and effort. It’s still effort-based. Then run, and find somebody that’s going to coach you. Out of interest, did you coach them on data modeling?
Joe Reis: I would, when necessary. I feel like it’s one of those things where it’s like showing people how to floss their teeth or something. Some people want to learn this, and others are like, I have no interest in learning this stuff at all, just show me how to use Snowflake. But implicitly what I’ve learned is, you want to sell things for the action and the outcome. So if we’re going to talk about data modeling, then it’s like, let’s think about your business, let’s think about the data we’re going to put into Snowflake, for example. What are you trying to accomplish? What are the reports you want to generate, really? And work from there, work on the outputs. If you’re trying to approach it from data modeling, people think you’re just giving them a lot of extra work to do, if that’s how you pitch it.
Shane Gibson: So that’s an interesting one. Again, I’ve got a slide I use when I coach teams, and I talk about the three ways you can design your data, model your data. Source specific: we model off the source, so we go spend some time looking at it. Output specific: we look at what the user wants on their report, their dashboard, or the thing. And business process specific. Over the years, I’ve become very opinionated. We model based on business process, and then we map it to our source systems and what the user’s after. So who knows what? So I know we’re almost out of time, I’m going to flip the thing. One question for you. You’re doing all this research on data modeling for your book: what is the one best data modeling technique you should use, and why is it Data Vault?
Joe Reis: I think it’s not a data modeling technique at all. I’m more convinced than ever, I think that actually taking a step back and doing, Valley Stream mapping and process mapping is probably the best modeling technique you can learn. It just helps you understand the flow of data and money through your organization. And I think it helps you unlock where all the bottlenecks are. If you can do that, you have a superpower, and I would say, the world’s your in terms of whatever data modeling technique you want to apply. If you don’t understand how things flow, Charlie Munger says, “You’re a one-legged man in an ass-kicking contest. You’re in a pretty serious handicap.” So I don’t know, what about you? What would be the one technique to rule them all?
Shane Gibson: I’m a data vault bigot, only because it’s the best. It’s the best, and data vault 1.0. So I talk about data vault modeling, not data vault two and all the other crap that supposedly goes with it. So for me, data vault at the moment, because it’s the best, most flexible modeling technique I can find. And it’s got a bunch of problems, and I hope somebody comes up with something better. But I go back to your point about value streams. So if you’ve got a team, get them to map out their process and how they’re working, how the team touches data. Highly valuable. And then the next thing you want to do is map out your core business processes as an organization, and use the term “who does what”. And you’re going to see, if we take an e-com one: customer orders product, or customer places order; customer pays for order; store ships product, or store ships order. You’re going to see this hierarchy of business processes, and that’s your data model, that’s your shared language. Whether you physicalize it in dimensional, or vault, or activity, who cares? And as soon as you start asking who does what, and you map it out as a graph, as links and nodes on a screen, on a whiteboard, Miro, or whatever, you’re going to find the complexity. You’re going to get to that problem of, does a customer return the order, or do they return a product? Because they’re two different design decisions. And value stream, who does what: there’s your conceptual model, 80% of it. And your big areas of concern, they stand out. That’s kind of what you need.
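[Editor's sketch] The "who does what" mapping Shane describes can be illustrated as a small graph of nodes and links. The following is a minimal Python sketch, not anything either speaker presented: the event list is taken from the e-commerce example in the conversation, while the names `EVENTS`, `build_graph`, and `open_design_decisions` are hypothetical. It flags the spots where the same "who does" points at more than one "what", which is roughly where the order-versus-product design decision shows up.

```python
from collections import defaultdict

# "Who does what" statements from the e-commerce example in the conversation.
# The tuple layout (who, does, what) is an assumption made for illustration.
EVENTS = [
    ("customer", "orders", "product"),
    ("customer", "places", "order"),
    ("customer", "pays for", "order"),
    ("store", "ships", "product"),
    ("store", "ships", "order"),
    ("customer", "returns", "order"),
    ("customer", "returns", "product"),
]


def build_graph(events):
    """Return (nodes, links): nodes are concepts, links are labelled edges."""
    nodes = set()
    links = []
    for who, does, what in events:
        nodes.update((who, what))
        links.append((who, does, what))
    return nodes, links


def open_design_decisions(events):
    """Find (who, does) pairs that point at more than one 'what'."""
    targets = defaultdict(set)
    for who, does, what in events:
        targets[(who, does)].add(what)
    return {key: sorted(whats) for key, whats in targets.items() if len(whats) > 1}


nodes, links = build_graph(EVENTS)
decisions = open_design_decisions(EVENTS)

for (who, does), whats in decisions.items():
    print(f"Decide: does the {who} {does[:-1]} the {' or the '.join(whats)}?")
```

Run on this list, the sketch surfaces two open questions (what the store ships, and what the customer returns). In practice the list would come out of a whiteboard session, and how the result is physicalized (dimensional, vault, activity) stays a separate choice.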
Joe Reis: That’s all you need, man, really. If you were to ask me my favorite data modeling technique, I would say it’s like looking at UFC and asking, what’s the best martial art? What’s the best move that you could use in a fight? I think it completely depends on where you are and what’s happening to you. There’s no shortage of modeling techniques out there. There are areas that are TBD, like streaming data and machine learning, where it’s still up in the air whether there’s any canonical way of doing stuff. But I would say, you need to learn what works, and be able to apply it.
Shane Gibson: I love that. I’ve never thought about it as mixed martial arts. That’s a great way to put it. I guess it should be the theme of your book: the mixed martial arts.
Joe Reis: It is. It’s actually a big theme of it. I nerd out on it and I used to do it, but the thing you quickly realize is, if the Kung Fu master is saying Kung Fu is the best martial art, that guy’s going to get his ass whupped pretty hard, very quickly, in spectacular fashion. And that’s how I feel like a lot of people approach data modeling these days, where it’s like the one true technique, and I’m like, I don’t think that there is. And if you try and do that, especially in a cross-sectional way, across the data lifecycle, it’s the same thing. Like again, going into a fight only being a boxer, you’re going to get taken down and manhandled. So that’s what happens.
Shane Gibson: I’m with you. There’s a bunch of patterns and you get good at them. And you always go towards the one that you know the most, the one you’re the most comfortable with. When you’re in fight or flight, that’s what you revert back to. So wrestlers always take you to the mat. Boxers always hit you in the face. That’s how it is when they’re in trouble. But you still need that toolkit or you’re going to get your ass whupped. So there we go. That’s what we need: mixed martial arts of data modeling.
Joe Reis: Well, awesome, man. For people who want to learn more about you, how can they do that?
Shane Gibson: LinkedIn’s probably the best way to do it. I’m on there as shagility. I’m a little bit on Twitter, but I tend to be ranty. So it’s Shane and agility, but it’s also a little bit of Austin Powers.
Joe Reis: International data man of mystery. That’s funny. And you’ve got a podcast too, right?
Shane Gibson: Yes, I do two. So I do the AgileData podcast. And what that’s about is, where I’ve found somebody that has a set of patterns that they can actually describe without pictures, I get them on the podcast. And that’s actually really hard. It’s really hard to get somebody who goes, here’s a pattern that you can implement, and here’s the context of when you should and shouldn’t implement it. So that’s that one. And then I co-host one called the No Nonsense Agile Podcast, where we have agile and product people come on. And for me, it’s really interesting. It’s almost like free training. We get experts on who know their stuff, and then they spend an hour talking to us, and I get to ask questions to understand how continuous discovery works, or those kinds of things. And then I’m like, cool, I learned something new. And how do I apply that to data? And how do we apply that to our startups? I just treat it as a free hour’s training.
Joe Reis: I always do the same thing with my podcast too. I feel like I learned a ton, even today. I took a lot of notes on the side. And it feels like podcasting is sort of this cheat code where you can talk to really smart people and learn a lot if you’re paying attention. And it’s awesome.
Shane Gibson: And because my co-host on that one’s Australian, he’s really, really grumpy. And so we were going to call it the No Bullshit podcast, but we got flagged. What Murray’s really good at is really just getting down to those examples. Because what we found was there were a lot of fluff merchants. And there are in data as well, us talking to ourselves a bit. So it’s getting back to that meat and potatoes. Which, as you are writing a book, is what you’ve got to get to. You can’t write a book of fluff. You’ve got to put some meat and potatoes in there.
Joe Reis: So you’d like Salt Lake City too. I mean, not too far from here is where the Agile Manifesto was penned actually, up at Snowbird. What was it, 22 years ago now or something like that?
Shane Gibson: I’ve got to go to Austin next, apparently, because all roads lead to Austin. But, definitely, we’ll be over in the US probably sometime next year. So we’ll do a bit of a tour around and maybe come and visit.
Joe Reis: Yes, please do. You got a place. So awesome. Well, great chatting with you. And I guess I’ll see you back on Slack. So, fun conversation. I learned a lot on this one. Talk to you soon, man. So take care.
Shane Gibson: Excellent. Thanks for having me.