Analyst vs Analytics Engineer
Join Shane and guest Benn Stancil as they discuss the difference between the age-old analyst role and the new emerging role of analytics engineer (amongst a few other interesting things).
Read along you will
PODCAST INTRO: Welcome to the “AgileData” podcast where we talk about the merging of Agile and data ways of working in a simply magical way.
Shane Gibson: Welcome to the AgileData podcast. I’m Shane Gibson.
Benn Stancil: And I’m Benn Stancil.
Shane Gibson: Hey Benn, thank you for coming on the show. It’s great to have you. I think today we’re going to talk about the world of analysts and analytics engineers, and how we can help them do good things with data in an Agile way. But before we do that, how about introducing yourself for the people out there listening who haven’t actually heard of you?
Benn Stancil: For sure, and thanks for having me. So like I said, I’m Benn, and I am one of the founders of a company called Mode. We build products for analysts and analytics engineers; it’s a BI product basically built for analysts. I’ve been in the space now for about 10 years, originally as an analyst myself, and then as a founder of Mode I’ve been through a number of different roles, as founders are apt to do, bouncing around between marketing, product, and our own internal data work. Now I spend most of my time either on that internal work or basically leading the community side: trying to understand the directions people think the industry is going, how exactly that fits into what Mode is doing, where we can provide the most help in the future, where the market is evolving in ways we don’t want to be a part of, things like that, and then having conversations like this to try to better understand all of it.
Shane Gibson: So if we look at the market at the moment, we’ve got a new specialization that’s just come out: the idea of an analytics engineer. And as you said, you’ve got a product that’s been serving analysts since well before this hyper-specialization came out. So what’s your view on the difference between an analyst and an analytics engineer, in terms of the way the market’s describing it right now?
Benn Stancil: Yeah, so I think there’s the commonly agreed-upon difference, which is basically that analytics engineers are responsible for writing what amount to data transformations. They aren’t necessarily getting data into a database, but once it’s in a database, they’re applying logic to it to make something useful out of it; they get the primordial soup of data and have to turn it into something that can, in theory, begin to come to life. Analysts are the people who are responsible for making sense of that: how do we actually interpret it, how do we apply it to the business, what decisions do we make with it, how do we make recommendations to other folks? I think that in theory, that line makes sense. You have the data engineers getting data into the system, you have analytics engineers who are maintaining the data in a nice clean way in the system, and then you have analysts who are trying to draw the business conclusions from that data. In practice, the line gets pretty fuzzy around things like metrics definitions. Is that an analytics engineer’s job or an analyst’s job? And if it is the analytics engineer’s, then what does an analyst do? Are they just handing those metrics off to a business stakeholder? So I think that’s where it gets kind of fuzzy. I don’t actually know what other people’s opinions are on this. Most people, I think, don’t go so far as to say analysts are a thing that should go away, but it is notable that there’s a lot of squeeze there, and there’s a lot of potential to talk about in that. I think it’s a dangerous direction for the industry to overemphasize the value of analytics engineers at the expense of what it is that analysts bring to the table. Now granted, I come from what you would probably call an analyst background, so maybe I’m biased here.
But I think that’s the sort of boundary that we’re drawing. And I think there’ll be a little bit of a pullback from what analytics engineers do, as we start to realize, hey, we actually need a lot of the skill sets that the analytics engineering role has been slowly eating up. From the bottom, if you will.
Shane Gibson: Yeah. I feel a bit sorry for the data engineers, but we can come back to that in a minute. So when I’m coaching teams, because I spend a bit of my time, when I’m lucky enough, coaching data and analytics teams, one of the things we try to encourage is T-shaped skills. People typically have a specialization in some skills: they might be strong transformation coders, or strong testers, or strong at facilitation or gathering requirements. But they tend to have other skills too, right? So a person with more of a business analyst skill set will tend to be quite good at writing documentation. We try to build T-shaped teams where, when the person with the best specialization is busy doing a task, another team member can pick up a similar task and just get it done. They won’t get it done as fast, but they can get it done, so we don’t block. And the idea there is that we now have a team that can pick up the work to be done and get it done. What I’ve seen the market do is move away from this towards hyper-specialization, and that has moved us to handoffs. So as you described: a data engineer goes and grabs data from somewhere and lands it, an analytics engineer writes SQL to create a model, and an analyst then comes in and tries to pick up that model and actually use it to answer the business questions. So we have all this handoff. And for me, that’s actually an anti-pattern for Agile ways of working: that hyper-specialization, that factory-line treatment of the process. The handoff process is actually where a lot of the problems happen. From my point of view, this whole hyper-specialization has been driven by the vendor market trying to come out with a niche, with a category, and then to own it. By coining a term and calling it a role, you now have a persona that you can target, and you can focus on that specialization. So what’s your view on that?
Are you seeing that hyper-specialization, where teams are now handing off, rather than a group of people working together, each doing the things they have the skills to do?
Benn Stancil: We don’t see that that much, but I think that dynamic is there in a way that is tricky. So I’d make a couple of points. One is, I very much agree with you that vendors drive a lot of this, because they’re looking for their niche. The data market now is a very, very crowded market; there are tons and tons of products in this space. And there still is a little bit of a dance being done by all those vendors, and Mode is one of those vendors, so this is not a thing that I’m innocent of: a dance to figure out the places where they fit, where they are, if not the category creator, then the clear best thing for a particular use case. There are way too many companies for them all to own their own categories. It would be ridiculous for every company to carve out its own specialized category of, say, SQL-driven A/B testing done on top of Snowflake for e-commerce companies, though somebody is probably on that track. But as a result, I think they do look for places where they can build fences around what they do. Part of that comes in defining roles, defining these different things; you see a lot of thought-leadership content angled in that direction. Analytics engineering, I think, is something that has transcended that to some degree. The folks at dbt probably did the most to promote that particular role, and I’m sure that, as a vendor, that was clearly motivated by what it is that dbt sells. However, the uptake in the industry has been such that it was less them inventing something that wasn’t there and more capitalizing on a direction that actually made sense for folks. I’m sure they push it, but it’s a reasonable push to make, and a lot of it is real in their case.
That said, I think on the margins it creates some weird tension, where in the analytics engineering thing there are now people who identify this way and want to protect that space. So you have these weird edges, where it’s like, who’s responsible for what? I don’t think it’s so much a resistance to collaboration; it’s more a little bit of being territorial and saying, “No, this is the thing that I want my job to be, because I need to make sure I carve out enough of a role for what I do”. People want to have influence in the business; they don’t want some tiny, narrow little niche to work on, they want to solve bigger problems. And I’m not sure that the analytics engineering role is yet defined well enough to prevent the people in it from annexing a little bit more than is probably right. We used to have the dynamic of data engineers and data analysts, or data engineers and data scientists, whatever we want to call the consumption people. Data engineers do a bunch of stuff, and the place the two collided was pipelines; there’s that whole Stitch Fix blog post, I think it was Stitch Fix, about how engineers shouldn’t write ETL. So you had that friction point, but for the most part people stayed in their own lanes. Now we introduce analytics engineers, and that middle point crowds out data engineers and crowds some analysts. Data pipeline writer is not a particularly compelling job; nobody wants the job title where your job is to be the manager of data pipelines. And so I think analytics engineers start to push outward to the point where they are pushing into what it is that analysts do. My concern is basically that we find ourselves in a position where that role becomes big enough that the analyst role starts to deteriorate.
And we start to hire for people who are good at maintaining pipelines and modeling data and injecting business logic into messy data, as opposed to people whose primary skill is solving a problem. So I don’t know that I care that much about how this shakes out, so long as we still hire a bunch of people whose primary skill set is: I think critically about business problems, I think about applying data to them. I don’t care if I’m good at technical stuff; I’m good at looking at a problem and saying, okay, how do I use data to help answer this question? While analytics engineers can do that, it’s kind of a secondary skill for them. And so to me, the thing that potentially goes wrong here, and it isn’t how it’s gone so far, but it could, is if we say, okay, we’re replacing data analysts and data scientists with data engineers and analytics engineers. Then nobody has the primary skill of making sense of business problems with data; we punt all that over to business stakeholders, who understand their domain but aren’t really data experts. And now you have this gap of, how do we actually interpret this data to make a business decision in a sound way?
Shane Gibson: Yeah. So let’s break it down by looking at it from two personas. I’m going to start off with the data engineer first, and then I’ll go to the analyst, and we’ll look at it from those two points of view. So I’m old enough to remember the ETL developer, back in the days where we had tools that had to do the transformation in memory because we couldn’t do it in the database. So it was extract, transform and load, versus extract, load and transform. Effectively an ETL developer was responsible for collecting the data, transforming and combining and conforming it, and then making it available for somebody, an analyst or somebody else, to write a report. Then I saw that role get rebranded when the data scientist turned up: it got rebranded as data engineer, coming out of some of the cool startup companies. But they were still solving the same problem: data had to be collected, data had to be made fit for purpose, data had to be given to the people that wanted to use it for analysis. And then we saw the analytics engineer come in and say, “Okay, I’m going to take that transformation role”. And I agree, doing that transformation by writing SQL has shown us that actually we have a lot of people who are fluent in SQL now who can do that work, right? We’ve democratized it, made it so more people can do that work, and I think that’s a great piece of value that we’ve gained in the data domain. But the poor old data engineer, what are they left with now? They’re left with: grab the data from Salesforce, suck it in and land it into an S3 bucket before the analytics engineer gets to do the cool stuff. And actually, when I look at people in those roles, what I’m seeing them start to do is go, well, that’s boring pipeline work, I don’t want to be stuck doing that. So they typically look over the fence at our software brethren and go, hey, DevOps, that’s cool.
Well, let’s become DevOps engineers; we’ve got more problems to solve, because technical people like to solve problems. And as a DevOps engineer, now I’ve got lots of technologies: I’ve got to get a metrics store, and I’ve got to get a lake, and I’ve got to get a catalog, and I’ve got to integrate those things, because nobody does it for me. That’s a cool technical problem to solve. So again, moving away from the hyper-specialization of moving data from left to right, because it’s boring, and going out to use your skills to solve lots of problems. So that’s what I’m seeing, but what are you seeing in the data engineering space?
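The ELT split Shane describes, raw data landed first and business logic applied afterwards inside the warehouse, can be sketched in a few lines. This is a minimal illustration only, using Python's built-in SQLite in place of a cloud warehouse like Snowflake; the table and column names are invented for the example, not taken from the conversation.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")

# "Extract and Load": the data engineer's slice of the job in the ELT world.
# Raw source data is landed as-is, with no business logic applied.
conn.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "complete", 120.0), (2, "refunded", 50.0), (3, "complete", 80.0)],
)

# "Transform": the analytics engineer's slice. Business logic (only completed
# orders count as revenue) is expressed in SQL, inside the warehouse, as a
# clean model for analysts to query.
conn.execute("""
    CREATE VIEW fct_revenue AS
    SELECT COUNT(*) AS completed_orders, SUM(amount) AS revenue
    FROM raw_orders
    WHERE status = 'complete'
""")

print(conn.execute("SELECT completed_orders, revenue FROM fct_revenue").fetchone())
# prints (2, 200.0)
```

The handoff the two are discussing sits exactly at the view definition: everything above it is pipeline work, everything below it is analysis.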
Benn Stancil: I think that makes sense. I think there are two effects, two things that are different in the data engineering world. One is the analytics engineering piece: as a data engineer you’re not writing transformations, so you are mostly divorced from the business. It used to be that you had to understand the point of the data you’re pulling in. It’s not just, we’re getting some sort of streaming data from this thing; it’s, we need to understand what it’s going to be used for, because we need to write it in a particular way so that it can be used for that use case. Now, this certainly doesn’t apply in every case, but you can probably be a successful data engineer without having a clue what your business is doing. It becomes: I need to get data from here to here; how people deal with it on the other side is not something I touch. Sure, you’d probably be a little better if you knew what’s going on, and in practice you would, but I don’t think you really need to; you’re just writing raw data from one place to another. So that, I agree with you, pushes people to go looking for other problems. The second thing is basically vendor ELT: the Fivetrans and Stitches of the world, and even things Snowflake and the AWS products can do, have made it so those pipelines are also not things that you need a data engineer for. You need a specialized problem to justify hiring a data engineer. A startup of 20 people likely doesn’t need that, because all of their sources are going to be coming from either warehouses or SaaS apps, and you can buy a tool off the shelf that will write the data into your warehouse for you.
And maintaining a database is now a thing you just pay Snowflake or BigQuery or whoever to do; all of those operational tasks have been abstracted away by vendors too. So it doesn’t leave a data engineer much. Like, what do you do? The things we typically see go in one of two directions. One is this data ops concept, which is kind of DevOps-type stuff: figuring out how to keep the systems up, maintaining legibility across a bunch of different tools, basically making the whole thing work in a way it wasn’t necessarily designed for. You have Fivetran and dbt, and you’ve got Snowflake, and you’ve got Mode, and you’ve got Census, and you’ve got Hightouch, or whatever, and how do all these things actually communicate with each other in some way? This is a problem in general with the modern data tools: you have a bunch of disparate tools that can’t talk to each other. And you’re like, okay, how do you actually create an experience out of that? What is the experience of using this platform? Data engineers, I think, are well suited to think about that. The other direction is that they seem to be becoming more specialized. There are some companies that have true data engineering problems, and that’s where you go to do this stuff. Your average 30-person company does not need a data engineer; even a 100-person company doesn’t really need one. Netflix sure does. Uber definitely does. The companies doing things at real scale, where latency really matters, that are trying to solve problems in unique ways that aren’t just “connect Salesforce to a database and write queries on it”, do really need those folks.
And so I think there’s probably some upmarket draw for data engineers, where it’s like, okay, I need to go work at one of these companies that has true data problems, rather than at a company that’s just trying to stitch together a bunch of stuff it could use SaaS vendors to do.
Shane Gibson: Yeah, I agree. I think the data engineer comes into their own when there’s a scaling problem. So I had Shaun McGirr on the podcast a little while ago; I’d worked with Shaun many years ago in New Zealand, and he’s now at Dataiku. One of the things he said was, he went to a data science conference, and there was a presentation by somebody from Uber, Netflix, Airbnb, one of those. And he asked them the question: as a data scientist, how do you get access to your data? The person looked at him strangely, as if it was the weirdest question ever, and said, “Well, I just Slack the data engineering team, and 15 minutes later they send me a link, and I use it”. A lot of us aren’t used to working at organizations of that scale, where you can have a team of engineers that just makes the problem go away. The other thing, as you’ve identified, is this idea of companies that use SaaS products. They’re using Slack, they’re using Salesforce, they’re using HubSpot, and there are now solutions to democratize getting access to that data, because it’s a repeatable pattern. But the large companies are building their own platforms; they’re not using off-the-shelf software as a service, and so there, again, we need a data engineer to solve the technical problem of how you make that data available. So let’s take the lens and flip it to the analyst. In my experience, and again I’m old enough to remember this, analysts were responsible for working with the stakeholders to understand the business problem: what is our problem? They were responsible and skilled enough to explore the data to see how it might show us where the problem lies, and potentially what we could do to solve it. They would typically create some way of visualizing or presenting a view of the problem using data, and potentially a view of that problem having been solved.
And heaven forbid, the really good ones would actually be responsible for helping roll out the change in the organization to make the problem go away, to close that loop and actually see the benefit. Right now, what I think I’m seeing is that, again, this hyper-specialization means these analysts are getting relegated to dashboard designers. What are you seeing in that market?
Benn Stancil: There is some of that, though I’d actually put a little bit of that on the analysts: I think it is, in some ways, the responsibility of the analyst not to let that happen. Most data teams would say, yes, they don’t want to be dashboard designers; their job is to help drive business decisions, to support the strategic initiatives of the company, all that sort of stuff. A lot more people say that than successfully pull it off, and it’s hard to do because dashboards are now relatively easy to create and everybody wants them. There’s an increase in demand because you can do so much more. All these people have access to a bunch of data. Say you’re a support team: it used to be, okay, you use Zendesk, you look at the Zendesk reports, and that’s kind of the data that you get, while the analysts focused primarily on the big company problems. Now everybody has access to data, and everybody knows they have access to it. The support team wants the same things everybody else wants, and they should be able to have them. That creates a lot more demand for “we need to build dashboards” and all kinds of stuff. As a data team, it is difficult to get away from that. It’s easy to think, well, we need to build the foundation first, and then we’ll finally get to these bigger questions, but it’s a mirage you never reach. The more dashboards you build, the more questions people have, the more things they need to poke at, the more “what about this? what about this?” comes up. So at some point it is on the analysts themselves to say: this is how far we will go and no further; beyond that, we’re going to answer these more important questions. Yes, you don’t have a dashboard that addresses exactly this problem, but you don’t need it; what’s more valuable for my time is to work on these questions.
So it’s a discipline there of saying, we will only build so many. A company with 1,000 metrics is a company with no metrics, basically. But I don’t know how much the new data-engineering, ELT, analytics-engineering world affects this, other than that it creates so much additional data. While building an individual dashboard is in theory easier, you have this giant proliferation of things that you could do; you can busy yourself forever with that work, more so than you could have in the past. Maybe there’s something else to it, but it feels to me like this is the job of the analyst: to recognize where their value is, to draw the lines in the sand and say, that’s what I’m going to do, rather than the business’s needs having changed or something like that. I think the job of the analyst has stayed the same; the tools have evolved, but fundamentally the job is no different, and that’s what people should focus on.
Shane Gibson: Yeah, I think when I’m working with a new team, one of the things I get them to focus on is what I call how to make toast, based on a really great TEDx video. It’s this idea of: what is your process? Who does which bit of work, and when do they do it? And we normally see a bunch of patterns that a team will come up with. There’s what I call the prototype-to-production pattern, where they’ll go, “Okay, we’ve got a problem; the analyst will go in and start looking at the data that’s available”. They’ll do some exploration and some light prototyping, typically in a visual tool, because seeing is believing, then iterate that with the business users or the stakeholders to see if they’re close. And when they’re happy they’re on the right path, they’ll push it back to the engineering team to rebuild it following all the standards and the processes, and we can then take that model forward. The second pattern breaks it down a little differently: what I call big design up front, boil the ocean. We kind of understand the problem and the domain, and then we get the engineers to build a whole lot of robust stuff without actually proving that the data is going to solve our business problem. So we get lots of time and lots of delays while they do all that work, and then the analysts get it back and they’re like, well, that didn’t actually do what we needed to answer this business question, and we go through iterations and loops on that. There’s a third model I see, which is “chaos reigns”. Now, I come from a traditional background where I was taught, and I still believe, that data should be modeled. But this model came out of the big data wave.
And it’s this idea that you can write pieces of code that go from the collection of the data through to the information product or dashboard or reverse ETL into Salesforce, and that’s bundled as an object. Then you do it again, and you do it again, and they don’t touch, so you’re not getting a lot of reuse, but what you are doing is making them isolated and disposable. Your focus is on the speed with which you can do that process, and you accept that you will never make anything reusable, because that’s your way of working; you’ve made a conscious decision for it. So what do you see in the market at the moment, in terms of the ways the tools are encouraging those ways of making toast? Is there a general trend coming out? Lots of ad-hoc, lots of big modeling and coding up front, prototype to production, or something else?
Benn Stancil: So this gets back a little to the cost of production having gone down so much that we’ve probably fallen too far to one side of the spectrum. The way I would say it is: there used to be a world where building a dashboard was this giant thing. You go through the modeling process, you file tickets with IT, a month later somebody comes back with some Spotfire dashboard that isn’t exactly what you want, you file another ticket, and two months later you finally have the thing; by that point the project is already over and you don’t care about it anymore. Knowing you have to go through that process, everybody is cautious about what they do. The people building the thing really want to make sure they’re building something that’s good, that people are asking for, and people don’t ask for it unless they know they’re going to need it for a while, because they will have it for a while. There is no “turn this dashboard around for me tomorrow”, because that’s just not how it happens; I can’t ask in Slack and have this built tomorrow, I know it takes a lot of time. Now dashboards are simple; we have really lowered the cost of building them. There are some tools that still favor the old approach, tools like Looker, which are traditionally shaped and still very much “model up front, get a dashboard out the other side”. There are tools like Mode that are more built on the notion of doing ad-hoc analysis to answer questions, that make building dashboards very cheap and easy. But with all of those things, the dynamic that comes out is that you produce a lot of stuff. Data in general has a lot of exhaust; a lot of what happens is either one-off stuff that doesn’t need to be revisited, or something you need for a minute, or something where we’re measuring a particular metric and it changes pretty quickly because the business is changing.
The cheap production of this stuff has just flooded the world with junk. So I think it’s less a deliberate dynamic; to me it is the third case you’re describing, but unintentionally. We now can do all of this, so we just make a bunch of things that are kind of throwaway, and don’t necessarily think about what we do with them out the other side. In some ways that’s okay, because those throwaway things are oftentimes the most important questions you want to ask: you don’t need another dashboard, what you actually need is an answer. But I do think an important next step for the industry is thinking about how we actually make sense of all these things we’ve produced that were not meant to be permanent. I wrote about this a while ago: even just the term ad-hoc. A lot of these get called ad-hoc analysis; it’s kind of become the phrase for everything outside of dashboards and reports. The notion of ad-hoc sounds very ephemeral; it sounds like the thing isn’t important. It sounds like, I will send you an ad-hoc email, a thing I hammer out on my phone on the subway. But that’s not right; we shouldn’t treat these things as just throwaway. Sometimes they are, but sometimes they’re very important. They’re not things we need to persist, but they are things that matter; they were just meant for a particular question that we’re not going to ask 100 times a year. And often those are the most important questions: the questions you ask once are often the most important ones, whereas if you’ve got to ask it 100 times, it’s probably not that important that you get it right every single time. So it’s figuring that out a little bit.
It’s figuring out what ad-hoc actually means: how do we take the big questions we’re answering and turn them into something more useful and more permanent, not in the sense of a maintained dashboard, but in making them a piece of knowledge that we can continue to carry forward.
Shane Gibson: When we’re starting with a data and analytics team, and they’re beginning their Agile journey, I talk about ad-hoc, and my definition of ad-hoc is making stuff up, and we don’t do that. Even though we’re being Agile and lean, we still need some processes and practices; we need ways of working, we need to be professionals in what we do. So I can see what you’re saying. Sometimes ad-hoc just means rough and dodgy. Sometimes it’s more around: let’s do something lightweight and quickly and see if it has value before we then invest more in it. But we’ve all done the one where we get an ad-hoc question, we do a quick and dirty piece of work, and then the next thing is, “Actually, can you run that every week? Because you’re now giving me an answer to a question that I need”. Interesting thing you said there about metrics and junk. We’ve all experienced the data lake, data swamp problem. Do you think we’re heading for model or metric junkyards, with the tools we’re giving analytics engineers to write lots of stuff quickly, but without a lot of rigor around the model and the reuse?
Benn Stancil: I think the analytics engineering piece of this is starting to emerge. It’s harder for it to happen there than on the question-answering side, because there’s usually more process in place. So take the ad-hoc stuff you’re describing: someone comes in and says, hey, I have a presentation in 15 minutes and I need to know how many customers we have in Latin America. They don’t really care about the exact number; on the slide they’re going to round it to the nearest hundred, so you could say 400, you could say 700, and it’s all the same; it’s going to be a slide that’s up for 15 seconds. That’s easy to create, it happens really fast, it’s easy to throw away, and nobody checks it. But now you have this lingering thing that somebody may find later and go, oh, that’s how many customers we have in these parts of the world, when it was only counting Latin America and doesn’t actually say anything about other places, even though the dashboard looks like it does. They’re away from the truth, and it’s, oh my god, nobody knows what’s going on. That doesn’t happen as much in the analytics engineering world, because if you’re building a new data model or whatever, it usually goes through some sort of peer review process, version control, that kind of thing. However, that doesn’t prevent us from adding a lot of cruft; version control does not prevent tech debt. If you look at any product that’s been around for a while, you will see that even the most rigorously controlled and developed systems, with hundreds of people thinking about how to architect them well, will still produce a lot of tech debt. And most Looker models, or dbt models, or metrics stores are not produced that rigorously; they’re often thrown together initially in a fairly ad-hoc way.
And then they get layered on and layered on and layered on, and become this giant stack of sedimentary rock that looks different at each layer. You can distinguish the point at which one person left the company and someone else joined, because it all starts to change color. So I do think there’s a problem there. It’s a problem that’s still early, because all of this stuff is still evolving — we’re still figuring out what analytics engineers even do — and we don’t have easy ways to govern it. But it feels like an increasing problem. Three years ago, there was a market for you to be a data consultant who comes in and tells people how to implement dbt and Looker. Today, there’s probably a market for you to be a data consultant who comes in and tells people how to clean up their dbt and Looker models. I think we’re getting to that stage of grief, or whatever. It doesn’t mean those tools are bad, or that it’s been done wrong. That’s just the next problem to figure out.
Shane Gibson: Maybe over here in New Zealand we’re a little bit behind the rest of the world. In terms of our startup, we’re bootstrapping, so both my co-founder and I still side-hustle doing consulting gigs for large corporates. And right now, the modern data platform has created a massive market for going into large corporates and helping them choose a tool from each category and work out how to integrate them. Each one of those takes two to four weeks, because they have a governance process on how they select something — it has to go through the architectural forums and all that kind of stuff. So there’s a whole market now for consultants to go in for three months just to pick dbt and Snowflake, get them approved, and cobble them together. Interesting. If we look at that hyper-specialization, though — and I’ve seen you talk about it in the awesome articles you write — there’s a whole metrics layer, what we used to call a semantic layer, coming out as a subcategory in the market. I’m old enough to remember BusinessObjects universes, and I actually liked LookML as a pattern for defining semantic language that’s reusable. So as that category comes out, are we going to get hyper-specialization where we have a metrics engineer or a semantic engineer that sits between the analytics engineer and the analyst who actually wants to use the data for good?
Benn Stancil: I don’t think so. This is potentially a way to solve the analytics engineer/analyst collision point. In the pre-metrics-layer world, you’ve got people writing transformations — running stuff in dbt, writing LookML; LookML actually combines the two, but put that aside for a second. In that world, the metrics become very much the collision point, because I’ve got a table that’s dim_customers, and I need to figure out: what was our monthly ARR from these customers? There’s computation that has to go into that which isn’t straightforward. Do I include these customers or exclude those? What do I do with customers who churned and returned? When do I actually credit the ARR — when the customer signs, or when the customer actually starts paying? How do we deal with multi-year deals? All that sort of stuff. There’s logic that isn’t implied in a table that just says customer, contract start date, contract end date; there’s a bunch of things you have to figure out. The metrics layer basically moves that conversation into what is, to me, the analytics engineering world — it makes metrics a governed thing in the same way that data models are governed. This is traditional BI; it’s what BI has done for decades, and it’s what Looker did. Granted, Looker did it all in one place — the data modeling and the metrics modeling were both done in LookML. If it’s done now in dbt plus a metrics layer tool, you have dbt for the join logic, and then a metrics layer for “how do I sum up dim_customers into ARR?” But all of that exists before you actually get to the analysis. And I think that’s a useful split: analytics engineers define the logical governance of your data, and analysts then pick it up and do stuff with it.
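To make the split Benn describes concrete, here is a minimal Python sketch — purely a hypothetical illustration, not Mode’s, dbt’s, or Looker’s actual implementation. The table columns and the crediting rules (credit ARR when the customer starts paying, exclude churned customers) are exactly the kinds of judgment calls he lists, pinned down once in a governed function so every query agrees:

```python
from datetime import date

# Hypothetical rows from a dim_customers table produced by the
# transformation ("join logic") layer, e.g. a dbt model.
dim_customers = [
    {"customer": "a", "arr": 12000, "starts_paying": date(2022, 1, 1), "churned": None},
    {"customer": "b", "arr": 24000, "starts_paying": date(2022, 6, 1), "churned": None},
    {"customer": "c", "arr": 6000,  "starts_paying": date(2021, 3, 1), "churned": date(2022, 2, 1)},
]

def monthly_arr(rows, as_of):
    """Canonical ARR metric: credit ARR only once the customer has
    started paying, and exclude customers who have churned as of the
    given date. These crediting rules are the governed decisions a
    metrics layer exists to centralize."""
    return sum(
        r["arr"] for r in rows
        if r["starts_paying"] <= as_of
        and (r["churned"] is None or r["churned"] > as_of)
    )

print(monthly_arr(dim_customers, date(2022, 7, 1)))  # 36000: a + b count, c has churned
```

The point isn’t the arithmetic — it’s that the inclusion rules live in one reviewed place rather than being re-decided inside every dashboard query.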
And I think there are some companies that would look at that and say, we don’t need analysts — we have marketing managers who can look at this, look at conversion rates, and figure this stuff out. I can see why you might think that, but I don’t think the analyst goes away in that world. The analyst is the person who makes sense of these metrics. Analysis is not looking at a bunch of dashboards and saying “this is up, this is down, therefore we do this” — it’s much harder than that. The metrics are just another entry point for analysts to ask questions. Oftentimes they’ll have to combine metrics in weird ways that you don’t actually want to put in a metrics store, because you don’t want that metric canonized. Or they’ll have a metric they need to pull apart: this is how we compute ARR, but the question we have today is about ARR for companies above a certain size, or that are in overages, or that do this or that — we need to compute ARR differently for those folks, because that’s what this analysis calls for. That doesn’t mean we’re redefining the metric, or introducing something new. It just means that ARR, as it’s written in the canonical way, isn’t appropriate for the question we’re answering. In those cases, the analyst probably steps back into what is technically upstream of the analytics engineer — but who cares. They should be able to access the raw data too. The analytics engineers are sort of prepping the meal for them, but that doesn’t mean they sometimes don’t have to go to the grocery store themselves.
Shane Gibson: Again, for me it comes back to that idea of T-shaped skills. If the analytics engineer can define the metric in a place everyone else can use — ARR, and we know it’s a strong metric — then let them do it. But if they’re busy, and the analyst can define it using the same capability, then let them do it. If the analysts are going to define 15 versions of ARR, all I ever say is: make sure there’s a word in front of it. “High-value ARR”, or “gold customer ARR” — just make sure you’re using a different name that says it’s not the core definition, it’s a variation of it. Again, it can be the analytics engineer, it can be the analyst; they both should be able to do it, so the right person can do it at the right time.
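Shane’s naming rule is simple enough to express as a check. This is a hypothetical sketch — the canonical metric names and the underscore convention are my assumptions, not anything from Mode or dbt — of how a team might enforce “variants must carry a qualifying word in front, and never reuse the bare canonical name”:

```python
# Metrics owned by the analytics engineer; their bare names are reserved.
CANONICAL = {"arr", "nps"}

def variant_name_ok(name: str) -> bool:
    """Shane's rule as code: an analyst-defined variation of a canonical
    metric must put a qualifier in front of it ('gold_customer_arr'),
    so nobody mistakes it for the core definition."""
    if name in CANONICAL:
        return False  # bare canonical names are reserved for the governed metric
    # A variant must extend exactly one canonical metric with a prefix.
    return any(name.endswith("_" + base) for base in CANONICAL)

print(variant_name_ok("gold_customer_arr"))  # True: qualified variant of "arr"
print(variant_name_ok("arr"))                # False: reserved canonical name
print(variant_name_ok("revenue"))            # False: not a variant of anything governed
```

A check like this could sit in a pull-request hook, so governance happens automatically instead of through a committee.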
Benn Stancil: You’re not going to define it all — I think that’s the other key thing. Analytics engineers are not going to define every table, every model, every metric. It’s not that analysts will never define some; it’s that no matter how many metrics you govern, there will always be questions that call for some number that isn’t one of those metrics. So there’s always going to be some overlap. But it’s more that the analytics engineers define the canonical ones — the ones that show up on dashboards on the office TV screen — and the analysts look at those, but they’re also thinking: for this particular question, what’s the adjective that goes in front of these metrics that I actually need to answer the question in front of me right now?
Shane Gibson: For me, it’s also about the fact that once we start democratizing something, we need to bring in some form of federated governance. As soon as we enable lots more people to do it, they will scratch [inaudible 00:39:59], which is what we’re aiming for. Then what we need is some way of governing that without a large number of data steward committees that block everything from happening for three months. It’s that balance between empowering people to do the work and stopping ourselves ending up with the metrics junkyard that tends to happen when we go through that cycle. So, looking at the future: years ago I used to talk about search BI and mobile BI as things that were coming eventually, and we still haven’t really got some of those. But one I see — it kind of came for a while and then stalled, though I think it will come back — is this idea of natural language: asking a question and getting a response. We’ve seen ThoughtSpot, what Power BI has got, Google’s got a Q&A service — the ability to say “how many customers have I got?” or “who are my top 10 customers?” in natural language and get a response. From my point of view, that’s going to change the way the engineer’s and the analyst’s roles work. Because we’re not structuring the data anymore as dims and facts to go onto a dashboard — that may still be the model we use, but we’re not gathering a bunch of requirements. What we’re doing is saying: this data can be tagged or flagged as this concept or this descriptor. Then the system knows that when you say “customer”, it’s looking at this piece of data; when you say “product” or “order”, it knows where the data is, because you’ve classified it, and it knows how to put it together to answer that question. So are you seeing that natural language capability coming out in the future? Or do you think it’s just a dream — something like Siri, where everybody’s got it but nobody uses it?
Or do you think it is going to come out in the market, and the way we work will change again — because the data structures we need will be slightly different, and the way we work will need to be slightly different, to empower that capability?
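Shane’s concept-tagging idea — classify data once, then let the system resolve words in a question to the tagged data — can be sketched as a toy resolver. Everything here (the tags, the tables, the matching) is a deliberately naive illustration of the idea, not how any of the products mentioned actually work:

```python
# Hypothetical concept tags: each business word is classified once
# against the table (and key column) that answers it.
CONCEPT_TAGS = {
    "customer": ("dim_customers", "customer_id"),
    "order": ("fact_orders", "order_id"),
}

# Toy data standing in for governed warehouse tables.
TABLES = {
    "dim_customers": [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}],
    "fact_orders": [{"order_id": 10}, {"order_id": 11}],
}

def answer(question: str):
    """Find the first tagged concept mentioned in the question and
    count rows in the table it is classified against. Returns None
    when no tagged concept matches — which, as Benn argues next, is
    where the genuinely hard ambiguity lives."""
    for concept, (table, _key) in CONCEPT_TAGS.items():
        if concept in question.lower():
            return len(TABLES[table])
    return None

print(answer("How many customers have I got?"))  # 3
```

The substring match is the weak link on purpose: deciding *which* “customer” the asker means is the part no tagging scheme solves by itself.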
Benn Stancil: I have pretty mixed feelings on this. On one hand, the natural language stuff does some genuinely impressive things — NLP interfaces have gotten pretty good, I’ll give them that. Certainly, translating a question into a query seems tractable. It’s very hard, but it’s a problem people have done a reasonably good job of solving, and over the next 5 or 10 years I expect them to do a way better job of it. So that part we can do. The place where I still have some skepticism, though, is that the process of asking a business question is a strange one: it’s usually someone asking for a very specific thing in very vague terms, where they don’t know what’s vague about it. They’ll say, “show me the number of customers in Asia.” There are only two words in there that are relevant, and neither of them actually makes any sense. What does “customer” mean? Do you count nonprofits? Is it all-time customers, past customers, current customers? Customers that have churned? Customers that make over a certain amount of money? Subscription customers? “Customer” can mean a whole bunch of things — there’s some implied understanding of what it means, and it means different things to different people. As an analyst, if a marketer asks you to show them the number of customers in Asia, you kind of know they’re coming at this from a marketing perspective — they probably care about it in a particular way, and if you’re confused, you can just ask them. If a CSM is asking, they probably care about active customers, because they’re trying to figure out a staffing problem; the marketer is trying to figure out whether to spend more money on ads in Asia.
Plus, then there’s “Asia” — what is Asia? There’s a huge range of things people may think it means. As an analyst, you have that context, and I guess an AI could have that context too — in theory, if a person can do it, an NLP thing could do it. The other part of this that I think is scary is that you still have to put a human in the process. It’s the Tesla problem: great, this car can drive itself most of the time, but I still don’t really want it to, because if it’s going to mess up, somebody needs to be there to catch it. If I ask an AI “show me all the customers in Asia” and it says “here’s a list, there are 800”, I have no way of really knowing whether it did it the way I think it did. How did it interpret my question — what did “customer” mean to it? I guess I can ask it, but now we’re back to arguing about what “customer” means, or it shows me some technical definition of the product that doesn’t make a lot of sense. There’s a bunch of complexity there that makes it very hard to manage: there’s a language that’s appropriate for the person actually asking the question, and there’s not really a way to validate whether you got the answer you actually meant. If we’re making decisions on that sort of stuff, you want to have more confidence in it, and I don’t know that AI will really get us there. Maybe closer; maybe there are things it can do — I don’t know. It’s a thing I don’t have any really good thoughts on, because it feels messy and hard. But these NLP AIs are very good, so maybe somebody smarter than I am will figure it all out.
Shane Gibson: So for me, when you talk about language, I don’t think we currently have a language that’s ready and available to enable this to happen. The other thing I find interesting is that we don’t trust the machine. It can give me the number, but I don’t know how I’m going to trust it. Yet we trust humans, even though we know that under the covers there were 16 different products and five people involved in taking the data from the source system to actually get that answer. Maybe we’ve got lineage, but given the complexity of the work they’re doing and the code involved, we’re not sure it’s right. We’re really bad at writing tests in the data world — obscenely bad. We’re getting better, but compared to our software peers, we just get the data, push it out there, and say it’s right. So it’s interesting how we trust humans but not machines; we’ll see if that changes. While we’re thinking about the future: if I had a crystal ball, what’s the one thing I’d want to happen in the world of data? I want to grab a piece of data, drop it into a bucket, have the machine model it for me and tell me what it means. Then I want to ask it a question, and I want it to give me an answer that I trust. Do you think we’re ever going to get to a world where “AI” — and I’m using my fingers to quote the word AI — will be able to do that? Or is data always going to be just too complex, and the language we need and the way we want to ask questions too varied, to let the machine do that magic for us?
Benn Stancil: I think we’re a long way away from that, because the questions are often very complicated. I don’t think it’s impossible to get to a point where you ask the machine for a metric and the machine tells you — sure, “give me a data point”, I think we can get to. “Give me” — in air quotes — “insight”, I think we’re a ways away from, because those questions are very complicated, and that’s not a thing that, to my knowledge anyway, AIs are particularly good at in any reasonable way. There is a way this potentially works, though. Take the augmented analytics platforms — I’m not sure if you’re familiar with them, but there are a lot of tools out there broadly this shape: ThoughtSpot sort of markets itself as this, Power BI has pieces of it, Google has pieces of it. They tend to work in a way that looks a little academic: given a dataset — say, a dataset of the sizes and shapes of flower petals — they’ll spit out all these quick graphs of correlations, where you can see, oh look, the cars with four cylinders get better gas mileage than the cars with six. That is not at all how business questions work. You don’t have a giant wide dataset of a bunch of dimensions where you’re looking for correlations between 50 different dimensions and measures. There are a few problems for which that is true, but overwhelmingly, the business problems we have are things like: should we open an office in Japan? What does a machine even do with that? Probably it spits out anything it can find that’s related to Japan. Okay, that’s a starting point, but it’s not really that helpful. It’s the wide stuff — like the T-shaped thing we talked about.
AI analysis, to me, is very wide and an inch deep: here’s a bunch of things that are all kind of correlated — I don’t know, go find a thread to pull on. The thing that could be interesting is if we get to a point where they basically think like an analyst: “I found this thing; here are the three questions you might want to ask next — let’s go look at those.” Rather than just spreading out really wide, you drill in — not drill in in the drill-down BI sense, but in the sense of: given what I’ve learned from this thing, what other things do I want to learn that help me understand why it is? Say I see we have a bunch of leads in Japan — now my question is, do they ever convert? Why do they not convert? Those aren’t drill-down questions in the sense of “I want to filter this dashboard by this other thing”; they’re totally new questions, but questions whose genesis is found in the previous one. If you can enable that kind of flow, maybe you get to the science-fictiony thing: show me this — oh, that’s interesting — now let me go over here. Maybe you can get there. But it’s a very different model from “take a CSV and find insight in the CSV”, which usually turns out to be: men are, on average, taller than women. Okay, thank you for finding that correlation, much appreciated. What else do we do?
Shane Gibson: Yeah, I’m a great fan of the five whys. And what I tend to find is that the first question a stakeholder asks is not actually the question they want answered — it’s just the first question they know to ask. “How many customers have we got?” 42. “Where are they located?” 10 in the US. “What are they buying?” They have a series of questions they know they need to answer to get to the data they need to make a decision. So I’m with you: natural language capabilities that help them do that themselves, without an analyst having to be there — if they can trust it — will get us one step closer to democratization and self-service.
Benn Stancil: There’s another thing too — I have a note on this somewhere, maybe a blog post one day: can AI say no? This is another thing I think holds this back. If I can ask the thing any question — if I ask the AI to jump and it just tells me how high — there are times when it needs to tell me: don’t do that. Don’t ask that question; that’s a bad question; go this other way instead. That’s one of the things analysts can do: they can help guide the process. Not just “you asked for this, here’s your answer”, but “maybe don’t ask that — maybe there’s a better way of asking it. I understand what you’re trying to get at; you need to see this instead”, or “you’re finding a bunch of spurious stuff, don’t ask that question.” An AI has to be pretty smart not only to answer these questions, but also to understand the point at which the question you’re asking is structurally bad — a question where, if you ask it, you’ll get an answer that doesn’t mean what you think it means. I don’t know that we’re that close to doing that either. But maybe — AIs do crazy stuff.
Shane Gibson: Well, I think if somebody was designing that, rather than saying no — because nobody likes it when the machine says no — it should just answer 42. Any time you get 42, you know you need to ask again. Time flies when you’re having fun, so I want to close out with one more question for you. Again, I’m old enough to remember when we moved from mainframes to client-server, and that technology step spawned a massive change in the data and analytics world. Over the last couple of years we’ve seen the same acceleration and change based on cloud technologies. Given all the new tools, all the new capabilities, all the new ways of working — has this change made an analyst’s life easier and more fun? Or has it made it worse — has it relegated the role to more drudge work and less valuable work? What’s your view: are analysts in a better space now, or have we done them a disservice?
Benn Stancil: A better space, but not universally. I mean, the tools are better — they’re just nicer to use, and a lot more powerful. You don’t have to sit there and watch a query spin for an hour nearly as often as you used to. There’s a lot of stuff with databases where, not that long ago — 10 years ago — you’d run queries and the cluster would go down, and it was just part of life. You’d run the thing overnight because you needed to, you’d come back, something had happened in the middle of the night, and now you had to wait until tomorrow to actually get your answer. Very frustrating. It’s much faster to get stuff now. It used to be that if you had to do some analysis on data in Salesforce, it would take you a week just to get to that point, working with an engineer to do all the prep work; you spent a lot more time dealing with the system than with the stuff that’s actually “fun”. However, I don’t think analysts have gotten to the point where they’re mostly doing the fun stuff. Part of this is the demand for data: the explosion in the need for data creates an explosion in the need for dashboards, and people are now chasing a much higher demand, so they’re not actually working on the fun stuff to the degree I’d want them to. So there’s still stuff to be done and figured out. The potential is certainly there for life to be better, but I don’t know that it universally is, because we’ve just shifted the types of problems you work on. Rather than taking a week to build one dashboard, with all the frustrations of that, it’s now building 10 in a week. I guess I’m more productive, but I don’t know that I’m producing more value — and it’s certainly not necessarily more fun. That doesn’t mean it can’t be done; it’s just not as if we solved all the headaches of how people work with data and now, magically, everyone’s having a great time.
Shane Gibson: Well, we’ll close it out there. Thank you for coming on the show. In the show notes, I’ll put a link to the articles you write every week. I’d just like to thank you for writing those — they’re well written, well thought out, well reasoned; based on facts, and some opinion, but opinion grounded in facts. I really enjoy reading them, as I think the rest of the world does. So thank you for taking the time to write such great content and making it freely available — it’s a thing of beauty when people do that. Thanks for coming on the show, and we’ll catch you later.
Benn Stancil: For sure — thanks for having me, and I appreciate that. This was a good time.
PODCAST OUTRO: Data magicians was another AgileData podcast. If you’d like to learn more on applying an Agile way of working to your data and analytics, head over to agiledata.io.