How can data teams use Generative AI with Shaun McGirr
In this episode of the Agile Data Podcast, hosts Shane Gibson and Shaun McGirr delve deep into the transformative capabilities of large language models (LLMs), such as ChatGPT, and their potential to revolutionise data teams.
Drawing on Shaun’s diverse experiences spanning data management, programming, machine learning, and his current role at Dataiku, the episode offers insights into how businesses can optimise their data strategies and enhance their communication between business and data teams.
Shane and Shaun examine the impact of LLMs on the data industry, the opportunities and challenges they present, and their potential to bring about a paradigm shift akin to the iPhone’s impact on smartphones.
- Debunking Data ‘Magic’: Shaun notes the prevalent business expectation of ‘magic’ solutions from data teams and stresses the need for improved business-data communication.
- LLMs as a Business Game-Changer: The hosts scrutinise the real-world utility of LLMs, discussing their evolution from average-opinion replicators to game-changing tools that augment human capabilities in tasks involving context understanding and decision-making.
- AI and Proposal Writing: Shaun suggests the application of LLMs in analyzing successful funding requests to aid in writing more persuasive proposals. In parallel, Shane envisions the models reviewing proposals for missed elements.
- Improving Stakeholder Communication: The duo emphasizes the value of LLMs in understanding stakeholder needs, and cautions against using AI tools to avoid necessary conversations. They also envision using LLMs to access and summarize vast amounts of industry-specific information, thereby helping data professionals adapt to new subject areas quickly.
- Human-centric Data Discovery: They propose a human-centric approach to data discovery using natural language queries and LLMs, abstracting away from the complexity of entity-relationship diagrams, and offering a more intuitive interface to query existing data catalogues.
- Challenges in LLM Application: The hosts discuss potential challenges, including non-deterministic responses, inaccuracies, and skepticism about the added business value of conversation-based query systems.
- Revolutionising Data Interaction: The conversation wraps up with the hosts advocating for interfaces that promote a more probing and conversational interaction with data, similar to how data practitioners interact with data in a notebook. They argue this approach could generate valuable insights and effectively navigate larger inquiries.
Read along you will
Shane: Welcome to the Agile Data Podcast. I’m Shane Gibson.
Shaun: I’m Shaun McGirr.
Shane: Hey, Shaun. Thank you for coming on the show. And today we’ve got one that I’m quite excited about. What we’re gonna do is, we’re gonna discuss this idea of generative AI and LLMs, large language models, and where they can be useful and where they can’t.
We’re gonna have a back and forward and chew the fat around: is it all buzz washing, or is it actually something that has some value in our lives, eventually? Before we do that, why don’t you give the audience a bit of background about yourself and how you got into this world of data and analytics.
Shaun: Absolutely. So my start was about 20 years ago between high school and university. Yeah, I’m that old now that I get to say 22 years ago. And I had a little holiday job at a New Zealand government agency called Statistics New Zealand. A couple of weeks between high school and starting university.
And it was an administrative job, but a lot of the work was administrivia about data. And one of the tasks I was given was, here’s a list of numbers on a couple of pages printed out. Can you work out which numbers are on list A and not in list B? It was a list of mesh blocks or something.
It needed to be joined together, but I didn’t really know what that was at the time. So I dutifully worked down the list with a pen and paper and did the job. And then I realized, if you email me those files, that will be faster. And then, well, could you put more data in those files? And then I went down that slippery slope, as many people do, from
doing stuff manually in Excel to programming VBA to make your life easier. And then I guess from that point on, I’ve always been a bit lazy and wanted to automate anything that I could with data. So my background is not as any kind of programmer. I’ve learned programming in order to solve that problem.
And for a while, my data life and my study life were a bit separate, but they joined together when I went to the US to do a PhD in political science. And I went to the US precisely to join together my interest in politics and political science with the way that they train people there in social science, which is heavily quantitative.
So I learned a bunch of econometrics, statistics, game theory, but also how to ask a good question and improve the questions that you ask. And that stood me in good stead when I started to work with you about nine, 10 years ago in Wellington. I was still working on my PhD, but I was already a bit sick of it and wanted to see what these skills could do in the real world.
And that data science phrase was starting to take off even though no one really knew what it meant. And we had a couple of years working together at Optimal, and I did some data science projects, which were genuinely very interesting, but even more interesting was learning about the rest of the value chain:
BI, data warehousing, data modeling. Yeah, I learned a ton from Optimal customers and some of the training courses that we built together. And then in 2016, I moved to London to be the first data scientist in a large new data team in an automotive services company. And on the second day, I realized they don’t need data science yet.
Luckily, I had just come from a consultancy doing more than that. And so once the data was roughly in shape, I had a small team building data products driven by different levels of sophistication of machine learning and data science, including some pretty cool stuff. And then for the last two and a half years I’ve been working for Dataiku, which is a platform company that sells a software platform that makes some parts of this work quite a lot easier.
And that’s what brought me to here. So I’ve been in and out of data in some way for 20 years, using it in every job, doing lots of different jobs along the value chain. Can’t say I’ve ever been an honest to goodness data engineer, but I’ve obviously built pipelines. Can’t say I’ve built amazing AI, but I’ve definitely predicted stuff. And can’t say I’m any good at BI, but I’ve built a lot of dashboards. So I’m quite a generalist, I would say.
And that’s maybe an interesting theme to pick up today.
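The list-comparison chore Shaun describes (which numbers are on list A and not on list B) is a set difference, sometimes called an anti-join. A minimal Python sketch, assuming two small lists of made-up codes; the values and the function name `only_in_a` are illustrative and not from the episode:

```python
def only_in_a(list_a, list_b):
    """Return items that appear in list_a but not in list_b, keeping list_a's order."""
    seen_in_b = set(list_b)  # set lookup avoids rescanning list_b for every item
    return [item for item in list_a if item not in seen_in_b]


# Hypothetical codes, purely for illustration
list_a = [101, 102, 103, 104, 105]
list_b = [102, 104, 106]

missing_from_b = only_in_a(list_a, list_b)
print(missing_from_b)
```

The same idea scales from a printed page to files: load each file into a list and pass both to the function.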
Shane: Yeah, definitely. And that’s part of the reason I wanted to get you on, because when we worked together you had this unique ability to cut through that buzz washing, to go to the heart of it and go, yeah, that’s all great, but what’s the action? Or, I think it was, I always gotta remember the theme of, yeah, we could do a neural net, but actually if we do a couple of group bys and maybe a regression, we’ll actually pretty much get the answer a lot faster.
And then we can do the business process change, which is actually where the value is not spending six months data modeling or analytically modeling in our little cupboards to see if we can
Shaun: And it’s still something that people are out there trying to do. So yesterday I was at an event that we were sponsoring with 50 or a hundred chief data analytics officers in various UK companies. And that’s still an expectation from the business that they are struggling with.
So most people working in this sphere don’t wanna set up their teams that way anymore, but there is still an expectation from the business that we gave you money, we put you in a basement and you’ll come out with magic at the end of it. And even as data people have learned to communicate better with the business and push back and try to get the business involved, we do see that time and again with customers at Dataiku as well.
Data people are ready now to talk business. Are business people ready to actually confront some of the hard questions that they’re gonna be asked? Much less, much less
Shane: Do we ever need them? Can’t we just get ChatGPT to give us the questions and just cut that business person out? And that was sarcasm. So let’s get into it. Let’s start off with: we’ve seen a bunch of trends in the data world over many years. We saw the Hadoop trend, where it had some value if you were a large FAANG company that had massive amounts of logs and you needed some ways of querying them quickly
and at scale. And then we saw all the consulting companies in the shiny suits and all the vendors buzz wash it into big data schema
Shaun: got a little.
Shane: Yep. We’ve seen some mini waves. We’ve seen the modern data stack where it’s had some value and then it just goes crazy.
We end up with 25 different products. You gotta integrate for a couple million dollars just to query a bit of data. We’ve seen data mesh, which, for me, I’m still on the fence. I’m starting to lean into buzz washing because I see the vendors start to do bad things, or the consulting companies do bad things with a good idea of decentralization.
So let’s look at large language models, ChatGPT, and maybe we might even mention Google Bard, but probably not. What’s your view? Is it buzz washing? Is it a fad, or is it actually something that’s gonna have value over the next five to 10 years?
Shaun: Next five to 10 years, definitely. Personally, like a year ago, I was super skeptical when at the time GPT-3 seemed to be able to give you the smartass opinion of a teenager summarized from the internet. And unsurprisingly, for something trained on the internet: what’s the average opinion on the internet?
Some kind of smartass teenager. And then there was a whole lot of generative art stuff that, that’s cool, but fantasy art is just not my game. And that’s what all the generative art seems to just tend towards. So a year ago I was like, oh, this will never go anywhere.
Six months ago it started to get a little bit real because people started actually applying it to something useful. And then as soon as ChatGPT was announced and revealed, and you could start to see some of the unexpected things that it could do, that’s when I think some threshold was passed.
So I think, in the same way you outlined the big data thing, the Hadoop distributed computing, handling lots of unstructured or semi-structured logs: there was a real usefulness in that, a technical threshold was passed that opened up a new kind of business capability. And I think these models, or at least the biggest ones, have pushed through some barrier, right?
When I think back to things I would’ve wanted to do two, three years ago that were extremely hard and just don’t even go there, or things that you and I talked about, five to 10 years ago, wouldn’t it be cool if this, that, the other thing it’s now at least possible to see, that something has been pierced in the fabric of the universe just a little bit.
And of course everyone’s trying to shove everything through that small gap. But I do think something has been changed, and what has been changed is in the acronym LLM, large language model. So what that means is machines can now understand some level of context and conversation at a level that I think is significantly better than what we had before.
Now if you then say, because of that, they can or should do everything else, then I think that’s when you get into buzz washing. But I absolutely believe that some threshold has been passed in the ability of, especially, the interfaces, right? Conversational interfaces: we’ve all dealt with chatbots for the last 10 years, and this feels different.
Shane: I agree. I think we’ve just had an iPhone moment. So if I think about the iPhone, we had a bunch of versions before. We had some trials, some things that didn’t go so well, where the interface was crap, it was slow, there was no internet.
There was a whole lot of barriers technically and in terms of the way we designed it. And then all of a sudden there was that magical moment where the iPhone came out and just went from strength to strength. And for me, I’m surprised at how much I use ChatGPT to augment the work I do on a daily basis now.
Shaun: What are you using it for?
Shane: Typically writing, but we use it a little bit in the data space for some really interesting areas. But that’s mainly because we’re experimenting: I need to go do this task, can it help? But nine times outta 10, it’s for content. It’s writing marketing content, drafting chapters in my book, where I use it as a conversation,
as a co-author, to say, I wanna talk about this, and then go back and forward. The idea of this podcast is we go into patterns, we go into things that ideally are actionable. Somebody could walk away and go, I could try to implement that today, or go away and read something quickly to try it tomorrow.
I tend to use a framework at the moment where I talk about ways of working as one set of patterns. And that’s around, your team design your ceremonies, the way you’re gonna work together as a team. And then I talk about the data value stream.
So that’s from the beginning to the end of a stakeholder needing something. How do we go through that entire process of getting some data and adding value to it and giving it back to them to answer their questions so they can take action and achieve an outcome? And then within that value stream, there’s always delivery,
that hardcore data space, which I think of as a data factory: the moving parts, the technologies, the data models, the analytical models, the feature factories, those kinds of things where we actually do the heavy grunting work. So let’s not worry about the ways of working and all the ceremony stuff, but let’s go straight into the value stream. And for me, the first part of the value stream is getting funding:
either getting permission to build a team, getting funding to do a piece of work, getting funding to buy some technology, or getting funding to bring a team of consultants in. How do ChatGPT and LLMs help us with that problem?
Shaun: Yesterday I was talking with one of these chief data analytics officers about how his 12 year old daughter was using ChatGPT to help with writing poetry. And we made a joke, but it’s actually a perfect illustration of what you’re asking. What if you had access to all of the successful and unsuccessful requests for funding for everything in your company?
What if you had the database of all the board papers where funding was requested? It’s important to have the successful ones and the unsuccessful ones. Now, you probably shouldn’t paste those into the public interface of ChatGPT, because then that data would be out there in the world.
But if you had access to the private interface, or if you had the API access, which doesn’t store the prompts, or if, in maybe one year’s time, you had an open source model that you could bring into your own estate, which was pretty good at understanding language, but you could shove your own data in it somehow.
Wouldn’t it be very interesting to train a model that understands what makes board proposals more or less successful? And then ask it to write your whole proposal.
Shane: Yeah. And I’ll go one step further. I found it really interesting. I listened to a podcast and somebody talked about it and I was like, holy shit, I never thought about doing that. What you do is you give it your proposal and you say, what have I missed?
Shaun: As a co-author, as a rubber duck. Programmers often talk about, if I just explain what I’m trying to do to a rubber duck or a piece of wood, or literally anything or anyone else. That’s the main value of pair programming, right? It’s not so much two people on two keyboards or whatever.
It’s that you have to explain what you’re doing and then you come through a lot. And so in the cases where you don’t have a permanent full-time interlocutor just sitting there, ChatGPT or anything as an augmentation: what have I missed? Because you might not want it to write your entire proposal.
Or you might want it to generate five proposals and you take the interesting things, or you might actually have a fairly good idea of what you want to do in terms of a data strategy and funding for that. But you might be missing some nuance of other peers who have been more successful getting funding for their digital strategy or whatever else.
And if those lessons are there in the data or in the way that they’ve used language, then I think it’s very credible that large language models could augment that. Now, do organizations want that? How will hierarchies respond to that? Will we get a big loop of every proposal looking the same? Maybe, but I don’t think that’s anything to do with ai.
That’s how organizations are set up and decide funding decisions.
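The "what have I missed?" review that Shane and Shaun describe could be kept as a reusable prompt. A minimal sketch, assuming a hypothetical helper `build_review_prompt` and made-up wording; it only builds the text and makes no call to any model or API, and, per Shaun’s caution, real proposals belong in a private deployment rather than a public chat interface:

```python
def build_review_prompt(proposal_text, audience="the board"):
    """Build a 'what have I missed?' review prompt around a draft proposal."""
    proposal_text = proposal_text.strip()
    if not proposal_text:
        raise ValueError("proposal_text must not be empty")
    # The wording below is illustrative only, not a tested or recommended prompt.
    return (
        f"You are reviewing a funding proposal that will be read by {audience}.\n"
        "Do not rewrite it. List what I have missed: gaps, unanswered questions, "
        "and anything the reader is likely to challenge.\n\n"
        "--- PROPOSAL ---\n"
        f"{proposal_text}\n"
        "--- END PROPOSAL ---"
    )


# Hypothetical one-line draft, purely for illustration
draft = "We request funding for a three-person data team to reduce customer churn."
print(build_review_prompt(draft, audience="the CFO"))
```

Asking for gaps rather than a rewrite keeps the author in charge of the proposal, which matches the augmentation role discussed in the conversation.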
Shane: But is that a bad thing? If we look at VCs and, pitch decks there is a standard set of slides you have in a pitch deck when you go for funding,
Shaun: been achieved without AI. Exactly. Yeah.
Shane: And so why wouldn’t you have a standard template effectively where the people reviewing the funding know the format you’re gonna go through now?
They look at the content, not, what’s the heading? Have you done the right font? Is it the right
Yeah. What potatoes
Shaun: I think you just made an argument for standardized PowerPoint templates, but the important thing about standards is that they focus competition where it matters, or where someone wants it to be, which is the content, the value, the ROI, rather than who’s got the sexiest story.
Cuz people are being given money in large enterprises to do crazy stuff all the time that never pays off, right? And it’s really hard for data people, I think, to swallow their pride. I also think because data budgets tend to be 10 or a hundred times smaller than digital budgets, it’s that case of they’re so small, even when data people have relatively large
teams in terms of FTE, that it’s almost too easy to dismiss them, right? So data people should probably use these tools to augment their storytelling about why they need particular kinds of funding as much as possible.
Shane: Or get the stakeholders to do it before the data team sees it. Couple of other things I thought about. Stakeholders could use it where they put the business case into ChatGPT, into the safe version, as you said, and then say, explain it to me like I’m a 10 year old, to remove all the three letter acronyms and the buzz washing that we put into all our proposals as enterprise data people.
And then the other one is, and I haven’t tried this, but what happens if you put it in there and you said, tell me how I’m gonna measure success? Tell me a set of outcomes that I can go to at the end of this funding round, where we spent the money, and see whether we got any value. That’s pretty scary.
There’s no way anybody at a senior level would wanna be held accountable for that funding. But actually it’d be really interesting to see what it said when it came back, and whether it could give us some North Star or some OKRs or some KPIs or any of those other beautiful things
Shaun: Yeah. And again, if you think about it, these models are trained on some average wisdom of the internet. But in that particular case it’s the average wisdom of people who have bothered to blog about that. Who’s likely to have blogged about it? Hackers and people doing misinformation?
Probably not. Probably the people who have taught ChatGPT something about how to measure the success of enterprise initiatives around data are people who know what they’re talking about. There’s a lot of focus on, will it give an average opinion that’s dumb or dumbing down or biased or whatever.
But when you think about who has contributed to the data sources in particular areas of expertise, then maybe there’s a really powerful resource there. And actually, again, just yesterday I did a little round table on how not to waste the generative AI moment as a data leader.
And someone said they’d been struggling to bring to life something from the DAMA, one of those book bible things about data governance. And they weren’t finding in the book rich enough detail to bring to life what they were doing. And so they asked ChatGPT about it, and ChatGPT found, in a different version of the book or somewhere on that website, or someone had blogged about it somewhere in an official way and in a way that the source could be accredited,
something that really helped that data leader bring to life why they needed data stewards or data custodians. So that’s a pretty good news story, I would say, about helping people get stuff done.
Shane: I think definitely, one of the things I like about Google Bard versus ChatGPT is it typically gives you a link to an authoritative source. So I was writing a chapter in the book a while ago, back in the early days of GPT when I was still using 3.5. And I got stuck on this idea of data domains, cuz I needed it around requirements and design, and I got to that area where I was annoyed.
I was like, how the hell do I describe a data domain? I’ve never been able to describe it. So I used it as a co-author. And so that was good. I got a bunch of lenses we could use to say, what is a boundary? And these are different ways you can think about it. And I thought it was pretty cool.
And then I was like, but it’s just a subject area from the old 20, 30 years ago. It’s just a subject area, a dimensional modeling subject area. So I was like, cool. Actually, I don’t know, where did that come from? So I asked ChatGPT and said, who invented the term subject area?
And it gave me back three authors. I can’t remember exactly. It gave me Inmon, Kimball and somebody else. And that was good. Yep, I know them. I know they worked in that space back then, so it makes sense. And I was like, cool. Give me the quote where they define it, cause I wanna see whether they define it differently to the data domain.
And it gave me some sentences that looked a hundred percent real. Yep, I can imagine those people writing those things. And then I was like, cool, where’d you find it? And it gave me their books. And I was like, excellent, a reference back to the books. And it was early in the days I was using it, and I went to publish that section and I went,
I probably need to check it. And lucky enough, I have access to O’Reilly, hopped on the electronic version, got the book, et cetera. That text didn’t exist, and that sentence was never said.
Shaun: Like you said, it sounded exactly like something that one of them would say, but how much of that do we already do? How many pieces of received wisdom do we repeat out there that sound like something that someone authoritative would’ve said, but it’s mostly the repeating of them that has made them authoritative,
not any original source? So again, maybe AI turbocharges this. You still remembered to check your source, and when you did, like anyone checking a source, you then took a human decision about credibility and made a particular decision.
Shane: I think what we’re gonna see is linking back to the authoritative source becoming more and more important when we use it not to help us, not to augment us, but to give us answers.
Shaun: And to make a real decision with consequences. Yeah. Yeah.
Shane: Okay. So we’ve got a way, we’ve written out a little business case,
we’ve got our big bucket of funding, excellent. And now we want to chunk that down into smaller chunks, cause we wanna be agile and we wanna deliver some value early rather than spend it all in three years and never get anywhere. And so we’re getting into prioritization. I haven’t thought of a way we can use it to help us with prioritization.
Have you got.
Shaun: I think probably a brush clearing exercise to understand all the ways that people have done it might be useful. So again, our ability to search and generate initial ideas. So people will have published different frameworks for prioritizing. So you could ask for a summary of those, or lots of different examples of how you prioritize.
But even requests for funding are more standard across companies than how to prioritize the things that people are going to ask you to do or that you want to do. So I think beyond, give me ideas, it is a little hard to see. Actually, one thing I always struggled with was, what are the questions or the line of questioning that you ask a stakeholder to extract the value or potential value of the thing they want, without just putting, what’s the value, up front?
Because when I would just put that question there for stakeholders, they would just say, if we don’t do this, the customer would churn and the contract is worth a million pounds. So perhaps, if we think about it a bit more, it could aid that kind of discursive activity of discovering a line of questioning.
Maybe there’s something out there that’s in there. This is impossible to know before you check, but maybe there’s a more effective line of questioning that uncovers the business value without asking that explicitly. Cuz I think that question of, what’s the business value of what you’re asking me, can be quite intimidating and off-putting to our stakeholders.
Shane: I do a lot of work on a thing called the Information Product Canvas. And it basically takes the business model canvas and uses it for data and information requirements, and it’s got 12 sections. And on the top left hand side of it is a box where we put in actions and outcomes.
So if you have a bunch of problems, and we deliver the data or information to solve those problems for you, what are the actions you’re gonna take? And when you take those actions, what’s the outcome or value to your organization? And then when we do it with the stakeholders, we describe, we’re gonna get all these information product canvases, we’re gonna put them on a table together.
And then a bunch of senior people are going to look across them and they’re gonna prioritize it. Which one do we do first? And they’re typically gonna look at this box. This is the box they’ll look at to make a decision, whether it’s information product A or information product B.
But when we do that, what we find is that stakeholders know what action and the outcome they want to achieve, but they often find it hard to articulate. So there’s a box right below it called business question. So I always start off with, okay, what are the three to five business questions you want us to answer with data?
And people get it. It’s real easy. I go, it normally starts with how many, how much, how long, or the horrible why. And what you’ll find is they’ll give you three to five questions really simply. It’s just bing, bing. Cause they know them
Shaun: what they’ve asked other data teams or their own people over and over again. That’s why they know them. Yeah.
Shane: Yep. And then we say, okay, so if we answer those questions, what are you gonna do? What action are you gonna take once you have the data? Okay, how many customers are about to leave the organization? What are you gonna do? Oh, I’m gonna go and put some save programs in place to stop them churning.
Okay. So if you take that action and the action’s successful, what happens? What’s the outcome? And we go back to, it’s always increase revenue, decrease cost, reduce risks. Yeah, those kinds of things. Okay, we stopped ’em leaving and then we make more money.
So there’s some value in that kind of templating or canvassing of that. I worry though that somebody will use an LLM to populate it. So they won’t think about the questions or they won’t think about the action. What they’ll say is, I’ve got these questions, and then the LLM’s gonna be great at going, oh, typically this is the action and this is the outcome you want.
And then imagine there was a repository it was trained on to say, and if you say these things, you are more likely to get prioritized over the poor risk manager, because we’re a marketing-led company about product-led growth and we’re not about risk. So I think there’s a real danger there that actually there’ll be times where we cut the human out, and that’s actually a bad thing
Shaun: Yes. I think in all the cases of making yourself more productive, it’s you, yourself, not cutting yourself or anyone else out. It’s doubling something about your capabilities. But yeah, that’s what I take away from how you go from actions and outcomes to actually get the level of detail that you want.
And I bet the best business questions clarify the actions and outcomes. Ultimately, whoever’s building this thing is unlikely to be the one who owns the success or failure in the end. So that’s one link broken, right, of accountability. And then yes, if we just automate that question process and we don’t challenge people on why they want to do the things that they do in terms of competitive differentiation, if you’re just doing that based on everyone’s average answer of, this is how they always save customers from churn, you will lose an edge.
You might feel like you won some internal battle faster and easier. But you will have lost in the long run, cuz you’ll just be falling backwards into generic actions that are obvious anyway. So I think there’s a risk anytime it cuts out human to human interaction. I think that’s the warning sign.
So that’s where I would have serious reservations. It’s hard enough to get those stakeholders in a room conversing honestly about how data could help them make those decisions. Why would we want to take all of the effort to get them in the room and then get rid of it? The reason is it’s hard.
It’s hard. A lot of people who get into data would probably rather send people a prompt and just have them ask questions and then have requirements for dashboards put in a ticketing system for them. That’s just the honest truth, unfortunately, about a lot of people who start to work in data: they’re quite introverted, they like to work in the black box.
People like you and me have fought for a long time against ourselves too, to get ourselves out in front of those stakeholders. And so to use a machine to get away from your stakeholders again would be a bit of an own goal.
Shane: And we’re not there to deliver data, we’re there to help the organization achieve its outcome. We just happen to leverage data, cause that’s our expertise. Okay, so prioritization and understanding the action and the outcome and the value, that to me is still a human thing,
and the models shouldn’t be used to generate that. Maybe it can be used to augment, but I don’t think it should be at the beginning. Maybe it can be used to validate, but what we try and do then is we’re trying to build a model to figure out what the best answer is, where actually it’s a linkage to our strategy.
And unless the model understands the strategy of the organization, it can’t actually figure out what the best thing to do is to achieve that strategy. So we go from there and we go into requirements, and the idea of how do we do light requirements upfront before we get into deep requirements for build.
One of the things I spent ages thinking about: when we apply agile ways of working and agile patterns to a software engineering team, it typically goes well. When we apply it to a data team, it’s much harder. And there’s a bunch of reasons. The one I talk about is with a software engineering team.
They’re in charge of how the data’s captured. When you’re a data team, you get given bad ingredients. So I say, software engineering team grows their own vegetables in their own food and it’s all beautiful, and then they cook a beautiful meal. We get a bunch of spoiled cabbages
Shaun: leftovers. Spoiled from the compost. Yeah. Yeah.
Shane: It’s obviously, it’s make that taste yummy and pretty
Shaun: make it taste exactly like the meal that they cooked or it needs to exactly match that part of that meal. You need to count exactly what I saw on that screen back in the actual application. Yeah.
Shane: But actually that’s giving me steak and I’m vegan. So could you give me something that’s a vegan version of steak, because the business reality is I’m vegan, not a meat eater. So there’s a whole lot of difficulty around that. And one of the other patterns that we get in the data world is, if you’re a software development team or you’re a product team, you tend to live with that product for many years.
You start at the beginning or in the middle and you build it. Your job is to build that one thing, so you become subject matter experts.
Whereas if you’re in the data space, we tend to switch in and out a lot. We go, priority is, let’s look at marketing. Oh, priority is inventory.
Oh, the next priority is manufacturing process, or risk, or something like that. So we get moved in and outta these domains. And so again, I think the LLMs are actually really good at giving data people an early intro into subject matters they have no experience in. It allows us to get the language, to get examples, to find the material, and not become experts, but at least get that beginning of an understanding of that subject matter when we haven’t done it before.
Shaun: Some smarter questions of the experts. Yeah. And so people I know are using it for that very thing, if they need to prepare for a discussion in an industry that they’re not super familiar with and they want to get their head around a little bit of the jargon. It’s back to what you said about the funding.
Boil this down so that I can understand it. Literally what these models are designed to do is to compress large amounts of content, and we’ve chosen, or someone’s chosen, to focus it on human language first, to compress it. And compression means losing information
but making a trade off about what you lose versus what you retain. And if you take what has been published about all the different ways data could improve marketing, that knowledge is out there. And if you can compress it down to the right size and shape that can help existing experts, true blue experts, and complete novices.
So I think for the data teams who are rotating around to try and solve that, longevity and intimacy with the problem that you outlined, just to be able to understand those stakeholders better, that’d be great. Now, again, same problem if that’s used to shortcut and avoid conversations.
Bad. If it’s used to more quickly engage with the customers of the data or information product about their true needs that’s a win for everyone, I think.
Shane: So augment, don’t do. And yeah, a great example is, I’m still a big fan of BEAM, business event analysis and modeling, this idea of core business processes. That’s still how I capture the earlier requirements around data. And so what I find is, I’m gonna work with a new customer and they’re in a domain or an industry that I’m not highly experienced with.
I’ll go into the LLM and I’ll say to it, hey, I’m talking about this thing. What are the core business processes? And I’ll get a bunch of language that I can go, okay, I can understand the who does what. I’m still gonna validate it with the customer or with the stakeholder.
I’m still gonna go, okay, does that make sense? Are they gonna nod, are they not? But again, it gives me a quick start. It gives me augmentation of my knowledge to help me be more valuable in that process, because I’m binding it to something I already know. I’m binding it to that who-does-what process.
Shaun: So what’s different about that compared to reading a book or reading some blogs? What’s the value add?
Shane: It’s quicker. And try and find a book that actually explains the who does what for an industry that you don’t work in.
Shaun: The How to Be a Banker textbook won’t have that in the way that you need it, right? It won’t be compressed in the way that you need it to understand core business processes. And then if you approach it from an IT side, you’ll get a vendor-specific or a tech-specific view of the processes, where you actually want the business process.
Shane: Which intrigues me, because like you said, it’s taken a bunch of content and it’s compressed it. So when it gives me a bunch of core business processes, a bunch of who does what in an industry that I don’t understand, and then I talk to somebody in the industry and they nod,
where’s it got it from? Because it hasn’t picked up one of the 5 million enterprise data models that IBM and SAP and all those vendors tried to peddle years ago. There are some books around who does what, but they’re how to do it, not what the answers are.
Shaun: My answer would be language, right? Who, what: one of the core concepts that these models surely learn is entities in language. And so that’s probably the value add. It’s probably finding information that you could find on Google if you could. But the value add of a language model is it’s providing a way to interrogate that same information, but through the lens of core entities in our
concept of human language, which is who, human things that kind of look a bit like us, and what. These models understand what who is, and what and how and when, and how those things are different. And then they also understand where those concepts came from, and therefore how to map back to and find relevant information to summarize in some way.
But I think it’s that: one of the early layers in their understanding is those core concepts.
Shane: And again, what we can do is we can use that idea of giving it multiple things and it telling us where the gaps are. A great one I use is I will take the core business questions. The standard example I use when I do the training that I do is I talk about an e-commerce store,
cause we all understand that. Now this one’s not about shoes and this one’s not about pizza. I moved on from pizza to ice cream, and then I moved on from ice cream to t-shirts, cause it’s much easier from a manufacturing point of view to talk about printing t-shirts. And the reason I mention that is because, I’d say we worked together, but Sean and Kat did most of the work on building a course around a pizza shop, which was one of the best courses I’ve ever had the privilege of running.
And so what I’ll do is I’ll take the key questions from the stakeholders. So maybe it’s: I wanna know how many orders have been done. I wanna know average time to pay. I wanna understand the average delivery time, how many missed shipments we’ve had. And I wanna understand how many times the product’s been returned.
So those are the core business questions. And we’ll do who does what. And we’ll go through and we’ll quickly go: customer orders product; customer pays for order, or do they pay for product; warehouse or store ships order, ships product. We do those questions. You put both of those in and you say to the LLM, tell me what I’m missing.
And it’s gonna pick up that actually you don’t have anything about a return. There is no customer returns product or returns order. Because it can look at it and go, Hey, there’s some words over here, and I haven’t seen the equivalent words over here. And so again, as a health check about, okay, now I’ve got a bunch of business questions and I haven’t got the core business processes or the core data to understand that.
So again, as a check your homework, it’s actually really useful.
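The "check your homework" comparison Shane describes can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain keyword matching rather than an LLM so that it runs offline, and the question list, the event list, and the `CONCEPTS` word map are all hypothetical stand-ins for the t-shirt store example.

```python
import re

# Hypothetical stakeholder questions (paraphrased from the t-shirt store example).
business_questions = [
    "How many orders have been done?",
    "What is the average time to pay?",
    "What is the average delivery time?",
    "How many missed shipments have we had?",
    "How many times has the product been returned?",
]

# Hypothetical who-does-what events captured so far.
core_events = [
    "customer orders product",
    "customer pays for order",
    "warehouse ships order",
]

# Hand-written word families (an assumption). In the conversation, the
# language model supplies this kind of knowledge itself.
CONCEPTS = {
    "order": {"order", "orders", "ordered"},
    "pay": {"pay", "pays", "paid", "payment"},
    "ship": {"ship", "ships", "shipment", "shipments", "delivery", "delivered"},
    "return": {"return", "returns", "returned"},
}

def concepts_in(text):
    """Return the concept names whose word family appears in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {name for name, forms in CONCEPTS.items() if words & forms}

def missing_concepts(questions, events):
    """Concepts that the questions ask about but no event covers."""
    asked = set().union(*(concepts_in(q) for q in questions))
    covered = set().union(*(concepts_in(e) for e in events))
    return sorted(asked - covered)

print(missing_concepts(business_questions, core_events))  # ['return']
```

The output is a prompt for a conversation, not an answer: it flags that nothing like "customer returns product" has been captured, and someone still has to ask the stakeholder how returns actually work.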
Shaun: With the outcome of forcing you back to talk to someone to understand, is this an important process? How does this process run? Where could that data be? The more we use it in this whole, before anyone’s touched a keyboard kind of phase, the more we use this to get up to speed and have a valuable conversation faster, it’s all win-win as far as I’m
Shane: Yep. And part of the problem is, when we decentralize our teams, we decentralize our knowledge. And yes, we get a catalog maybe, but we tend to buy those big ones that never get populated, right? Because our dbt instance is divorced from our relation catalog, and the poor old data stewards haven’t managed to get through the 17,000 SAP tables
we don’t use, to catalog them. But if we think about it, we know that ChatGPT gives us an interface that’s magical. That’s part of the reason it’s been so successful.
Shaun: Yeah, it’s the interface. Yeah.
Shane: What do you do? You go there and you log on. You put your credit card in, it takes your money, and then you have a window where you just type a question.
There’s no training or anything. There’s no drag and drop. The interface is just simplistic, like an iPhone. So imagine if we took our catalog or our metadata of all the core business processes and all the data we’ve already got, and when we get asked those questions by our stakeholders, we just go into it and say, can I answer this question with the data I’ve already got? Because nine times outta 10 in a large organization, that data already exists. We just tend to reinvent the wheel. So again, we can think about it to augment, to reduce the effort in many ways. And we have to be careful, because it does hallucinate. It gives us the wrong answer.
Shaun: It makes stuff up just like humans do. Yeah. It’s funny that we call them hallucinations when humans also make stuff up at some baseline rate, right? And we don’t call that hallucinating. I’m just wondering are you, in that example of you’re discovering, do we already have the data?
What would you need to tell the LLM? What would you train it on or point it towards?
Shane: We’d have to tokenize our catalog, our repository, and give it access.
Shaun: In recent talks on stages at Gartner London, I’ve been talking about how not to waste the moment: if you’re a data person, how do you use this increase in just general public interest in AI and a momentary suspension of some level of fear?
So yes, there’s lots of fears and questions and risks, but there’s been a shift, I think, because of those interfaces, because of just how useful it seems to be to just normal people, not data people. There’s a moment that data people need to not waste. And I don’t have the answers on what anyone should do in their business.
But I can think back to all the things that I’ve done. And one of the things that I’ve done is stare at entity relationship diagrams, stare at the output of data catalogs, stare just into the abyss, asking the question: I’m sure this data exists and someone has answered it before, but there’s no feasible human-centric way to discover that.
There’s no person I can ask who will even lead me down a series of asking people where I believe that the answer at the end will be, oh, and this data is here. And so one of the things I suggest it would be great for LLMs to do is, even without sharing data, just sharing schemas. So again, in a year, when there’s a lot more specialized LLMs available, hopefully with more clear open source licensing or ways that just make it cheaper and easier for you to do it in your own IT estate.
In your example of the SAP schemas, anyone who’s got one SAP has got many SAPs. They might have 40 factories. With a tiny amount of data, but more importantly the schema information, I just feel someone’s gonna do something really cool: at the very least, letting a data professional ask some natural language questions about where data might be hiding.
Where could this be hiding? What have I missed? Where have I not looked? And the ability of a language model to abstract from all of the terrible detail of all those entity relationship diagrams, right? And it’s been done in a rules-based way. Like, lots of people have written a script that scrapes the metadata, but it’s the ability to jump from table to table and to use an understanding of language and these arcane column names
and to reason in a way from order_products to, oh, that’s the same or similar as this other thing. That would be the breakthrough, I think: to use human language to unscrew up the way that we’ve been naming columns in tables, for example, and just find those potentially useful joins much easier.
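To make the schema-only idea concrete, here is a minimal sketch of the rules-based version Shaun says has already been done: it compares column names across tables, with no row data, and proposes candidate joins. Everything in it is an assumption for illustration: the `schemas` dictionary, the `ABBREVIATIONS` table, and the token-overlap `similarity` function. The breakthrough discussed above would amount to replacing that hand-written similarity with a language model’s understanding of the names.

```python
from itertools import combinations

# Hypothetical schemas: table name -> column names. No row data is shared.
schemas = {
    "order_products": ["order_id", "prod_id", "qty"],
    "sales_order_line": ["sales_order_id", "product_id", "quantity"],
    "customer": ["cust_id", "cust_name"],
}

# Hand-written abbreviation table (an assumption). This is the brittle,
# rules-based part that a language model would be expected to replace.
ABBREVIATIONS = {"prod": "product", "qty": "quantity", "cust": "customer"}

def tokens(column):
    """Split a column name on underscores and expand known abbreviations."""
    return {ABBREVIATIONS.get(t, t) for t in column.lower().split("_")}

def similarity(a, b):
    """Jaccard overlap between the expanded tokens of two column names."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def candidate_joins(schemas, threshold=0.5):
    """Return (column, column, score) triples across different tables."""
    found = []
    for (t1, cols1), (t2, cols2) in combinations(schemas.items(), 2):
        for c1 in cols1:
            for c2 in cols2:
                score = similarity(c1, c2)
                if score >= threshold:
                    found.append((f"{t1}.{c1}", f"{t2}.{c2}", round(score, 2)))
    return sorted(found, key=lambda item: -item[2])

for left, right, score in candidate_joins(schemas):
    print(f"{left} ~ {right} ({score})")
```

These are guesses about where to look, not confirmed joins; each suggestion still needs checking against the data and with the people who own the tables.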
Shane: And we see that. We see a bunch of vendors in play at the moment bringing the LLMs into their products. We see a bunch of new vendors start up which are generative AI companies, cuz it’s the new wave of VC money. And we see some typical use cases. We see the text-to-SQL:
let me ask a question, let me get back an answer. The problem is, and you’ll probably know a better way of explaining this than me, the models are not deterministic. So what that means is I can ask it the same question 10 times and I’m gonna get different answers. So I can say, how many customers have we got, and it’s gonna go 42.
Then I might ask it three or four more times,
and it’s gonna go 42, and then the seventh time it’s gonna go 350,000.
And you don’t know which number’s right, because you’ve got no proof. We are seeing the catalog vendors going, we know that the pain is actually filling out and augmenting the columns and the descriptions. So they’re using it to go, here’s a column name,
just write me a description. Now, is it a correct description or not?
Who knows? Does filling out that white space have any more value? I actually say it introduces a shit ton more risk, because somebody’s gonna believe it.
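One way to picture the "42, 42, 350,000" problem is to ask the same question several times and refuse to report a number unless the runs agree. The sketch below is an assumption-laden illustration, not anything described in the episode: `agreement_check` is a hypothetical helper, and `make_fake_model` is a stand-in that replays canned answers in place of a real model call.

```python
from collections import Counter

def agreement_check(ask, question, runs=7, min_share=1.0):
    """Ask the same question `runs` times and report how often answers agree.

    `ask` is any callable that takes a question and returns an answer.
    An answer is only returned if its share of the runs reaches `min_share`.
    """
    answers = [ask(question) for _ in range(runs)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    share = top_count / runs
    return {
        "answer": top_answer if share >= min_share else None,
        "share": share,
        "counts": dict(counts),
    }

def make_fake_model(canned_answers):
    """Stand-in for a model call: replays a fixed list of answers in order."""
    replies = iter(canned_answers)
    return lambda question: next(replies)

# Six runs say 42, the seventh says 350,000, as in the example above.
fake_model = make_fake_model([42, 42, 42, 42, 42, 42, 350000])
result = agreement_check(fake_model, "How many customers have we got?")
print(result["answer"], result["counts"])
```

Agreement is not proof: seven identical wrong answers would pass this check. It only surfaces the disagreement that would otherwise stay hidden behind a single confident-looking number.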
Shaun: Someone’s gonna believe it. And it’s that balance between, okay, unpopulated data does no harm in a certain particular way. If you put something in there that someone believes for the wrong reason, that might happen in relatively few cases. I think with a lot of this kind of filling-in-the-gaps stuff, you really need to ask, why is there a gap?
And go back to your question. If we filled in this gap, what would change in the world? Would anything change? Would more people use this? I think the sad truth is that a lot of the low hanging fruit, easy use cases at the top of any enterprise’s generative AI task force use case list right now, it’s get rid of the call center.
It’s automate things that maybe have already been automated almost completely or highly optimized. Or it’s this, let’s use it to fill in some text content we can’t be bothered writing. But why would you write it if no one read it? That’s what I’m trying to help anyone who asks me, including our customers, understand. Great, there’s things you want to do out in your business.
Think first about what we did. Start talking about how you get the job done. How do you become more effective? How do you provide a better service? And so in the text-to-SQL, I think it’s perfectly fine to say, write me a query that counts the number of whatever and summarizes it by whatever, if that’s the starting point to help you get code and augment it quickly.
But imagine even if the answer was deterministic. What if you could use text and get 42 every time you asked? Is that valuable to me? If that was valuable, then everyone would use ThoughtSpot and not anything else, and the conversational features in Tableau and Power BI would’ve gained a lot more adoption.
I think it’s really cool for a wow effect. I just don’t know how conversations about a single number add business value, necessarily.
Shane: We know that the first question is just the intro. We know that: how many customers have we got? 42. Okay, where are they, what are they buying? We know that the questions come on. But I go back to that iPhone scenario. So I was with you.
I was going, yeah, this is a great toy, I can ask some questions of data. We did some prototyping in our product, and I liken it to how I use ChatGPT to help me write content, because it just makes it more fun and it does a better job. I do a better job with it than I do without it. And so the scenario I use, and it still flabbergasts me, is that, yeah, we have some data in the product. And one of the things, when I get the data the first time, and if that data is a piece of data I need to use, I need to understand how many nulls. If it’s a key field, I need to go, are there any nulls?
Cause nulls are dangerous in a key field. And like all data tools, your data comes in and it gets profiled. I can go on a screen, I can see the columns, I can see 10 rows of every column. And I’ve got an indicator that tells me the percentage of nulls in that field.
So it’s easy. Data’s in there, in the catalog. Go into the detail, see the thing. 30% nulls, okay, that’s dangerous. So it’s three clicks, cuz we wrote it. It’s beautiful. It’s so simple. It’s so easy. We put a connector into the ChatGPT API, and effectively I can go into that tile and I can say, how many nulls are there in the customer ID column?
Or just in customer ID. And so it then sends the schema off to ChatGPT. It comes back with the SQL. It runs the SQL. It gives me the answer. Now, for some reason, I do that far more often than three clicks.
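For readers who want to see what such a round trip might produce, here is a hedged sketch of the kind of SQL that could come back for "how many nulls are there in customer ID", run against an in-memory SQLite table. The `orders` table, its rows, and both helper functions are hypothetical; the SQL is hand-written here to stand in for whatever a model would generate, and it is not the product’s actual implementation.

```python
import sqlite3

# Hypothetical table standing in for a tile in the catalog: 10 rows, 3 null keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, None), (3, 11), (4, None), (5, 12),
     (6, 13), (7, None), (8, 14), (9, 15), (10, 16)],
)

def null_profile(conn, table, column):
    """Return (null_count, null_percentage) for one column.

    Table and column names are interpolated into the SQL, so this sketch
    assumes they come from a trusted catalog, never from free text.
    """
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    nulls = nulls or 0  # SUM returns NULL on an empty table
    percentage = 100.0 * nulls / total if total else 0.0
    return nulls, percentage

def example_null_rows(conn, table, column, limit=10):
    """A follow-up query of the same shape: list rows where the column is null."""
    return conn.execute(
        f"SELECT * FROM {table} WHERE {column} IS NULL LIMIT ?", (limit,)
    ).fetchall()

nulls, percentage = null_profile(conn, "orders", "customer_id")
print(f"{nulls} nulls ({percentage:.0f}%)")
print(example_null_rows(conn, "orders", "customer_id"))
```

Because the generated SQL is visible and runs against the real table, the number can be checked, which is a different position from trusting a figure the model states on its own.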
Shaun: I was gonna ask you, but you already got the deterministic answer in three clicks.
Shaun: Okay. Interesting.
Shane: And it’s the iPhone moment. Why do I go and do that when I’ve gotta write more text?
Shaun: Yeah, it’s more physical actions. And I guess the answer is, the way we think about the world is closer to how we talk about the world than how we force ourselves into even very elegant interfaces on computers.
Shane: But there is a second reason. And the second reason is, when I go and do the three clicks and figure out the percentage of nulls, we haven’t built the feature where I can then filter it and say, show me some example records of those nulls. We just haven’t built that feature yet.
But when I’m in the chat, I go, how many nulls are there?
It goes 30%. I say, show me 10 of them, and then it comes back and gives me the query. And I haven’t had to build that.
And so sometimes I can ask the second and third question and get an answer without us having to build that into the product. And for me, that’s where it
Shaun: When it’s about the second, third question, that’s really valuable, and I think that’s where conversation comes in: a line of questioning, and an answer that generates a question. Even just saying that, it’s clear why a language model is really good at understanding that interplay between question and answer. I’m really keen to hear in the future how much business users adopt interfaces. I have a hypothesis that maybe 10% of them are actually secretly data people, and so they already know the second, third, fourth question, and they’re gonna refine those by answering the first questions.
I do think that a lot of what businesses consume from all the incredible amount of work that we’ve been talking about so far today is that one number. And then they go, huh. And then they move on. And so for me, as a data person and working in a product company, I’m super interested in how those interfaces can increase that percentage,
encourage more people to ask the second question and then the third question, encourage more people to ask why is it 42, rather than: 42? Oh, I like 42. I’m typing 42 into this email and sending it off and never thinking about that again. And I think in terms of interfaces for the business, even the best dashboards, what have they truly delivered in terms of the ability to ask a set of questions?
I think that’s why notebooks took off so much for data practitioners of certain flavors is you’re having a conversation with your data in a notebook it’s clumsy, you try and productionize it, maybe that’s not a good idea, but that idea that you can trace back to where stuff came from and see the trail of the conversation is a powerful concept for anyone who genuinely wants to engage with their data.
I wouldn’t want anyone to be over-engineering stuff. For the people who actually don’t care about it, they’re just looking for a number to support their argument. Then I think the three clicks and you get the number is fine. But providing really easy ways for more people to truly engage with deeper questions I think is a bit of a final frontier,
Shane: One of the things we do now is, whenever we deliver a piece of data or information, we track when it’s used. Blunt metric, but how many times has the person who asked for it or their team queried it? So maybe that’s part of it: when we get really strong business questions in the business question section of the canvas, but we get really weak actions or outcomes,
or they look like they’ve been generated by ChatGPT, we go, that’s fine. We’ll just give you a chatbot. You ask a question, it’ll give you a number. You’re not gonna take any action or do any outcome anyway. So who cares? We’ve saved a shit ton of money of us developing the data. And again, that was sarcasm.
But what’s interesting is that everybody’s facing outwards. They’re looking at stakeholders and business people and saying, how can we do it for them? I look at it internally. I go, how do I take this work that’s drossy, that I don’t like doing or I find hard? So an example: you can take the schema of Shopify, just a dictionary, and you can give it to ChatGPT and you can say, where’s the schema from?
And it will come back and tell you it’s Shopify. That saves a whole lot of knowledge. If you can pass it a schema and say, have you seen this before? Yeah, that’s from there. Then you can say, okay, what are the core entities? It’s gonna come back and tell you, it’s customer and order line.
And you can go, okay, if I wanted to figure out all the products that were ordered, how do I do it, right? It’s gonna come back and give you a hint. Now you’ve gotta test it and make sure it works. But again, it knows what that model looks like. It knows a whole lot of stuff. So it’s gonna save you some time.
It’s gonna give you a massive amount of hints
Shaun: Compared to working forward from the docs to build all of that yourself. And that’s what gave me that idea of, in these talks in front of hundreds of data people and trying to shake them a little bit and make them realize that this can be such a boon for them. Because so much of what the millions of data people in the world are doing is much more drudgery than the things that they or their stakeholders may want to automate or augment out in the business.
And I go back to that first job I had. I made myself obsolete on that particular task by learning how to do it a different way. And people just kept giving me harder questions. No one said, oh, thanks for writing that VBA code that automatically formats those spreadsheets so we can send them to a publisher to print the spreadsheets in books of numbers of statistical output.
This is 20 years ago, obviously. They said, oh, can you do this for the rest of them? Oh, when can you do this? And can you do this? And I think that’s why I’m really excited for what other product innovations people do. How do people put generative AI stuff into products that exist?
But I’m also just super keen to see that stuff that falls between products, that job you were just describing: here is a schema, where on earth is it from? No product is gonna tell you where that is or help with that. That same kind of line of reasoning is what inspired the look across all the enterprise schemas across any big multinational conglomerate.
Just the schemas contain so much useful information about what data might be out there, what data gets used, what data reappears later on. And it doesn’t require detailed tracking of lineage or anything. So if you could replicate that Shopify example inside an enterprise, and then give that to every data person just as a little interface: paste in some data, paste in a schema, ideally paste in a schema and some data from literally anywhere, and have a machine take some guesses as to where it might come from.
That would cut out sometimes weeks of work looking for where on earth did this come from, because it just appeared in a table that I have access to, and to find out where it might’ve come from could take weeks. If that just closes that connection time and, again, encourages that conversation to happen faster between data people and IT people, that’s a win for everyone.
I really hope data people get as creative as creatives are getting and turn it on themselves before they turn it on everyone else. That’ll also help them understand the risks.
Shane: It’s about automating the work we do so we have more time to do the work we don’t do, and also maybe adding value to the work we never do. And what I mean by that is, before we went down the modern data Jenga stack and we managed to somehow do the
Shaun: semantic layers. Semantic layers are cool again, as
Shane: they are, because you need a semantic layer for your LLM. But the question is, is it a headless BI semantic layer, or is it a dbt semantic layer, or is it a data catalog semantic layer? That’s the semantic layer to rule them all. That’s the
Shaun: been amazing. As someone who’s always been in dirty enterprise data, to just experience the time-shifted reinvention of, wouldn’t it be good if we had a consistent way of defining how to calculate X, Y, Z. I’m like, yeah, that would be cool.
Shane: We saw it with Tableau, where we had this beautiful tool that gave us self-service, so people in the organization could serve themselves and do great content. And then we got self-service sprawl, complete decentralization and no standardization, and 16 answers to the same question.
And I thought we were going to solve that problem in this wave. But what we did was we ended up with dbt, which says, ah, let’s do the same thing to our analysts and let them become data engineers and write 5,000 dbt models. Don’t get
Shaun: Yeah. Yeah.
Shane: model being actually just the blob of code.
And so we get the value of decentralization and network. We’re quicker and more people can do the work but we lose the rigor and the standardization. We are not seeing a lot of metadata or config driven data tools yet. I think we will start to see them.
And we’ve had the problem with the catalog always being separate. We spend all this time populating the catalog on the theory somebody’s gonna go and view it. But maybe now’s the time where, if we put the LLM over that metadata, over that config, over that extra descriptive context, we actually open the value of that data up to a lot more people outside teams.
Shaun: There is valuable, actionable knowledge hidden in there. Ultimately, it takes people with years and decades of experience like you and me to look at tiny fragments of that information and even know where to go. So if language models can learn some of those hidden meanings and surface the hidden meanings to the people without those battle scars, that would be amazing. That would just be phenomenal.
Shane: When I’m coaching teams, what I talk about is we get to that, oh, we’re doing agile, so we don’t do documentation and the answer to which is always bullshit. Agile doesn’t mean no documentation, it just means we only have documentation where it has value.
So I’ll typically get the teams to focus on, okay, if we had that piece of documentation, what’s the action you’re gonna take? Who’s gonna use it? And what action are they gonna take? And if that action is valuable, write the documentation.
It’s the same as augmenting our data with context. If we’ve got a column with a shitty title, a W3X59 as the column heading, I know what that means cuz I’ve done the hard work, so I don’t need to augment it.
You don’t know what it means, but if I just put in a proper name, are you really gonna find it? Nah, you’re gonna come and ask me, so why would I go to that effort? But if the LLM’s gonna use that language to allow other people to find it quicker, then I’m probably gonna go to the effort of typing the pretty name or the full English name into that column as an alias, because I know there’s some action that’s got value after it.
Shaun: And it has a direct benefit to you. Fewer people will come to you asking annoying questions, and it’s about giving up knowledge that you have. So I see why you would do that. I see why that would be incentive compatible for you to do that. It will be interesting to see the adoption by data people of those things, because some data people you are sitting down don’t want to give up that sacred knowledge, that secret knowledge that is the source of what they perceive as their job security or their ego:
that only I know what W3X59 means, and you have to come to me and then I’ll give you a big lecture about what it means. And if you can last that lecture, then I might give you a CSV with a little bit of that data. We’ve all met DBAs like that. And unfortunately, there are still lots of data
people who, and it’s tempting to say it’s a generational thing, but every generation has a new set of cool jargon to rely on to exclude others. So it will be very interesting to see adoption patterns: which parts of the data community push for that, which people see the value of giving up their knowledge.
Giving up knowledge, again, to go back to my example: I could have kept completely secret the tricks that I used and had a very nice holiday job doing the same thing for a very long time. Just because of my personality, I was immediately bored doing that over and over again. I didn’t give up my knowledge to that organization out of the goodness of my heart.
I gave it up cuz there was a direct personal benefit to me. But whatever the reason, it will be very interesting to see who gets it, that this technology might allow us to let go of some of the things we thought we had to hold onto to protect our position, in order to do something else.
Shane: It’s gonna be interesting though, because knowledge is power, and one of the things that the LLMs did very well is they got access to all that group knowledge at relatively low or no cost. We talk about the idea of copyright and IP. There’s a bunch of content which people spent a large amount of time learning and writing and sharing, and typically we paid for it in the past, and now it’s just freely available for 30 bucks a month.
And so that’s gonna get solved. It can’t be sustainable. People are gonna start paywalling their content. There’s gonna be a whole lot of behaviors that come out of this new wave. And it’s gonna be the same in the organizations: if we don’t figure out how to incent the people that provide that context, then we’re gonna get that knowledge held back again, because they know it has value.
But we can gamify it if, like Google Bard, we can see where it’s getting the answer from.
All you gotta do is, when I enter in that piece of context, make sure my name’s tagged on it, and then tell me how many times my content’s been used by the model to help people do their job, and then incent me, micro-incent me, for being the biggest winner of context or augmentation or whatever.
Pay me or make it worth my while in terms of money or status
Shaun: People will do a lot for a laptop sticker. Databricks trained that Dolly model, the second one, the one that escaped the licensing difficulty that the first one had, by just crowdsourcing question and answer pairs inside Databricks. And across a large number of people, you don’t need that much reward to incentivize the right kind of competition.
But things could get really interesting. A lot of customers are asking, how can I put an LLM across all my IT documentation or all my corporate knowledge bases and everything? I see why they want to do that, to get the ChatGPT wow effect for something that they delivered, and that would be good for us to be alongside them while they win that race
to put the flag on the moon. Just going all the way back to the start about funding: what if the sum of enterprise knowledge confronted by an LLM actually uncovers, what did we miss? What have the board not thought about? What it will do to corporate hierarchies could be very interesting, because at the moment the non-decisions or the hidden decisions or the undecisions or the bad decisions tend to get squirreled away.
But even the absence of something in that big enterprise field of knowledge could be something that’s detectable by a large language model comparing even just the internet to your private data, right? So nothing needs to be shared back to OpenAI here to create quite an internal
scandal, potentially. Now, that could be really great for organizations. It could lead to some change though, as well.
Shane: We tokenized our documentation and our product as a mixed spike to see what happened. And the same thing. You’d say to her, how can I delete a tile? Nine times outta 10, it would say, go to the tile catalog screen, see the tile, click on the three dots, delete the top.
One time outta 10, it goes, go to a screen that doesn’t exist.
Look for the big red delete button at the top, click it. And I’m a great fan of Hitchhiker’s Guide to the Galaxy. We’ll come back to why that’s relevant in a second. On the other podcast I do, the No Nonsense Agile one, we’ve had two people from Spotify come on to talk about the Spotify model and the Spotify way of working.
And what’s intriguing is they both made it very clear that the reason those things were successful was because it was their strategy, it was their culture. Their matrix models and the way they did the teams and the guilds and all that, that was just how they articulated it.
But actually the alignment with their strategy and their culture is why it worked. And if we come back to the answer to life, love, and the universe being 42, and taking your point, what happens if we did tokenize that whole corporate knowledge base? All our financial statements for the last 20 years.
All our salaries, all our hiring and firing, all our sexual grievance claims, all our customer complaints. And we go to the LLM and go, hey, what should we do next year as our strategy? And it comes back and goes, no, you are rigg.
Shaun: Shut the company down.
Shane: Yeah. The answer is, got nowhere to go. Fire everybody. Start again. That’s
Shaun: Yeah. It might be honest in ways that we don’t expect or want.
Shane: Ah, but the good thing is we can say, oh, it’s hallucinating.
We bring in the shiny, so consulting company who tells us what we wanna
Shaun: Where you just reread the disclaimer on the bottom: anything you get out of this might not be real, so just be careful. Interesting.
Shane: Yeah. Exciting time. So for me I think we’ve got an iPhone moment. I think we’ll buzz wash it for a while, as we always do, as an industry, but I think we’ll come back to the core. The core has massive value. And I think for me the key is as data professionals and analytics professionals, we should look at the tasks that we do that take us time or take us cognition, and we could see how these tools make us faster, make us better before we even start to think about how we can force it onto our stakeholders
Shaun: Yeah, and I think the stakeholders, my sense is from talking to these chief data officers yesterday, the stakeholders are knocking on the door. The stakeholders want something, right? Maybe not always for the right reasons. I don’t think demand is gonna be the problem in the long run, because people are gonna start using this in their personal lives and they’re gonna start reading press releases about how the competitor has used the same kind of thing. That is gonna be unstoppable.
So I’d just add to what you said, augment what you said, and say: use the time that we have now to do that experimentation on yourself, to let the right demand in as it comes. Unlike “please, can you use my dashboard,” which just does what 50 of you said you wanted: a dashboard that answered these five questions.
We built it and now none of you use it. I don’t think we’re gonna have the same problem with conversational tools necessarily, but there are all the risks, and how to do it right, and how to make it valuable from the get-go. Use the time now to get your eye in on your own work, to get practice.
Cuz it is a new thing. And it’s gonna shift paradigms. And to your point on the iPhone moment, yes, the iPhone created a set of devices that we now take for granted. But it wasn’t just that. And for me, yeah, the devices were cool, but it’s the things they enabled. It’s the second-order
benefits of now having a different way of interfacing with content, with telephones, with the internet, that has made a whole bunch of things different, and many of them better. And so that’s what I think people need to take away from the moment and from your analogy: yes, there’s the cool, but then there’s the innovation that will happen because of something becoming easier, which is impossible to foresee.
When you make something easier, what other things will become easier and more valuable? And that’s what we need to push forward to. And “start with yourself,” I think, is always a good mantra.
Shane: And watch out for the negative. Now the human race loves to sit at a coffee table in a coffee shop and ignore each other by being on our phones rather than talking to each other, which is the downside of the iPhone.
Shaun: Exactly. Yeah.
Shane: We’re gonna get the downside of LLMs.
Shaun: I remember my dad telling me in the nineties, he changed company. At the company he went to, people who sat at the next desk were emailing each other, cuz email was new and email was cool, and he put a ban on emailing someone who is fewer than 10 meters away from you.
So we do have a way of reinventing that disconnection. And there are some really interesting people thinking and publishing about that, and that’s something to be mindful of. But in our professional lives, returning to that note that we came back to many times, any usage of this that helps us get to those valuable human conversations earlier and easier is good, as long as we have them. If we just use it to avoid them, to fall further into our shells, and to automate the things that create value, especially value over your competition, that’s different. We will just sink into some very average version of what the internet thinks about how to change your product, for example.
And then you might implement the same interface everyone else has, and now you’re no different than anyone else. And so there is a serious risk of retreating into ourselves. I think no one has the answer to that, but data teams always say they have too little time to go and talk to their stakeholders meaningfully.
What if they got rid of a whole bunch of, lower value stuff that still needs to be done? My hope, my bet is that at least some of them will use that time to get out on the front lines and have those conversations.
Shane: The successful ones will, and
Shaun: The successful ones will. Exactly. Yeah. It’s gonna be an evolutionary process.
Shane: And we’ve gotta remember, as data analytics professionals, the only time we deliver something from the beginning to the end in a day is when it’s a demo from a vendor with highly curated data and analytical models that they’ve seen before and built five times.
After that, when we hit data in the wild and analytical models in the wild, it’s always two weeks to six months before we get something useful in front of our stakeholders. That’s a long time to wait for your meal to turn up to your table, regardless of how often we blame the crap ingredients.
So why don’t we just start serving them a little bit faster with these tools before we start serving them fancy meals?
Shaun: Yeah. Speed is, you listen to anyone, speed and time. The technology exists now. The data is there in these organizations. The ways to process it, and a lot of the talent, the human talent, exist. It’s just getting out of each other’s way and delivering that small thing that is valuable.
And the timescales I still hear from customers are, yeah, not acceptable if you’re on the other end of that. I think data people have a tendency, when those stakeholders don’t like that time estimate, to explain all the ways it’s complicated. What you’re already starting to see on LinkedIn and elsewhere is stories of people who can’t wait just going around the central data team or IT, or whoever, and just finding someone who can give them the answer faster. And so far in my big conference presentations on this, every time I ask, “Has anyone here, or a colleague, copy-pasted any corporate data into ChatGPT yet?”, at least one hand is already going up.
Now, in a year’s time, it’s gonna be a lot. There’s a selfish reason to apply this technology to ourselves. The other thing is to prevent extinction. If someone’s going to connect a ChatGPT-like interface to someone’s Shopify, giving every Shopify user the chance to ask natural language questions about business performance, you don’t wanna be the data person who discovers that you’re redundant cuz of that, cuz you weren’t the one that made it happen. But if you are the one that made it happen, and you made it happen safely, you will get another job.
And that’s the great risk, I think, as this becomes commoditized, as more software-as-a-service products put it into their products. And then as it starts to augment tools that people use out in the business, if data people don’t keep up with that, a lot of stuff they do will just become redundant.
They’ll just be outmaneuvered by the business with some really powerful thing. And it’ll be too late. Once they tell you that it’s happening, it will be too late. Every data person, I think, needs to educate themselves on what these things really are and how they can use them in their own job.
Then they’ll be fine. They’ll be on the leading edge of this, or the first safe trailing edge after the leading edge, which is a great place to be. Making the most of the innovation that other people have done and spent billions or trillions on is a great place to be. But if you just let it wash over you and say, I’ve used SQL and Tableau for my whole career, and that’s gonna be it till the end,
I dunno if it’s next year, but within five years, it might be awkward.
Shane: You don’t wanna be the equivalent of the Yellow Pages, where the only value you have is propping up the monitor.
Shaun: And unfortunately, a lot of people at enterprises are paid for the organizational equivalent of propping up a monitor. But again, that’s a dying game. The job is clear. Use it on yourself to understand it, the risks, the opportunities, and go out and sell it. Be the one going to the business saying, I’m using this to do this thing.
From what I understand, that part of my job is a bit like this part of your job. Can we work together to build a product for you internally that leverages this safely to, name your process. Almost any process could be augmented by this. And it’s just about finding the people who are open to it.
But if you go to them saying, I’ve done this to myself and this is what it felt like, and this is the risk and this is how I manage that: that approach from data people out to the business is still very infrequently done.
Even you and I, with lots of experience, talked about questions and canvases and filling out forms, more sophisticated forms, but it’s still about capturing requirements rather than going out and selling a capability.
Saying, this could change how you do marketing. So it could be a really exciting time. If data teams can become product teams internally, what a world that would be. That would in turn allow data teams to have that long association with a product that we sometimes don’t get, as you raised at the start.
And those would be great data organizations to work with, ones that just became product organizations. Making stuff cheap, safe, easy to be adopted, rather than pushing data out through various pipes into people’s eyeballs. There could be a really different model coming around the corner.
Shane: Sounds almost like bringing back the role of a business analyst, where instead of being the Jira Ticket Master they actually look at the organizational processes and ways we could change that process to make the organization more efficient. Make more money, spend less money, reduce risk.
Shaun: What you’re saying is the nineties were a brilliant time, and now we finally maybe have the tech that could deliver some of the things. That took a little long there. I’m up for that, but yeah, return of the data model, return of the business analyst,
Shane: I love t-shirts so I’ve done return of the data model. I can’t decide. And maybe you can help me here. Is it the business analyst strikes back or the semantic layer strikes back? I can’t decide what the third t-shirt in the series is
Think about it. Let me know.
Shaun: I think the business analyst is a clearer hero, who’s been through a difficult time and had to do some things they didn’t like, and then they come out and they slay something. What do they slay? Maybe they slay the semantic layer.
Semantic layer is just fascinating. When the same words to describe the same thing
Shane: People over technology every time, that’s what we should do. Excellent. Alright, I reckon June 2024, we should get back together.
Shaun: I’ve said a bunch of times that in a year something. So hold me to those predictions. I think to boil them down, it’d be interesting to see how much data teams have adopted it. Interesting to see which enterprise use cases didn’t work, because it’s “let’s do it on the call center.”
But the call center’s been optimized like hell for four decades, right? So I’m not anticipating actually a ton more gain there. Who’s got the horror stories? And then my main hope is that in a year, there are a number of different marketplaces where different kinds of LLMs are available, right?
For different purposes, available under a range of licensing and a range of payment options, from “completely open source, put it on your own stuff and do whatever you want” to “here is an API and we’ll take your data.” And free and paid, and obviously those things will be correlated.
But that’s what I’m super keen to see: that instead of just everything being about ChatGPT, ChatGPT opens the minds. And then there’s all kinds of specialist models that then allow data teams to stitch together different components and build really compelling user experiences that are beyond what we could ever imagine a dashboard being in terms of helping people take action.
I think everything being ChatGPT is not quite the way to do that. And so the rate of innovation already, of people catching up to ChatGPT on specific tasks, if that sort of expands, and then there’s lots of ways to buy and sell those models, and maybe some standards emerge.
That’d be good too. That would be a really interesting place to be in a year.
Shane: And also that there are micro paywalls, so that people who are creating the content that gets harvested by those models get compensated for their knowledge.
Shaun: See that loop closed.
Shane: But I doubt that one’s gonna happen. Excellent. Alright. June next year we’ll do it. In the meantime, if people want to get ahold of you, what’s the best way for people to follow your thoughts, your writing, your speaking?
Where do they find you?
Shaun: LinkedIn is probably the best place. I used to be on Twitter, but the year 2016 killed it a bit for me, and then everything since. So I put most things on LinkedIn. You can also Google my name, Shaun McGirr, and Dataiku, and you’ll find what I’ve written on the blog. And then increasingly,
a benefit of the virtual period of conferences during the pandemic is that more of the content is now being filmed and then released. So a number of my talks are out there on the internet as well.
Shane: Excellent. All right. Thanks for being on the show and we hope everybody has a simply magical day.
Shaun: Thanks for having me.