Data Engineering Patterns with Chris Gambill

Jun 13, 2025 | AgileData Podcast, Podcast

Join Shane Gibson as he chats with Chris Gambill about a number of Data Engineering patterns.

Listen on your favourite Podcast Platform

| Apple Podcast | Spotify | YouTube | Amazon Audible | TuneIn | iHeartRadio | PlayerFM | Listen Notes | Podchaser | Deezer | Podcast Addict |

Podcast Transcript

Read along you will

Shane: Welcome to the Agile Data Podcast. I’m Shane Gibson.

Chris: I’m Chris Gambill.

Shane: Hey, Chris. Thanks for coming on the show. Today we’re going to talk about data engineering patterns. Before we rip into that, why don’t you give a bit of a background about yourself to the audience?

Chris: Absolutely. So I have spent about the last 25 years in the data world.

I actually got my start in a call center back in 2000 

where I quickly became the kind of go-to person for untangling all these Excel spreadsheet nightmares and all these manual processes. And they figured out that I had this knack for automation and data, and I quickly got pushed into a position where that was what I did all the time, and since then it’s really taken off from there.

I’ve worn a lot of different hats, from individual contributor all the way through director positions. I was brought up in the Microsoft stack, so a lot of DTS packages early on, then SSIS, moving into Azure and ADF and Synapse and Fabric now. And I’ve even had opportunities to touch Snowflake and Databricks, and even do some migrations from AWS to Azure and back again.

A lot of breadth of knowledge, as well as really deep knowledge with the Microsoft stack. Most of my knowledge has been self-taught and really tested on real-world projects, right? And today I run Gambill Data, where I help organizations design scalable modern data solutions, automate reporting, and really bring some order to the data chaos that you see out there.

And I’m also proud to say that I am recently a Microsoft partner as well. Super excited about that, because that really lets me stay ahead of the new technologies, ’cause I get some advance views of different things that are coming out from Microsoft.

Shane: Excellent. Hey, at some stage you might take a sidetrack into, is Fabric more than just PowerPoint slides with Synapse renamed?

But before we do that and we get into the sarcasm from me, why don’t we jump into the core of the subject? I suppose if I put some anchoring in place, the way I think about patterns, and I get it out of the pattern book that came out about buildings, is I think about them as solutions for common problems which fit a certain context.

So I can look at the way people submit code to Git and say, hey, that’s a repeatable pattern. I can look at the way people pair program and say, hey, that’s a repeatable pattern. I can look at the five ceremonies of Scrum and say, hey, that’s a repeatable pattern. I can look at a four-tier data architecture and say that’s a repeatable pattern, because each one of them is a solution to a common problem.

And given your context, it may fit, it may be valuable, or it may actually be an anti-pattern. So let’s use that as the anchoring. What’s one pattern that comes straight to mind in the work that you do?

Chris: Yeah, so really, when I got a chance to work with AWS, this became one of my favorite patterns out there, partially because I think this pattern gives you a lot of control as to what’s going on with your whole ETL/ELT process.

And that pattern is to write a Python script to do all of your extracting, your transformations, and your loading. Take that, containerize it using Docker, create that environment in Docker and really containerize it, and then take that, and AWS has some wonderful elastic resources where you could load it into an elastic container repository in AWS.

And then from there, in this example, I generally would use Fargate, and I’d run them in the Elastic Container Service with Fargate. And then we orchestrated that with cron schedules, ’cause most of that was batch scheduling. And that lends itself well to batch scheduling that you’re gonna do maybe once or twice a day, bigger processes.

Because if you’re doing anything more often, right, like a load every 15 minutes, or if you’re doing any type of streaming, you probably need to go the Lambda route. But with Fargate and this particular pattern, it lends itself to that. Otherwise, for streaming, you go a little bit different direction, because cost can be high with Fargate.

But Fargate does really well with this specific pattern. 
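To make that pattern concrete, here is a minimal sketch of what the Python entrypoint inside such a container might look like. It is only a sketch under assumptions: the function bodies are placeholders, and the environment variable names are invented, not anything from Chris’s actual setup.

```python
# etl_job.py - illustrative entrypoint for a containerized batch ETL task.
# The image gets pushed to ECR and run as a scheduled Fargate task; the
# process exits when the load finishes, so the task (and its cost) dies
# with the run.
import logging
import os
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")


def extract(source_url: str) -> list[dict]:
    """Pull raw records from the source system (API, database, files...)."""
    log.info("extracting from %s", source_url)
    return [{"id": 1, "amount": "42.50"}]  # placeholder for the real call


def transform(rows: list[dict]) -> list[dict]:
    """Apply casting and cleanup before loading."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(rows: list[dict], target: str) -> None:
    """Write the transformed rows to the warehouse or object store."""
    log.info("loading %d rows into %s", len(rows), target)


if __name__ == "__main__":
    # Config arrives as environment variables set on the task definition,
    # so the same image serves every environment. Variable names are made up.
    source = os.environ["SOURCE_URL"]
    target = os.environ["TARGET_TABLE"]
    try:
        load(transform(extract(source)), target)
    except Exception:
        log.exception("ETL run failed")
        sys.exit(1)  # non-zero exit surfaces the failure to ECS
```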

Shane: And the thing that I like about the way you described that is I can think of them as Lego blocks. So there’s a bunch of Lego blocks, and you’ve racked and stacked them together, and then you’ve described the anti-pattern. So if you are doing batch, if 15 minutes or more is the frequency of loads, then this is gonna work.

But if you go into anything below 15 minutes, definitely near real time, but even if you went down to a five minute load, you’re gonna have a whole lot of things you need to deal with, and this may not be the best pattern for that. It would be a more streaming-centric type of pattern. Is that how you think about it?

Chris: Absolutely. Absolutely. Yeah. ’Cause then you probably need to go the Kafka route or, like I said, Lambdas, and go that direction.

Shane: And then often there’s patterns within patterns. So for example, when you talk about this extract and load pattern you’ve got, you talk about how effectively you’ve containerized it.

Then I’d be looking and saying, okay, are those containers running 24 by 7? Or have you adopted a form of a serverless pattern, where you’re effectively deploying the container, running it, destroying it, deploying it, running it, destroying it? So there’s always patterns within patterns, which makes it a little bit trickier.

But what’s your common one for this? Do you tend to run that container permanently, or do you adopt a deploy and destroy kind of pattern?

Chris: In these cases with the batch loading, we usually deploy and destroy, right? Usually you spin it up and then you’re killing it at the end, ’cause that Fargate instance dies at the end of the run anyway.

So it’s a spin up and then burn down at the end. 

Shane: And then the other one that’s interesting for me is that you’re pushing it to Elastic. So that’s Elasticsearch?

Chris: Not Elasticsearch, Elasticsearch is a little different. So in AWS, there’s a couple of resources called ECS and ECR, right? One is the repository, obviously the ECR piece, and that’s where all your containers get stored.

And then you have the Elastic Container Service, and that is where all the scheduling happens, where you’re defining whether you’re gonna spin up an EC2 cluster or a Fargate cluster, right, and set all your permissions and your security pieces and all that.

Shane: Okay. So that’s almost the orchestration of how you deploy and destroy those containers.

Is that right? That’s correct. And this is the thing: because I only work in the Google Cloud Platform typically these days for our product, I’m trying to map your patterns, you know. So I’m going, oh, okay, so effectively we run Cloud Run and Cloud Build; that’s the equivalent of the containerization. We run Python when we have to.

Yep, that makes sense. We only use it when it’s 15 minutes or more, because we have a whole lot of other patterns that we need if we’re gonna go to anything that actually transforms faster end to end. And then for me it was like, oh, the Elastic thing, right? I just naturally went to Elasticsearch, because that’s the last AWS thing I used that had the word elastic in it.

Again, I think one of the tricks with patterns is explaining with clarity what they are, and then actually explaining the context. Explaining the anti-patterns is what I tend to find the easiest way to do it: we know if it looks like streaming, this pattern may not work, find another pattern. Otherwise you’ve gotta go and actually do a whole lot of detail and describe it in words, and that’s actually really hard.

So how do you do that? How do you actually document a pattern that you’ve used, so that you can remember it next time, or so somebody else can use it?

Chris: So when I was at AT&T, for example, I would create these patterns and I would actually document all of it a couple of different ways. One, I would write it all out just in a Word doc and save it in a share.

We had a SharePoint site, this was 10 years ago, I don’t know, time gets away from me, so we’d save it in SharePoint. But I would also create PowerPoint presentations with slides of all the different steps, so that people could see, okay, these are the resources that you need for this pattern, this is the description of the pattern.

Then step by step, mostly because people like the step-by-step feel that you get with a presentation, as opposed to a Word doc, where people just glaze over when they’re reading.

Shane: I suppose that step by step, especially if you put screenshots in, gives them more context.

People can just go through and see what you’re doing, and that gives ’em more context, or follow the steps and learn by doing, and that makes sense. And then if I think about what we do, ’cause we have a similar set of patterns on a completely different technology stack: one of the things we do is we tend to deploy or land all our data in Google Cloud Storage.

So the equivalent of AWS S3. And what’s Microsoft’s version this week? Microsoft has Azure Data Lake Storage, then there’s Gen2, right, and then there’s OneLake, which seems to be the new Fabric one. It’s this idea that we’re actually landing the data into file-based storage. And the reason we do that is because we have a pattern where we load from Google Cloud Storage into BigQuery, and if we load from Google Cloud Storage into BigQuery, it’s free.

We don’t pay any compute. If we push the data directly to BigQuery, we pay for the compute processing for it to actually store that data. Now, if we’re using Google Analytics, so GA4, there’s an out-of-the-box adapter that goes GA4 into BigQuery, so we bypass our GCS layer. Then if I think about out-of-the-box adapters, this is where I’m going.
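As a rough illustration of that free load path, here is what a batch load from GCS into BigQuery looks like with the google-cloud-bigquery client; load jobs draw on a shared slot pool rather than billed compute, unlike streaming inserts. The project, bucket, and table names below are made up.

```python
# Batch-load files already landed in Google Cloud Storage into BigQuery.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                   # infer the schema from the files
    write_disposition="WRITE_APPEND",  # append to the landing table
)

# Bucket, path, and destination table are illustrative.
load_job = client.load_table_from_uri(
    "gs://example-landing-bucket/hubspot/2025-06-13/*.json",
    "example-project.landing.hubspot_contacts",
    job_config=job_config,
)
load_job.result()  # block until the load job completes
print(f"Loaded {load_job.output_rows} rows.")
```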

We’ll write a Python adapter effectively when it’s a system that’s not well served. So what we find is with every organization we work with, they’ll have three to four of the standard systems. They’ll have a HubSpot or a Salesforce, the ones we’ve seen before, and we can use an off the shelf adapter for that.

And then they will always have one system we’ve never seen before. There is no adapter. Only a hundred people in the world use it, and we actually have to write the framework to collect that. 

Chris: I was gonna say, or even worse, they have a very custom product that they’ve created in house, and you have to figure out how to create your own adapter.

It’s a BYOA, bring your own adapter.

Shane: Yeah. And of course, it’s not documented. There’s no schemas. There’s sometimes no API. You’ve gotta go into either the database or the logs. Actually, I don’t mind that one so much. I find the one that does my nut the most is they’re using an off-the-shelf package.

Something like Salesforce or HubSpot, and then they’ve customized it so much that the off-the-shelf adapters don’t actually work. So now you’re actually custom building an adapter. And there’s a bunch of companies out there that do nothing but build adapters; I’m thinking Fivetran, Dataddo, those kinds of companies that do nothing but that.

But let’s go back to that case. So let’s say we have a system that you’re gonna need to collect data from. It’s a well-known one, so you can use one of the commercial software-as-a-service products to do that collection. So effectively, in my head, you are replacing the Python script part of that pattern that you talked about.

Is that how you think about it? Do you actually plug and play to a degree when you decide that component of that pattern can be replaced with another one? And how do you decide which parts of your pattern survive? So which ones become almost pets and not cattle? ’cause that’s always a bit of a trick. 

Chris: It is.

And I think to your point, if it’s a very standard implementation of something that has a plug-and-play pattern, a plug-and-play adapter, where you could just pick up like ADF, right, and there’s an easy connector and it’s straightforward, then you could swap it out for those low-code/no-code pipeline builds that you could create.

When I was in cybersecurity, there was some security involved and there was some lack of trust in the low-code, no-code tools. And so Python was the way to go, just because, again, we had a little bit more control when it came to security, and we needed that from a government standpoint too.

And so those are the types of things that you have to take into account when you’re looking at which tool you’re gonna use. Like, Informatica does such a great job from a governance standpoint when it comes to that type of stuff, depending on how regulated your domain is.

Shane: Yeah, it did, until Salesforce bought it. And that’s how you derail a data podcast recording in 10 seconds. For that, I come back to this idea of, if you think about a pattern being made up of a bunch of components, each one of those being a Lego block, and you’re building these Lego towers that have to be fit for purpose. The context, again, of that end-to-end pattern, but also the context of each component, is really important.

Because if I dropped into an organization and they already had the extract and load patterns in place, say it was the one that you described, and then somebody says, oh, just go hit Salesforce or HubSpot, I’m naturally gonna wanna look at a commercial adapter, because I go, I don’t wanna write that.

The frameworks are all out there. They’re robust, they’re relatively cheap compared to the human effort of writing one and maintaining it. And so I’d naturally go that way. But if the context of the organization is that all data must stay within their boundary, and they treat their boundary as a cloud boundary as well,

then actually there’s no point, I can’t bring that adapter in, ’cause the context of how we’re choosing these patterns says that’s actually not the way we can do it. And that makes sense. So context is king, as always. Alright, so that’s extract and load. Hit me with another.

Chris: Okay, so another one is Databricks.

Right? So just to give you a glimpse of what happens in Databricks for this specific pattern: in Databricks, you could choose between two types of compute. You have dedicated and you have serverless. And with the serverless, you can’t just create an initialization script that loads all of the libraries you need into the compute, into the environment, because it gets re-spun up every time something happens. You know, that’s the point of having that serverless. Where with the dedicated, you can preload all your libraries into that environment and you’re good to go unless you have to reboot that environment, but then you have the init script that reruns everything.

So that’s just an overview as to what’s going on here. With the serverless, the pattern that I use because of this is I’ll create a bootstrap notebook that goes through and says, okay, these are the libraries that this notebook needs to install, and this is where each one comes from. Because if you have a wheel in place, it is easier and quicker.

And from an efficiency standpoint, it’s a little bit better than having to go out to PyPI to download and install that library. So I have a class and a set of functions that goes through and says, okay, this is the library that needs to be installed. It determines whether it is one of the ones that we have a wheel for, and those wheels are generally stored within, in this specific pattern, we’re talking Azure, ADLS; but like you said with the Lego blocks, that could be easily replaced with AWS S3. And so it goes into ADLS, grabs the wheel, does the python -m pip install of that wheel, and then moves on with life. If that wheel’s not available, then it’ll roll over to the PyPI version and download and install from PyPI, but it consistently does the same thing across that whole project’s environment.

And it provides you with one central location for maintainability and for consistency across your full environment so that when you’re troubleshooting, you have one place to go. 
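A minimal sketch of that bootstrap idea follows, assuming a hypothetical ADLS mount path and an invented library list; the real version would live in the class and set of functions Chris describes and be pulled into every ETL notebook.

```python
# Bootstrap notebook (sketch): prefer a pre-staged wheel in ADLS,
# fall back to PyPI if no wheel exists for that library and version.
import subprocess
import sys
from pathlib import Path

WHEEL_DIR = Path("/mnt/adls/wheels")  # hypothetical mount of the ADLS container


def install(library: str, version: str) -> None:
    """Install one pinned library, wheel first, PyPI as the fallback."""
    wheels = list(WHEEL_DIR.glob(f"{library}-{version}-*.whl"))
    target = str(wheels[0]) if wheels else f"{library}=={version}"
    # The notebook equivalent of `python -m pip install ...`.
    subprocess.check_call([sys.executable, "-m", "pip", "install", target])


# Every ETL notebook runs this same bootstrap, so the whole project
# installs libraries one way, from one place.
for lib, ver in [("requests", "2.32.3"), ("pandas", "2.2.2")]:
    install(lib, ver)
```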

Shane: Okay, and so this is where you’re getting technical, and I probably need my co-founder, Nigel, who’s the technical one of the two of us, really technical.

He is the engineering one. But let me explain this one back and see, because this would be a good test, right? If I understand the pattern, even though I don’t understand the technology. So what I think you’re doing is I think you are templating the initialization of the serverless component. So the serverless component is ephemeral.

You use it, it deploys itself, it destroys itself. So unlike a container, where we say go and then die, with serverless we effectively say run, and it’s doing that for us. Is that right? Yes, exactly. When we deploy a container, the initialization of all the things we need is embedded within that container.

I’m old, so VMs were a thing back in my day. So we’re effectively deploying this container, and the initialization and all the things we need are embedded in that container, so we don’t really have to care. It’s like container deploy, run, container destroy. And we are in charge of that. That’s right.

That’s the pattern for containers. Okay. And then with serverless, effectively they’re not serverless, ’cause there is a server and there are containers, we just don’t maintain them. We just go run this thing, and it knows to hit that server, start up that container, run it, kill itself. So that’s the serverless pattern.

And then what you are saying is, if we tell it when it boots up to then go and deploy all these scripts, it works, but it’s not optimal. Whereas if we effectively have a template, and we say whenever that service is invoked, go and use this template, we get more control. Is that what you’re saying?

Chris: So with Serverless, every time a new orchestration runs, it’s almost like you’re deploying a new empty container.

And with containers, with like Docker, within that Docker script you’re telling it, okay, these are all my requirements, and it provides this initialization script that runs through those requirements. With Databricks serverless clusters, it doesn’t accept that; there’s no ability to do that.

So when your orchestration notebook kicks off, if it’s not one of the pre-built libraries that Databricks has loaded into its serverless environment, you have to install it. You have to do a pip install. With Databricks, there’s this magic key, right, the percent symbol, and you could do a %run and then basically run a command-line command there, and you could do a pip install. But then it becomes inconsistent across your environment, right?

’Cause every engineer that goes through and creates a notebook might do it slightly differently. They might not know whether there’s a wheel or not. They might do the magic %run and pip install, or they might do a command line and say python -m, and do it that way. So to provide consistency every time those notebooks kick off, we have this kind of bootstrap notebook that does that, and provides that consistency and maintainability across our whole serverless environment. Does that make sense?

Shane: Yeah. So if I think about this idea of a data platform that has a serverless component and a persistent component, it reminds me of one of the many iterations of Azure, where we had Synapse Serverless and Synapse, what was it?

Pools, the old PDW stuff. So that pattern of a serverless component and a persistent data component seems the same pattern in Azure with Synapse as it does with Databricks. But I think what you’re telling me is, if I was gonna deploy this pattern, it works for Databricks, ’cause Databricks technically works slightly differently.

Chris: Mm-hmm. 

Shane: But would I do the same thing in Azure Synapse, or does it actually allow me to solve that problem by just telling the serverless component to initiate these things?

Chris: The nice thing about Synapse is that when you’re deploying a pool, you could give it an initialization kind of script, right?

Of, these are the Python libraries that I need loaded for this, whether it’s serverless or dedicated, whichever pool you decide. But I think the Spark pools there are all serverless. You could predefine it, but Databricks doesn’t allow you to do that. You have to either do it in every notebook that you create that’s gonna run, or you create this kind of bootstrap notebook that does that, which you basically just chain into the notebooks that you create for all your other ETL jobs.

Shane: And that’s key, that the technical deployment of a pattern may change depending on which technology you use. And there’s always something slightly different, which is why they have competitive advantage or not. So you have to be really clear about that. When you pick up this pattern and you’re putting it into a new place, you need to actually understand there may be something I need to technically tweak.

And also over time things change. So for example, we used to use Cloud Functions for our transformation code, so we would use deploy and destroy on our transformation code as well. Effectively it would read our context engine, work out what it needed to do, and write the code, or effectively hydrate it, ’cause it was standardized.

It would deploy it to a Cloud Function, it would run, it would finish, and then we’d go on to the next one. So we’d daisy chain these little globs of code, but the manifest of how they were built was dynamic. Back then, a Cloud Function would only run for, I can’t remember, a minute or three minutes, so you’d get these ones where, if you were waiting for it, it would just terminate.

We had to do a fire-and-forget pattern. Then Cloud Functions went to nine minutes, I think, and most of our small chunks of code would run within that, so then we had to re-engineer the code to just use that. And then we had to move from Cloud Functions to a whole lot of other new things, ’cause one of the downsides of Google is they love to deprecate their products whenever they feel like it.

So the point there is maybe the core pattern itself isn’t changing, but the technical implementation of it needs to be iterated over time, because data platforms and technologies are moving at such a pace that actually you always have to go back and review your technical patterns, don’t you, to make sure they’re still the most efficient way of doing it?

Chris: Absolutely. And from my perspective, I try to go through those patterns at least once every 12 months, if not every six months. Because to your point, things change so incredibly frequently, especially if you take Fabric, for example, right? Fabric 18 months ago is not what Fabric is today, by any stretch of the imagination.

I initially looked at Fabric and was like, this is a marketing tool to try to push people into some new environment and try to raise their cloud bill. In fact, that’s 100% what it felt like when our Microsoft rep came and talked to us about it, and I was like, how does this save us money? He said, operationally, we’ll save you money.

And I was like, but can you gimme a cost difference between what we’re paying now and what we’d pay on Fabric? And he’s like, you’re not gonna see a decrease there. You’ll see a decrease in man hours. And so I was like, okay.

Shane: Yeah, I remember I got asked to come into an architecture review for an organization that was deploying Databricks, and somebody had obviously come into the organization and said, it’s going Microsoft Fabric. It was quite a few years ago. And so we engaged with the Microsoft team, and actually, here in New Zealand they’re a great team, they’re really helpful, and said, yep, show it to me. And we had nothing but PowerPoint. And you sit there going, ’cause I used to work for Oracle many years ago, so I’m like, I did a lot of PowerPoint demos back there.

I’d been told that the product was real. But I suppose my view back then was, this is gonna go one of two ways. Either this is just gonna be marketecture, right? They’re not actually gonna make any changes under the covers, they’re just gonna rebrand and re-cobble together all the quite disparate technologies that had happened by that time.

Or they’ll Power BI it, and they will actually re-engineer it over time, and they will make it cheap as chips, and they will completely dominate the market, because if you’re a Microsoft customer, why wouldn’t you use Fabric? So I think we must have looked at it back then at the same time, when it was just PowerPoint slides.

But you’re saying now it has changed. You’re saying that actually there’s some meat and potatoes under the PowerPoint now? 

Chris: I think it has, especially now. Yes, absolutely. I will say that Microsoft right now is definitely coercing people towards Fabric in many different ways, but it’s also grown a lot. At one point, what was the phrase that you used? Something about marketing?

Shane: Marketecture.

Chris: Yes. I love that. Okay, I’m gonna steal that. At first it felt like marketecture, right? It 100% was, let’s take ADF, let’s take Power BI, let’s take this OneLake that we just started implementing with Power BI, and let’s put it all within this one basic GUI, right?

And that’s really all it was. But now they allow you to spin up a SQL database, and that’s relatively new in the last three or four months. They have this new version of ADLS that is OneLake, and that kind of helps the interactivity between all of it; because it sits physically closer on their servers, it runs a little bit faster.

So yes, from an orchestration and an operational standpoint, it’s a little bit more efficient, and the cost structure has significantly improved. When it first came out, the cost structure was, frankly, in my opinion, a little bit ridiculous. The minimum was 10 grand a month or something like that, and it was nuts.

But now you could spin up like an F2, and it’s much more reasonable in cost than it was 18 months, two years ago.

Shane: And one of the interesting things about that is, if we think about technical patterns, actually Roelant Vos and Dirk Lerner have got a book coming out this month, and they talk about design patterns, which are the really big Lego blocks, the core meta patterns to a degree, and then solution patterns, how you can actually use them.

And then they’ve actually got technical patterns, where they push code to Git to show you how it works. We can look at vendors and we can see patterns. So I’ll give you three examples. Microsoft, what I saw them do with Fabric was a couple of things. They started focusing on the control planes. As you said, they focused on, how do we have one UI that can actually orchestrate, or help us build and orchestrate, everything?

How do we have OneLake? How do we have one place where there is effectively a catalog, like a Unity Catalog, a control plane that all pieces of data go through, even just from a context or metadata point of view? And that makes sense, right? That’s like getting the foundationals in. Because really, if we got to the stage where I could use that UI or those control planes, and it goes and determines which of the many technical components to spin up, what’s the best, most cost-effective and performant thing for the job that’s about to be done, and I don’t have to care.

That’s a great pattern. And then the second thing that you talked about is, because they were early, they need some early adopters, and they need people they can get good feedback from, and they don’t wanna open up to the entire market because the product’s not ready yet. So what they do is they put a price on it that only certain large customers can afford, to stop everybody else using it, because they know it’s not quite ready.

And I’m not taking a pot shot at Microsoft. That’s a great pattern to deploy and pivot your product. But here’s another pattern that we’ll see in the next week: Databricks and Snowflake have gotten into a habit of doing what used to happen when I was at Oracle. They will announce features that don’t exist yet.

At both their summits, they will announce features that may turn up in 12 months, and then they have a pattern. They all use different words, right, but there’s the internal one, and then there’s a small bunch of really trusted users, and then there’s the slightly broader one, but you still have to be invited.

And then there’s the early adopters: it’s open, but don’t trust it. And then there’s the, this thing’s scalable, it’s been tested by their customers a lot. And that’s normally a 12-month cycle. So when we see these lovely architecture announcements right now, while I get grumpy with them, I look for the patterns of, okay, where in their deployment cycle is that piece of code?

How does it fit with their strategy? And then the last one, ’cause it’s topical right now, is Fusion from dbt.

Chris: Mm-hmm. 

Shane: Yeah. I remember when I was at Oracle, again, I’m sure we had a product called Fusion, and then it got called Confusion, which I see a lot of dbt competitors now starting to use. But you think about their problem, right?

They have an open source product that they’re not making money on, that everybody uses for free, and that everybody expects them to keep developing and giving them features for free. They are now a commercial company that has VC funding and actually has to get growth or profitability, right? Those are the two things they have to do.

I don’t see why the market’s surprised that they are now merging those two things and making them more and more paid. That is a common pattern. I mean, it could be worse. It could be Elasticsearch, right? It could have been AWS picking it up, calling it an AWS product, and not actually letting dbt get any of the revenue.

I’m not sure that’s a better pattern. 

Chris: It might be Salesforce’s next acquisition. 

Shane: What, dbt? Ah, no, they need a database, right? They do. I was thinking about it the other day: Teradata. Salesforce has to buy Teradata.

Chris: Yeah, because that’s the next tool that we don’t really love using, right? They went through and Salesforce bought Tableau, which is the last generation of BI tools, when everybody uses Looker or Power BI or all these other visualization tools, and they’re like, let’s buy Tableau for a huge amount of money.

And then they’re like, okay, now let’s buy Slack for a blue billion dollars. They bought Slack, they bought Tableau, and now they’re gonna buy Informatica. And these are the stepchildren of the tools that we love using. And so maybe, like you said, Teradata feels like the database of choice from our stepchildren of databases.

Shane: Well, especially ’cause Teradata actually re-platformed their on-prem database to be cloudy, and they brought their price point down.

They just missed the market by a couple of years. So if we look at patterns, let’s look at the Lego blocks of an end-to-end platform: communication, Slack; visualization, Tableau; they’ve got their AI thing, right? They’ve now got the transformation and the data acquisition: they’ve got MuleSoft, they’ve got the streaming version of that.

They just now need storage, and then they become end to end. It’s a pattern.

Chris: They become the Oracle of today, right?

Shane: Yeah. Although I do remember my theory back then, when Oracle was buying everybody, was that they bought their competitors because they couldn’t beat them. The reason I moved into data was I started out in ERP and financials, and we used to get our asses kicked by PeopleSoft all the time.

Chris: Oh yeah. 

Shane: And yeah, just every deal, it was like, our product was ugly and their product was beautiful. It was as easy as that. And then my view was Oracle bought PeopleSoft to kill it. It was, we can’t beat you, so we’ll buy you. And that was actually a, uh,

Chris: hostile takeover. 

Shane: Hostile takeover, yeah.

Poison pills and cool names like that. Anyway, let’s go back to technical patterns. Oh, before we do that, actually, what the hell is a wheel?

Chris: A wheel? Okay, that’s a good point. So there are a couple different ways that you could install libraries, Python libraries. One is you could go out and basically you’re gonna phone a friend, right?

You’re gonna call PyPI, which is this big repository of all these Python libraries, and say, hey, this is the library I want, I’m gonna download it and I’m gonna install it. Or you could create a wheel. And a wheel is basically a local file of the library that you could save locally on your computer.

Or you could save them in an S3 bucket or an ADLS container. And that way you have a static install of that library. That’s probably the easiest way to explain it.

Shane: Okay. So in my head it’s like what we call a manifest. We effectively go and grab all the transformation code we need for a job.

We then basically build out that Python or SQL series of steps, we keep it as a manifest, and then we run that manifest and then we destroy it. But what you’re saying is it’s basically a manifest, a list of instructions that are stored somewhere so you can make it reusable. So is wheel a Databricks term?

Chris: No, a wheel is just a Python thing. It’s like an executable for Python libraries, right? It holds all of the instructions for that specific Python library that you’re wanting to install. And the advantage of having a wheel that you use to load every time is that, one, you’re not having to call and download a file from PyPI, right? So it’s a little bit quicker, more efficient. But also you have a consistent version of that library. If you call PyPI, it’s gonna give you the latest one every time, which can cause issues in your ETL if there was some major change in the updated version, right?

And so you download and use these wheels to do your installs, and that way you wait until you’re ready to install the latest version, once you’ve done any regression testing and made sure everything’s gonna run correctly on the new version. Then you could download the new wheel and save it to your blob container, wherever that may be.

And then reuse that.
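Here is a rough sketch of what staging a wheel like that might look like: fetch the one version you have regression-tested from PyPI once, then park it in the shared blob container that the bootstrap installs from. The connection string, container name, and library are placeholders, not anything from a real environment.

```python
# Fetch a pinned, tested wheel once and stage it in blob storage.
import subprocess
import sys
from pathlib import Path

from azure.storage.blob import BlobServiceClient

LIB, VERSION = "requests", "2.32.3"  # the version you regression-tested
out_dir = Path("./wheels")
out_dir.mkdir(exist_ok=True)

# `pip download --only-binary=:all: --no-deps` grabs just this wheel
# without installing it.
subprocess.check_call([
    sys.executable, "-m", "pip", "download",
    f"{LIB}=={VERSION}", "--only-binary=:all:", "--no-deps", "-d", str(out_dir),
])

# Upload to the shared container the bootstrap notebook reads from.
service = BlobServiceClient.from_connection_string("<connection-string>")
wheel = next(out_dir.glob(f"{LIB}-{VERSION}-*.whl"))
service.get_blob_client(container="wheels", blob=wheel.name).upload_blob(
    wheel.read_bytes(), overwrite=True
)
print(f"staged {wheel.name}")
```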

Shane: And that, again, is a pattern you’ve just described, a small pattern that solves a problem within a context. And one of the things about patterns, whether we’re talking design, solution, or technical patterns, is the language changes and it’s all about context. So the example I use regularly from that book about buildings is, if you have a lounge, a sitting room, and you’re in a sunny place, you’re typically gonna have big windows.

If you have a bathroom, you’re typically gonna have a small window, and it’s probably gonna be opaque. I’ve got a friend that lives in Iceland; their windows are very thick because of the cold, so the design of their houses is very different. Got another friend that lives in one of the most beautiful spots in the world, on effectively a hilltop, and nobody overlooks them.

So their bathroom actually has a shit ton of windows, and they’re not opaque, because they want light in their bathroom and nobody is around to actually see in, in theory. So if you think about it, again, that context is key. And that example you gave of a wheel: when does versioning and making a static copy of those libraries have value, versus when do you actually wanna reach out and grab the latest version?

It’s just that context and choices in every pattern are important, and language is important. Excellent. You’ve got another one?

Chris: I do, I have a third one, and this one is more Azure based, so it’s more that kind of low code, no code, ’cause we’re gonna use ADF. In Azure, one of my patterns, for orchestrating in this case pipelines in Azure Data Factory and even Databricks notebooks, is you can actually orchestrate all of that with an ADF pipeline, which is really cool, which is nice.

You don’t have to go out and do an Airflow thing, so if you’re in an environment where you don’t have access to Airflow, then ADF gives you this pattern. And so basically you’re creating a pipeline where you’re dropping these tasks into your pipeline that are Run Notebook tasks. Like I said, you’d run Databricks notebooks, so you’d run a Databricks notebook, and then you could drop another task in that’s Run Pipeline, and you define which pipeline it’s running.

If it succeeds, then it moves on to the next pipeline that it needs to run within that orchestration. And if it fails, and this is one of those patterns within a pattern, then you’re gonna go through this alerting process. You’re gonna send an alert.

Generally, I like to use Teams for people that are in that Microsoft environment, because people get Teams on their phone; they get it whether or not they’re in front of their desks. Email they may or may not get, but a Teams message, whoever’s on call, they could make sure that they’ve got those alerts on, and they’re getting those alerts that say, hey, this specific pipeline failed within the orchestration, we need you to go and check it right now. And super high level, that’s what that pattern looks like. Then at the end of it, after each pipeline, it actually goes through and sends the metadata for that pipeline to a table, to say, okay, this pipeline completed successfully, or didn’t, if it failed.

And then we have the metadata tracking as well that goes along with it. So there’s a lot of configurations there, and probably I’ve just named three or four different patterns within a pattern, so I apologize. 
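As a sketch of that alerting step, here is roughly what posting the failure to a Microsoft Teams incoming webhook looks like from Python; the webhook URL, pipeline name, and message wording are all invented.

```python
# Post a pipeline-failure alert to a Teams incoming webhook so whoever
# is on call sees it on their phone.
import requests

TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/..."  # placeholder


def alert_failure(pipeline: str, run_id: str, error: str) -> None:
    payload = {
        "text": (
            f"Pipeline **{pipeline}** failed (run {run_id}).\n"
            f"Error: {error}\nPlease check it right now."
        )
    }
    response = requests.post(TEAMS_WEBHOOK_URL, json=payload, timeout=10)
    response.raise_for_status()  # surface alerting failures too


alert_failure("load_salesforce_daily", "run-2025-06-13-0400", "timeout after 3 retries")
```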

Shane: Oh, you definitely have. And then I’m assuming that actually the log from every run goes to some kind of file-based or log-based storage,

so you can always go back and forensically see what SQL was run or what code was run. And then you probably aren’t versioning the data, you aren’t keeping copies of before-and-after data. Yes, correct. So again, that orchestration pattern, I can look at that and go, yeah: directed graph, that is the orchestration flow; alerting, pushing that out to different channels effectively, depending on the urgency of the problem.

Metadata, or storage of all the run stats, ’cause you’re gonna need those. I’m assuming you’re gonna have runtime, number of records loaded, all the stuff that you’re gonna use later. Pushing the actual logs from the run somewhere else that you’re just gonna keep cold, in case you have to go in and forensically figure out what the hell happened.

You’re probably gonna generate a run ID, so you can see that this set of code ran at this particular time, and then you’re indexing it. If I think about what we do, we do a lot of those patterns, but we do one thing really differently. We don’t use a directed graph, so we don’t go in and actually hard code the flow.

What we do is we dynamically generate it, and that’s a decision we made right at the beginning. Now, that is a completely different pattern, but everything else: we push to Slack, we don’t push to Teams; we log everything like you do; we have a metadata or a context repository where we can see every run and every ID and every table that was loaded. We get performance stats, so we go, that one’s taking too long or costing too much. All those patterns for orchestration.
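A small sketch of that run-stats idea: stamp every run with an ID, capture timings and row counts, and write the record somewhere queryable so you can spot jobs that are getting slow or expensive. The print stands in for whatever metadata table or log store is actually used.

```python
# Wrap each job so every run emits a run ID, status, duration, and counts.
import json
import time
import uuid
from contextlib import contextmanager


@contextmanager
def tracked_run(job_name: str):
    run = {"run_id": str(uuid.uuid4()), "job": job_name}
    started = time.time()
    try:
        yield run                # the job adds e.g. run["rows_loaded"]
        run["status"] = "success"
    except Exception as exc:
        run["status"], run["error"] = "failed", str(exc)
        raise
    finally:
        run["duration_s"] = round(time.time() - started, 2)
        print(json.dumps(run))   # swap for an insert into a metadata table


with tracked_run("load_hubspot_contacts") as run:
    run["rows_loaded"] = 1234    # placeholder for the real load
```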

Out of interest on that, just going back to technology: ADF, Azure Data Factory, has been around for a long time. It’s a fairly mature product, and it’s a mature product that’s survived many actual architecture changes within Microsoft.

So it is a core component; I would have a bet that it’s one of the components that won’t disappear. And yet I see lots of people doing Azure deployments using Airflow. Now, in the old days, ADF I think used to be a full fat kind of client, but now it’s serverless, isn’t it? It’s more of a serverless type of behavior.

So we get the benefits of that. So why would somebody deploy a technical Airflow pattern for their orchestration in Azure, compared to just using ADF?

Chris: There’s a couple of reasons. One, I think that people don’t see ADF as an orchestration tool. I think that Airflow is the big name out there. I think a CDO or CTO gets a phone call and it’s, hey, let’s deploy Airflow.

And they’re like, oh yeah, that’s the thing, we’re gonna use Airflow, without realizing. I think so many people just schedule and forget ADF pipelines and don’t realize that they could actually do a lot of the orchestration within ADF, for one. And two, there’s definitely some additional complexity that can be involved that Airflow lends itself to.

I’ve run into companies where they don’t do everything within their Azure environment, right? They have different pieces that are running different code in different places, and Airflow does a great job of bringing that all together and allows you to orchestrate it from one place. Whereas with ADF, maybe you’re calling an API to get something to trigger somewhere.

Or maybe you’re having to creatively figure out a way to kick something off in some other system somewhere that doesn’t have a native connector within ADF. So it’s that conversation that we had earlier about native connectors and whether you go custom or use the native connector, right, where Airflow has a lot of those connectivity pieces built in already.

The other thing is that ADF has come a long way, I think, especially in the last couple of years, partially, I think, because Microsoft knew that they were gonna push it into Fabric. They needed to make it a little bit more robust, get it to a place where they felt like, okay, this is a great tool for Fabric.

And to your point, it’s not gonna die anytime soon, just like SSIS packages are not gonna die anytime soon; I’m shocked that they haven’t, but they haven’t. There’s that piece. I also think, I’ve run into a lot of companies that picked up Airflow because they’re like, we’ve got Databricks, we’ve got Azure, we’ve actually got this small AWS account over here in this corner somewhere, and we needed to orchestrate all of it.

And that’s easier done with Airflow than it is with ADF.

Shane: Does Databricks have an orchestrator? A native one?

Chris: I will say that in one of the companies I’m doing consulting with, we’ve written a custom notebook that does the orchestration. So it’s not a native orchestrator like you would see in Airflow or even Fivetran or ADF.

It’s all written in Python, and we’re kicking off child notebooks within what we’re calling the orchestrator notebook. And we’re running those either in sequence, or we’re running them in parallel, because we have that great concurrency and parallelism that we could use within Spark.

But it’s very custom, right? It’s a custom workaround.

Shane: Alright, and then with that master notebook, the controller of the controllers, are you hard coding the things it’s calling in that notebook, or is it dynamically looking them up and generating that manifest from somewhere else?

Chris: So it’s dynamically looking it up based on a couple of things.

We have a table that says, okay, these are the notebooks that need to be run. And that way it’s not a full SDLC, right? It’s not a full lifecycle every time something has to change; we’re just adding it to this table that’s sitting in a Synapse SQL database, and then it can dynamically just change.

If we have a new notebook, then it’s a little bit less overhead and we’re just dumping it in there. 
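A rough sketch of that table-driven orchestrator notebook follows, under assumptions: the control table and its columns are invented, and `spark` and `dbutils` are the globals Databricks provides inside a notebook.

```python
# Orchestrator notebook (sketch): read the control table, then run the
# enabled child notebooks in parallel.
from concurrent.futures import ThreadPoolExecutor

# One row per notebook: path, enabled flag, timeout. Names are illustrative.
jobs = (
    spark.table("control.etl_notebooks")
         .where("enabled = true")
         .collect()
)


def run_child(row):
    # dbutils.notebook.run executes a child notebook and returns its exit value.
    return row.path, dbutils.notebook.run(
        row.path, row.timeout_s, {"run_date": "2025-06-13"}
    )


# A new pipeline step is just a new row in the table - no code change.
with ThreadPoolExecutor(max_workers=4) as pool:
    for path, result in pool.map(run_child, jobs):
        print(path, "->", result)
```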

Shane: Yeah. The reason I ask is there’s this mega pattern. I haven’t got a name for it, or I’ve got one now, but I’ve gone through many names and I’ve hated every one of them. None of them stuck. So I used to talk about config driven.

We had a table called config, and everything was driven off that config. So that table you hold of all the things that should run, you’re gonna have some tagging in there, you’re gonna have a whole lot of attributes, context, and your mega script, your mega notebook, is gonna go through and say, hey, I’ve been told to run this.

What are the dependencies? Effectively, it’s gonna look up that config table. That name didn’t stick. And then I looked at active metadata; that was a term in the market, and that didn’t stick. Then I tried semantic engine, and that didn’t stick. I just saw VaultSpeed, ’cause they effectively use context-driven ETL, and they called it model driven or something.

And I was like, yeah, I hate that. Actually, Juan Sequeda posted something around semantic layers and ontologies, as he does, and then there was a comment from Chris Tabb in there as well. And right now I’m stuck on this idea of a context layer, ’cause effectively that’s what you’re doing. You’ve created this SQL table of context.

And we have context for orchestration, we have context for ETL, we have context for the business glossary. It’s all context we use. So I’m playing around with that, right? Context layer as this mega pattern. So that’s why I was asking, ’cause I could just see this idea of store the context, use the context to dynamically generate the thing that needs to run, rather than hard coding it into that notebook.

Chris: And I’m a big fan of that specific pattern, where you’re using table-driven architecture to do things, for a lot of reasons. I think it’s a great pattern for cutting down on new projects that are initiated just because one small piece has changed. I’m a big fan of doing that with business logic, having business logic that is table driven, because business logic changes so incredibly frequently.

And you have a table where you’re putting your calculations, where you’re putting all your relationships and everything, and you’re driving all that business logic from this table-driven architecture, as opposed to hard coding. Hard coding makes me wanna pick up my trash can and puke in it.
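As a sketch of that idea, business rules can live as rows and get hydrated into SQL at run time, so a changed calculation is an update to a row rather than a code release. The table, measure names, and expressions here are invented, not anyone’s actual rules.

```python
# Hydrate SQL from a rules table instead of hard coding the calculations.
rules = [  # in practice: SELECT measure_name, expression FROM config.business_rules
    {"measure": "net_revenue", "expression": "gross_revenue - discounts - refunds"},
    {"measure": "margin_pct", "expression": "(gross_revenue - cogs) / gross_revenue"},
]


def hydrate_select(table: str, rules: list[dict]) -> str:
    """Build a SELECT that adds every rule as a derived column."""
    derived = ",\n  ".join(f"{r['expression']} AS {r['measure']}" for r in rules)
    return f"SELECT\n  *,\n  {derived}\nFROM {table}"


print(hydrate_select("sales.orders", rules))
# When finance changes the margin rule, only the row changes; the generated
# SQL, and everything downstream, picks it up on the next run.
```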

Shane: Yeah. And again, that pattern of hard coding is hidden everywhere. I remember back in Oracle Warehouse Builder, where we used to hard code the graph, the DAG; we would say this node goes into this node. Alteryx came out and we were hard coding things. Coalesce comes out and I’m looking at it going, we’re hard coding it.

It may be pretty, and it may look like you’re fast, but you are hard coding that thing and it’s not dynamic. But then I remember when we went from Oracle Warehouse Builder to Oracle Data Integrator, when they bought, oh, that French company, and that was object orientated. You just created these one-off objects, and then it built everything dynamically, and nobody understood it because it was just too hard.

I’m assuming the two people that wrote the product got it. But what I saw was every consultant we had would just basically hard code ODI packages to make them look like Warehouse Builder DAGs, that data flow. So I think this idea of context driven: we have context about business logic, we have context about orchestration.

If we store that as physical data, and then use that to generate or hydrate what we need, it’s a much better pattern.

Chris: It’s so much more maintainable than hard coding it, especially when it comes to something that you inherited, that somebody else inherited, that somebody else inherited. Reading through all that hardcoded logic becomes unbearable sometimes.

Shane: Yeah, and then actually we still need a map. So while I’m bagging on data flows, and these directed graphs and these nodes and links are not the ideal way of configuring what we do, surfacing it as a picture really helps, giving a person a map where they can say, show me the nodes and links, the steps in that orchestration or the steps in the ETL. That has value, but it gets hydrated from the core context.

It’s not the actual context itself. And so if we think about that, you talked about your 25 years in the data space, and there’s a bunch of people I can talk to that talk patterns, whether they know it or not. And when you talk about, this is what a pattern is, think of them as Lego blocks, that’s just how they think.

And then if you think about what we’ve done today, we’ve gone from some really high-level ones down to infinite detail. We’ve said, hey, if you’ve got orchestration, now you’ve gotta worry about the pattern of where you store your notifications, which channels you’re pushing to, what’s orchestrating, how you build your context engine so that it’s dynamic orchestration, not hard coded.

Ah, now we need to log everything. How do we parse those logs? How do we get alerts? How do we get hints on what failed? How do we schedule it? How do we daisy chain it? There’s all these patterns on patterns. And if you come into the data world and you haven’t done all those, it must be so overwhelming, because where do you start?

Oh, I’m gonna choose between Azure Data Factory and Airflow. So you come in there, and every decision has an impact. ’Cause if I pick Azure Data Factory over Airflow, I’ve now already made some trade-offs, because Azure Data Factory means I can’t, in theory, move my platform to AWS or Google Cloud, which in theory I can do with Airflow.

I still call bullshit on that, because the cost of change is so high, but in theory I can. Or, as you said, if I need to punch out to a whole lot of systems outside of Azure, you know that you’re pretty much gonna go Airflow. But you have to know that’s the lever, that’s the trade-off decision. And we don’t document these patterns, we don’t share these patterns.

We share reference architectures for a vendor, but we don’t share them for patterns. So it must be hard if you come in without all the experience to actually work all this out. 

Chris: Yeah. And I think for new people it’s already overwhelming because of all those different tools that are out there, right?

Especially from a data engineering standpoint, the vast number of different tools and platforms and languages. Because maybe 80% of the market out there is using Python, but there’s another 20% out there that’s using Java from a data engineering standpoint. And let me tell you, that is not an easy switch, to go from Python to Java. Both are OOP, object oriented programming for those of you that may not know, but with Java you need a higher kind of technical aptitude than you need for Python.

SQL to Python to Java is probably a good building-block path, but you can’t jump straight into Java, unless maybe you didn’t know anything to begin with. And there’s so many tools, and it’s one of those things where you don’t know what you don’t know; there’s so much out there. At AT&T, I did have a kind of repository of patterns that we would use for different situations, and a lot of that was driven by what was whitelisted and what wasn’t.

So what was allowed by the company and what wasn’t. And I think even then, because there were like 80 different patterns in there, even that could become overwhelming for a junior or a fresher that’s stepping into that role. They open up that repository and they’re like, oh my gosh, I need to learn all this, like, yesterday.

Shane: And you don’t see the consequences. I remember I came into a rescue project in New Zealand, and they’d gone MapR. So it was back in the Hadoop days, where we had a choice of native Hadoop, Cloudera, and MapR. There was one more, wasn’t there? Oh, I can’t remember. And at the time, the organization wanted to bypass all their procurement process.

So they got one of the shiny-suit large consulting companies to come in and do a quick market scan for them, which is how they used to bypass the procurement process in government back then. And funny enough, that consulting company picked MapR, which they happened to be the only implementation partner for in New Zealand.

Fancy that. And when we came in, everything was being hard coded, which is a problem. We wanted to move it to this context, config pattern, table driven. And so we needed to do some development, and we’re like, okay, what’s the choice? And the choices back then were Python or Scala.

Chris: Mm-hmm. 

Shane: And it’s okay, how do we make a decision?

And we said, the MapR team are all Scala experts, so that’s what we should go with, ’cause we really didn’t care at the time. And the unintended consequences of that were massive, because there were no Scala skills in New Zealand. Everybody was either doing R or Python. And what that meant was we could only use the MapR team.

And actually the MapR team didn’t quite have as much Scala skill as they said they had, so they passed it back to the US, and we ended up offshore. We always knew it was gonna be offshore, because we knew the MapR team was supposedly domiciled in Australia, but their core skills were actually in the US, and we had this massive time zone difference. And that one decision, Scala over Python, without bringing in the context of, can we buy skills for that in New Zealand if we had to,

it had so many unintended consequences that caused major problems with that project. So again, it’s hard. I think actually, in the new gen AI, LLM world, we have the ability now to potentially document these patterns. And especially, you talked ages ago about this idea of step by step with screenshots.

If we do light context of what the pattern is in some kind of structured format, some kind of templated way, and then we do the step by step, and we put that into an LLM as context, I think we have the ability to let people who aren’t as experienced ask a question and get something back that may help them get through that whole noise, get the signal from the noise.

’Cause like you said, as soon as we take a mega pattern and we break it down, and then we bring in all the technical implications of every different flavor of technology, we can do that. The challenge is nobody’s gonna write them. Like, who’s gonna write that repository for free?

Chris: Yeah. Yeah. No way. Because it’s a massive undertaking.

To your point, I think that’s what people try to do with Terraform too, right? They try to productionize all these little different pieces, and then, like at AT&T, we had a system where we checked a bunch of boxes for our project and hit the button at the end of it, and it would go through and grab all the Terraform scripts that it needed in order to create and spin up all the resources that we needed for that kind of larger project we were working on.

But it’s a similar kind of idea, right? We have an LLM that’s been trained on all these different patterns that we have out there. We allow people, especially freshers that are learning, right, to go through and really just talk to it. If you’re using like a ChatGPT or something, you could literally just explain your project verbally to ChatGPT or whatever, and then it spits out, okay, these are the patterns, and gives you the PowerPoint presentations or whatever the case may be, and shows you, okay, this is what I’m gonna be doing.

And then actually even just does it, right? Give it some configuration pieces and allow it to write a lot of it for you.

Shane: And then you have to go 

Chris: through and comb through it. But

Shane: yeah, and then you have to go, what are the consequences of it getting it wrong? And do we get it right ourselves? And as I said, I’m not technical.

Nigel does all the engineering stuff, but I have a bunch of patterns for working with teams. I can drop into a team and I can observe for a while and go, okay, that’s probably where their biggest gap is right now. And then I’ll bring some ways of working patterns. But even then, I forget half the patterns I’ve done before.

I’m like, oh, I’m sure I had something that kind of helped a team solve that problem, and then I’ve gotta go search my repository. Do you find that, with the AT&T stuff, the patterns that you had, I’m assuming they’re now either in your head or you’ve documented them lightly. So when you go and help a customer, you go, ah, okay.

There’s a problem, I have a pattern for that, I’m gonna grab that. It’s 80% of what I need, and then I’ll tweak it for their context.

Chris: Yes, and that’s exactly it. When I left AT&T, I tried to go through and write down, okay, these are all my patterns that I use pretty frequently. And then you tweak ’em, because otherwise it becomes this tribal knowledge.

And you get that with a lot of companies too, where you have these patterns that have been used for decades, and there’s that one person that knows what that pattern looks like, and then they get hit by a bus tomorrow and that pattern’s gone forever. So it’s important, I think, to actually go through and take the time to document these things and write them down.

You don’t want it just stuck in your head, or even in a process somewhere. You wanna document those things.

Shane: But do you find that, even though you’ve documented them, you still sometimes struggle to find a pattern that you know you’ve documented?

Chris: Sometimes. I think this is the pattern all programmers fall into: sometimes we’re really fantastic at organizing things, and sometimes we’re doing it on the run and it gets dropped into our temp folder somewhere, and we’ve got those 300 files sitting in that temp folder, and who knows where they ended up.

Shane: And in your consulting business, if you brought on five new consultants to work with you, there’s no way you could actually efficiently share those patterns with them on day one. It’s gotta be an incremental thing, which is: oh, it looks like we’re struggling with that, here’s something that we’ve done before that seemed to fix it.

You iterate on it, you pick it up and change it, make it better and push it back. But I don’t see a lot of that, because it’s an overhead. And there’s a downside for consultants as well: those patterns make you more efficient, so if you’re paid by the hour, it actually costs you money to document them, and then you get paid less for implementing them.

But if you go value-based, or you become fractional, then those patterns are key to being able to deploy something quicker, faster and easier, and still get paid for the value of it. That’s where we should be aiming for people who are experienced and have that kind of pattern repository.

Chris: To that point, like I said, I try to organize things, and I do a lot of fractional work, right?

A lot of my consulting is fractional, so I have some old patterns that I can reuse. And to the point of bringing on a subcontractor or somebody, I could hand them, okay, this is my Azure folder, so here are my Azure patterns, or here are my Databricks patterns. But then I have those stragglers.

To your point, the ones that didn’t make it into the folder because they’re somewhere else, on my external drive somewhere, and then I’m like, where did I put that? I know I’ve done it 50 times. Yeah.

Shane: Yeah. And that’s why a lot of engineers just write from scratch, ’cause it’s actually easier to write the pattern from their head, the way they’ve done it 10 times before, than it is to try to search and find it. And that’s when it’s your code, when it’s your pattern. When you’re trying to share somebody else’s, that knowledge of the pattern and the context and the anti-patterns is really hard. So I think one of the things, if you’re an organization and you’re looking to hire somebody, and that person is saying they’re experienced, you should actually just ask them to describe three patterns.

If they can’t do that, then they’re not as experienced as they said. Alright, back to Fabric. You reckon it’s there now. Would you recommend to a customer, if they’re a Microsoft shop, that they include Fabric along with Databricks and Snowflake to make the triad that they investigate, or not?

Chris: I think it’s finally gotten to the point where it could be. It highly depends on what your patterns are, to the point of our whole conversation here, the patterns you’re actively using in your business. And I mentioned earlier that Microsoft is really pushing people into Fabric. They discontinued the big Power BI Premium license that let people have a standalone Power BI environment, and they’re pushing people into Fabric.

You have to do the Fabric option now. And for data engineers, they discontinued the DP-203, the Azure Data Engineer certificate, and you have to go to the DP-700, which is the Microsoft Fabric Data Engineer certificate. So Microsoft has come through and taken the stance of really pushing people in that direction.

On the plus side, they’ve also made it so much more robust than it was when you and I probably initially looked at it a few years back. So there’s definitely some value to having everything all in one place, especially if you’re already using ADF anyway; you could migrate a lot of that stuff over.

It’s not plug and play, though, which would be my feedback to Microsoft: allow people to just migrate their whole ADF environment into Fabric. That would be the optimal solution there, in my perspective.

Shane: I think, again, if we look at patterns, there is a pattern where consulting companies make a large amount of money migrating from one to the other.

And that helps the software company, because having partners recommend your product is one of the channels. And I think those core architectural patterns are important. So my understanding is, if you’re gonna deploy Databricks, you would deploy Unity Catalog. It’s not optional anymore, it’s just a core component.

And then if you’re gonna deploy Snowflake, I would suggest you start planning how you can decommission all the third-party products, because the pattern I see them doing as an organization is starting to become end-to-end themselves. Just like Salesforce, they’re buying capability and bringing it in.

So if you rely on a third-party component, you’ll start seeing pressure to use the Snowflake equivalent over time, because that’s where their lifecycle as a company is going. That’s a pattern we’ve seen before, and we shouldn’t be surprised when we see it again.

Chris: I know of a company in Finland; that’s the direction they’ve picked up.

One of my mentees is a senior data engineer for a company out there, and they have Snowflake and they’re doing everything within Snowflake. They’re writing Python scripts, orchestrating those Python scripts within Snowflake, scheduling them in Snowflake, and trying to do everything within there.

But yeah, absolutely. 
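A minimal sketch of that everything-in-Snowflake shape: a Python stored procedure holds the transform, and a Snowflake task schedules it, so no external orchestrator is involved. The table names, warehouse, and schedule are hypothetical; the SQL follows Snowflake’s documented syntax for Python procedures and tasks:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials; real values come from your Snowflake account.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# 1. Wrap the transformation logic in a Python stored procedure.
cur.execute("""
CREATE OR REPLACE PROCEDURE load_daily()
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS $$
def run(session):
    # Hypothetical transform: rebuild a reporting table from staging.
    session.sql(
        "CREATE OR REPLACE TABLE reporting.daily_sales AS "
        "SELECT order_date, SUM(amount) AS total "
        "FROM staging.orders GROUP BY order_date"
    ).collect()
    return "ok"
$$
""")

# 2. Schedule it inside Snowflake with a task.
cur.execute("""
CREATE OR REPLACE TASK load_daily_task
  WAREHOUSE = MY_WH
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL load_daily()
""")
cur.execute("ALTER TASK load_daily_task RESUME")  # tasks start suspended
```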

Shane: If you’re buying dbt or using dbt, you shouldn’t be surprised when more and more features go behind the paywall, because that’s the life stage, that’s the pattern of where that company’s at. Excellent. Alright, so if people wanna get a hold of you, where do they find you? Do they find any writing?

How can they get in touch and see what you’re doing? 

Chris: Absolutely. So I have a YouTube channel, the Gambill Data Engineering channel, so you can find me there, as well as my website, which is Gambill Data Engineering. My name is spelled a little bit weird, so it’s G-A-M-B-I-L-L: gambilldataengineering.com, or just gambilldata.com.

Both redirect to the same site, and those are the big places. And then obviously on LinkedIn you can find me, Christopher Gambill. There are a lot of Christopher Gambills out there, so you’ll see the one that’s connected to Gambill Data.

Shane: I think that’s the other pattern. If you want to try and get into the space and make a name for yourself, change your name. ’Cause there’s Shane Gibson the guitarist, he’s more famous than me, and then Shane Gibson the sales trainer, who’s way more famous than me, and it’s, damn, I need to change my name. But there was a guy in New Zealand who changed his name to Mark Rocket, ’cause he’d been into, what’s the rocket that looks like a penis? The Amazon guy.

Bezos’ rocket, yeah. He changed his last name to Rocket. So there you go, that’s dedication. That’s awesome. Excellent. Hey look, thank you for coming on the show and describing some of those patterns. That’s been pretty cool.

Chris: Absolutely. Thank you. Thank you very much for having me. 

Shane: I hope everybody has a simply magical day.