Data Contracts with Andrew Jones
Guests
Join Shane Gibson as he chats with Andrew Jones on the pattern of Data Contracts
Listen on your favourite Podcast Platform
| Apple Podcast | Spotify | Google Podcast | Amazon Audible | TuneIn | iHeartRadio | PlayerFM | Listen Notes | Podchaser | Deezer | Podcast Addict |
Podcast Transcript
Read along you will
Shane: Welcome to the Agile Data Podcast. I’m Shane Gibson.
Andrew: And I’m Andrew Jones.
Shane: Hey Andrew, thank you for coming on the show. The reason I’ve asked you to come on, and you’ve graciously accepted, is to talk about this thing called data contracts. At the beginning of this year I was trying to guess: what terms in 2024 are going to end up being buzzwords?
And what terms I think are going to survive to 2025, because they actually have value. And data contracts is one of the few in my head where I said, I think they’ve got legs, I think they’re valuable, I think it’s a pattern we should implement, and I think it’s something that will survive to 2025 and 2026.
But before we rip into what is a data contract and why should we care? Why don’t you give the audience a bit of a background about yourself and where you started and where you’ve come from.
Andrew: Yeah, sure. So I’ve been in the industry for around 20 years now. I started off more as a software engineer and then moved towards building platforms, like engineering platforms, infrastructure platforms, and then after that into data platforms, which is what I’ve been doing for the last 10 years or so.
And it’s been around six years since I started thinking about data contracts and using that to build data platforms.
Shane: Excellent. And it’s rumoured that you’re the person who coined the term data contracts.
Andrew: Yeah, it’s been around six years since I started thinking about what became data contracts. It’s around the same time that Zhamak released the Data Mesh article, which was quite a good inspiration as well. But I didn’t want to call it data mesh internally, because that felt like a big thing. I was just leading a data platform team; I wasn’t ready to change the organisation in that sense.
So as I started thinking about what became data contracts, and we’ll go into what that is later, I needed a name to give it internally. I was thinking it’s a bit like an API. An API is like a contract between a provider and a consumer.
Maybe this is the same thing for data; we might call it a data contract. That was supposed to be a temporary name, but it stuck internally. And as I started to talk about it publicly, I started googling it, seeing if anyone else used the same sort of term, or similar terms.
But at the time, when I googled data contract, I was getting, you know, get a 4G mobile data contract. I couldn’t find any other literature around it, although I could see people doing similar things; there wasn’t really a name for it. So I started using the term data contract in my public writing as well, and that has stuck for the last few years and become a bit of a pattern, a bit of a category now in data.
Shane: You’ve implemented data contracts in the company you work for, and you’ve written a book about it. So it’d be fair to say that you have quite a bit of experience in this thing that is a data contract.
Andrew: We’ve been using it internally, in production, for about six years at the company I work at. I’ve been speaking about it publicly for around four years, and the book came out last year. So we’ve been doing it for a long time and we’ve been doing it very successfully where I work, and also with other organisations I work with now as well.
Shane: Let’s start with the basics. How would you describe what a data contract is?
Andrew: So a data contract, at its most simple, is a very simple idea. It’s something that describes your data; it holds your metadata. And that’s all it is, so it’s quite a simple idea really. But it’s the way you use it that makes it powerful, where you use it to do whatever you like really, whatever problem you’re trying to solve for your organisation.
So the problem I was trying to solve initially was that we had upstream changes breaking downstream data pipelines and data applications. That happened too often, and we were trying to use our data for more and more important things. Things that drive revenue, things that are key processes, shouldn’t be breaking every couple of days.
That was the problem I was trying to solve initially. And what I realised is that we were basically building on top of the upstream databases. We had a change data capture service, chucked the data into a data warehouse, and built on top of that. We were essentially building on top of a database. So when that changed, and it changed quite often as our software engineers developed features, it just broke things downstream.
So that’s the main problem I was trying to solve. And I thought, well, again, I’ve spoken about having a software engineering background. If I was a software engineer, I wouldn’t build on top of someone else’s database; I wouldn’t be allowed to do that. We’d always have an API. So I started thinking, I want the same thing for data, because data is as important as our software services.
Why don’t we have the same thing with data? Why not? So that starts to get towards the solution, what became data contracts. I needed something to describe this interface, something that I could then use to provision an interface, and that became the initial data contract.
So essentially it’s just a schema and a few other things in there that we then use to create the interfaces and move away from change data capture. That’s the problem we tried to solve. But a data contract is a very simple idea, and if you’ve got other problems you’re trying to solve, it can help by describing your data better.
For example, maybe you’re trying to do access controls, automate that based on roles, and you want to categorise the data. What’s personal, what’s not; what’s confidential, or secret, or public. We do that as well in our data contract; it’s all written there. So it really depends on what problem you’re trying to solve for your organisation, but the most common problem I’d say is around the change management of upstream data, and that is where we started with data contracts as well.
Shane: So I can see a lot of patterns in a data contract as you describe it. The obvious one is a series of tests against the definition of a schema: does this data match the schema that we’ve agreed? If you were doing a minimum viable data contract, would you start with schema definition as the first thing?
Andrew: Yeah, schema definition is kind of something you need in any data contract, I think. So think about what needs to be in all data contracts: you probably have a schema definition, you probably have an owner, the kind of basics; probably every data contract has those. And then you can use that schema definition to do simple checks like, does this data match the schema?
Is it a string or a number? Maybe you do more complex checks, like does it match a regular expression, or is it an email address? Whatever you want to do there. But to start off, just having schema and types is the bare basics of a data contract.
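To make that concrete, here is a minimal sketch of a “schema and types” starter contract, with the simple checks Andrew describes layered on top. The dict-based format and field names are illustrative assumptions, not his actual implementation or any published standard:

```python
# A minimal "bare basics" data contract: schema, types, and an owner.
# Illustrative sketch only -- the format and names are assumptions.
import re

contract = {
    "name": "customer_orders",
    "owner": "orders-service-team",
    "schema": {
        "order_id":       {"type": str,   "required": True},
        "customer_email": {"type": str,   "required": True, "format": "email"},
        "amount":         {"type": float, "required": True},
    },
}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(record: dict, contract: dict) -> list[str]:
    """Return the list of contract violations for a single record."""
    errors = []
    for field, rules in contract["schema"].items():
        if field not in record:
            if rules.get("required"):
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):  # simple type check
            errors.append(f"{field}: expected {rules['type'].__name__}")
        elif rules.get("format") == "email" and not EMAIL_RE.match(value):
            errors.append(f"{field}: not a valid email address")
    return errors

print(validate({"order_id": "o-1", "customer_email": "nope", "amount": 9.99}, contract))
# -> ['customer_email: not a valid email address']
```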
Shane: And then the bit that kind of intrigued me was going from being a series of data quality tests or schema validation tests to being a contract, invoking this idea of an agreement. If I think about it, I can have a data catalog, I can have metadata management, I can have schema definitions, I can have data quality tests, I can have schema enforcement tests on my side.
But that’s not a contract, because it’s a one-sided thing; there is no agreement from the person sending me that stuff that they’re actually going to bind themselves to this contract. And for me that was the slight difference in this pattern: yes, there are lots of tools and techniques that we’ve had for years.
But the push is to then have an agreement with the data provider, the data producer, a true contract, a true agreement that we both abide by. Is that how you see it? Is that the thing that actually makes it different?
Andrew: Yeah, exactly that. The kind of things you described there, and a bit of what we’ve been talking about so far as well, have been the tech side. So you can have a data contract, you can do tests, you can do schema validation, you can do CI checks, you can do all those kinds of things. That’s the tech side.
But on its own, that doesn’t solve the problems I want to solve. It needs the people side as well. And that’s where you start thinking about things like agreements and collaboration. The data contract is a great place to have that collaboration: something to talk around, something you agree on, to define who owns it, who’s accountable and responsible for meeting that data contract.
It’s only partly about tech. In some ways, it’s more about the people side and the agreement and what you’re promising each other.
Shane: It’s that conversation, it’s me saying, I think this is the agreement, do you agree? And then we kind of negotiate down to a contract that we both can live with. Versus me saying, and I’ll just keep using schema because it’s a really simple one I can understand, here’s the schema, make sure you follow it.
That’s not a contract, that’s not an agreement, that’s not collaboration. That’s me saying, this is what I want, and I don’t even care whether you think you can provide it, I’m telling you. So again, the idea of using a contract is that we’re actually forced to get two-sided agreement between the people providing the data and the people collecting it. Is that how you see it?
Andrew: Yeah, exactly. And that’s a place where people sometimes go wrong with data contracts. They kind of forget about the agreement bit, the people side, and see it as: I’m giving you this contract, I’m going to enforce it on your data, and if you don’t meet it, something will happen. And already you’re coming from a place of friction there, a place of confrontation, which is not what you want if you’re trying to move to a more collaborative model where everyone agrees it’s worth doing this data contract. Because by putting a bit more effort into how data is generated, a bit more structure around that, a bit more discipline, you’re producing better outcomes for the business, because this data is ultimately used to drive revenue, or meet some sort of strategic goal, or drive some key internal process.
So it’s about that communication. And often data teams don’t get to communicate that much with the people producing the data, which in a lot of organisations would be software engineering teams. There are reasons for that, like organisation structures and the common ways those things are run.
The data contract is a way to try and bring them closer together, to break down those barriers and have some communication. And then once you’ve got agreement, you can codify the data contract and use that in your tooling and do the enforcement there. But that’s not where you start.
Andrew: You start with the agreement. You start the conversation first.
Shane: As technologists and data people, we love to codify technology first, people second, if at all. So I read somewhere that actually one of the things you should attempt to do is have the data publisher create the contract, and then you negotiate with them to make it as fit for purpose for you as you can, versus the data team creating it and then trying to have that conversation.
Is that what you see? That asking them to create the contract of what they think they’re going to provide is a better way of starting that conversation than predefined definitions?
Andrew: Yeah, I think so. And I strongly believe that the contract has to be owned by the data generator. Because ultimately, they are the only people who can meet that contract. They’re the people who have the ability to change the data, to change the schemas. They’re the people who have all the context around the data they’re creating.
Like, they’re the best placed people to own the contract. If I go to someone else and say, this is my contract, I’m putting it on your data, and they say, well, we can’t meet that, we can’t meet the SLOs you want, we just haven’t got the architecture to do that, or we haven’t got the time to do that, or whatever it might be.
Then what’s going to happen next? Like, not a lot. Whereas again, it should be more of a conversation: we need this data, these are our requirements. And they say, well, we can meet some of those requirements now, maybe some later, but let’s codify what we can meet now. And for me as a consumer of the data, at least I know what to expect now, and I can build with a lot of confidence.
I know it’s not going to change overnight. I know it’s maybe not as high as I want it to be, but it’s high enough and I can work with that. I can start building with confidence. I know what to expect. The only people who can change the data are the data producers.
They must own the data contract.
Shane: How does that work? If you look at the example that you’ve got, your company was creating the software themselves, so you had software engineers that were building the core systems that you were getting data from. And we know that the people building those software systems often create the data schemas fit for purpose for capturing that data, transacting that data, not for answering business questions.
And we end up being data teams that deal with what I call exhaust: they kind of spew out this data, or we go and grab it without their permission using CDC. So we get this kind of exhaust data, and their focus is creating new features, creating value, getting a product to product-market fit, making their customers happy.
And often their focus is not making sure that the data is fit for purpose for the bit of work we do. So how did that work for you? Was it just the fact that you were in an organization where somebody said, I’m going to bang the heads and make sure the software engineers and the data engineers work together?
How did you deal with that political part of the organisation and that problem?
Andrew: Yeah, there are a few things to that. One thing is we didn’t have a CTO or someone important banging heads, doing that kind of top-down approach from the start. We started more bottom-up. That’s partly because of where I was in the org, you know, I was leading the data platform team.
I didn’t have a kind of authority to start banging heads. But also I think it worked out better that way because you can start proving value and do that quickly. And then once you’ve got the proven value, then you can start thinking about larger rollouts and larger deployments, but you can start small, and really focus on solving a particular business problem.
And once you’ve done that, you do the next one, and the next one, and gradually you’re building some momentum, and then you can start making a bigger program of work to maybe move over to data contracts, if that’s what you want to do. So I think it’s best to start small, focused on the value you want to deliver for a particular use case.
So say, for example, a team somewhere is building a data application: data you generate in some way that you’re going to send to clients, or use as part of some sort of application that makes money, maybe. You’re trying to make sure that’s a success.
And the problem they’re having is that the data’s maybe not the quality they need, or it’s not fresh, or it’s not got SLOs, or it keeps changing; there’s some kind of problem that you want to fix. And the data is being produced by a software engineering team, in this example. So we know the problem we want to solve, we know it’s valuable for the company, we know that we want the software engineering team to do some work there, and we know ultimately we want them to own the data contract.
So it starts off by talking, getting everyone in the room, discussing problems, getting everyone on board, and then we start thinking about solutions. If I want the data contract to be owned by the software engineering team, I need to make it as easy as I can for them to do that. So maybe they can define it in code, maybe in the same process as the other code that’s actually generating the data, maybe alongside how they also define APIs, or define their infrastructure as code, or whatever it might be. You’re trying to think about how to make it work for them, because we want them to use it, and we want it to be easy to use.
As easy as creating an API, or as easy as spinning up a database in your cloud using your infrastructure-as-code platform. So really focusing on that experience for them, because we want them to do that work, and we know that they’re busy and they’ve got other things to do. So we want to make it easy, but we also want them to understand the value of doing it.
So we explain the value as well. That’s where we start really. And we got them involved in the solution as well, because software engineers love solving problems. If they feel like they’re part of producing the solution, they’re more likely to use it, more likely to talk about it when you’re not in the room, more likely to get other software engineers involved, kind of becoming your champions, for want of a better word.
You start there, like kind of an MVP. You start building the small amount of tooling you need to enable the software engineers to produce the data you need, so that the problem you had, producing the data that drives the application, gets solved. And then you prove it there, go on to the next problem, and the next problem, and you
gradually roll out data contracts to your organisation while delivering value along the way.
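As a sketch of what “defining the contract in code, alongside the code that generates the data” might look like from the software engineer’s side. Everything here, the class names and the imagined platform step, is hypothetical, not the tooling Andrew’s team built:

```python
# Sketch: a data contract defined in the producer's own repo, next to the
# code that generates the data, so publishing it feels like defining an API.
from dataclasses import dataclass

@dataclass
class Field:
    name: str
    type: str
    description: str

@dataclass
class DataContract:
    name: str
    owner: str
    version: str
    fields: list[Field]

payment_events = DataContract(
    name="payment_completed",
    owner="payments-team",
    version="1.0.0",
    fields=[
        Field("payment_id", "string", "Unique payment identifier"),
        Field("amount_minor_units", "integer", "Amount in minor currency units"),
        Field("currency", "string", "ISO 4217 currency code"),
    ],
)

# In CI/CD, a (hypothetical) platform step would pick this definition up and
# provision the interface: a Pub/Sub schema, a warehouse table, access controls.
# platform.publish(payment_events)
```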
Shane: One of the things I’ve seen is when you start talking to the software engineers about what you’re using that data for, and you show them, they then understand the value of helping you get that data in the right way, because they actually understand the pain, and they’re engineers, right?
They know how hard it is. So they’re like, yeah, I get you, I understand what you’re living with now, but they also see the value to the organization that the data is delivering. And so they’re much more open now because they see that whole value stream.
Let’s carry on that thread though. Okay, so you were in an organisation where the software engineers were building the system that had the data you had the contract for. If we’re working in an organisation that’s using out-of-the-box software-as-a-service platforms, they may or may not have an API.
So let’s just say they do have an API. There’s no real way, though, to negotiate data contracts with that vendor, because they’ve got lots of customers. So if you’re walking into an organisation that is primarily using out-of-the-box software as a service,
I’m guessing it’s going to be a lot harder to implement a two-sided data contract pattern.
Andrew: That is a lot harder. In those cases, you really just have to go as left as you can and put the data contract as close as you can to the source, particularly if you’re dealing with a big vendor. So Salesforce data coming in, you’re probably not going to get Salesforce to change how they’re giving you data.
Maybe you’re using something like Fivetran to bring the data in through their APIs; that’s about all you can hope for. But what you can do, if you have a Salesforce admin, is have them own the data contract. They can be the first person who is responsible for the data from Salesforce, because they have a bit of a relationship with the vendor. It could be Salesforce, it could be someone else, but in the Salesforce example they manage that relationship.
But also, as admins, they can change things in Salesforce, and Salesforce is very customisable. So a lot of the problems come from custom fields and changes to those, which are directly caused by the admins, not by Salesforce. The Salesforce APIs are quite stable; it’s the customisation people put into Salesforce behind the API.
And while stable, the API is not really made for extracting data. So there are problems there which we can’t really solve. But the best you can do is have the admin, in this case the Salesforce admin, or whoever owns the vendor relationship, own the data contract. And if it’s a smaller vendor, maybe you have got a bit more leverage, and you can have a bit more of a conversation and say, hey, these are the problems we’re having; we love using your product, but the data we’re getting from it is not good enough for these use cases.
How can you provide us better data? Maybe eventually it could be part of your actual legal contract, saying something about the data quality and SLOs, and how they won’t breach them, and what the response time will be if they do. Those kinds of things can eventually become part of a contract with a smaller vendor, where you have a bit more leverage, a bit of a better relationship.
In both cases, really, go as far left as you can, ideally to the person that owns that vendor relationship.
Shane: You mentioned that the problem you struck was you were CDCing off the database, and I’ve often heard people describe that as taking data without permission. Because, you know, I’ve seen it especially in large enterprises. It’s like, okay, nobody’s talking to us. We’ll just get the permission to turn the CDC on.
We’ll just grab it all and deal with it. Then you move to API or event based, as a way of getting that data. So, does that mean that actually you can’t use a data contract pattern if you’re using CDC off a database? Do you have to have API and events to make that pattern valuable?
Or could you actually use it if you were using a change data capture or a log mining type of thing?
Andrew: So we have looked quite hard at how we can apply data contracts to change data capture data, but we never really found a solution that works and really meets the goals we had with data contracts: to have something that’s more stable, something that’s being produced at a bit higher level of quality.
So we’re not just extracting the database schema; we’re saying these are events, or this is structured data that makes sense for the business, not data that makes sense for the upstream application, which is often a different thing. We never really found anything that works. Because say, for example, we did apply a data contract to CDC data, and the upstream team wants to change their schema.
Now suddenly the software engineering teams can’t change their schema with autonomy. They’re kind of stuck waiting for a review by some data team, or their schemas have to stay the same. And that’s no good for them; they need to change schemas to add new features, to improve performance, to do whatever they need to do.
That schema needs to change quite regularly, particularly in more tech-oriented companies with good software engineering practices; it should be changing quite frequently. So we don’t want to prevent those schemas from changing. So you need some kind of abstraction over the schema, something else which is the interface, which is like an API.
Again, like I said earlier, software engineers would never build on top of someone else’s database, and they wouldn’t let another software engineering team do that. So why is it okay for data teams to do that? The problems are exactly the same, and the solution really is exactly the same: use some kind of abstraction, some kind of interface, which for data is a data contract, and for software is an API.
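A tiny sketch of the abstraction being described: the service maps its private database row to a stable, contracted event, so the internal schema can change without breaking consumers. The names and shapes here are illustrative, not a real system:

```python
# Sketch: the service's internal row (free to change) vs. the stable event
# it publishes through the contract.
import json
from datetime import datetime, timezone

def handle_order_saved(db_row: dict) -> str:
    """Map the private database row to the public, contracted event.

    The database schema can be refactored freely; only this mapping needs
    updating, and consumers of the event never see the change.
    """
    event = {
        "event_type": "order_placed",
        "contract_version": "1.0.0",
        "order_id": db_row["id"],
        "placed_at": datetime.now(timezone.utc).isoformat(),
        "total": db_row["amt_cents"] / 100,  # internal name differs from the public one
    }
    return json.dumps(event)

print(handle_order_saved({"id": "o-42", "amt_cents": 1250}))
```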
Shane: What that means, though, is that there’s more design, more planning, more requirements, more understanding up front, and then also the need to be able to adapt constantly. Let me give you an example. If I use a CDC tool against the database and I bring all the data into a data lake, persistent staging, bronze, history, whatever layer you want to call it,
that first raw layer where we get all those change records: I know I’m getting all the data, which means I can delay a whole lot of decisions around data design, around historization of that data, because I know I’ve got it all, and I can now grab it from that layer whenever I need to, without talking to anybody else.
Whereas if we take a data contract approach, I’ve got to actually define the events I need in conjunction with the people that are generating those events. I’ve got to agree what those events look like. And when I go, oh shit, you know, for the next information product I need data from another event,
I don’t want to have to wait three months for the software engineering team to get around to creating that event stream for me and the contract. In that scenario, it sounds like the perfect organisations for this are ones where the software engineering team is used to rapid releases in small chunks, so that I can say, hey, we’ve got this new information product coming up; we’ve already got the event for the customer orders product.
This one’s all around returns, so now I need an event for customer returns. And then they can basically implement that event, that API, and the contract agreement in a day or two, rather than three months. So that seems to be one of the things that would make the pattern more valuable, easier to implement.
Would that be true?
Andrew: Well, I think there’s a trade-off, and what you’re saying is we’re going to do a bit more work up front. And that means we have something more stable we can build on, because we’re doing something important. And if you’re doing something important, really, you want that to be stable.
You want to ship this data, this process, this application, and for it to be reliable; you don’t want to have to keep fixing it every week. So you are giving up a bit of flexibility, but in exchange you’re gaining stability. And maybe it was better to prioritise flexibility when all you were doing was reporting.
I don’t know if that was true, really, because you give those revenue reports to your board, and they’re wrong, and they’ve changed from last time. That’s not a good situation to be in. And the reason they’re wrong is because of this work you’re doing from your bronze layer or whatever into something more structured. You think you have all the flexibility, but really you’re having to implement a lot of logic there that’s very brittle.
You’re implementing a lot of logic that’s already in the upstream service, and when the upstream service changes, you have to redo your logic. They fall out of sync, you don’t realise it. All the problems that probably everyone listening here understands, right? We’ve had these problems for years, decades maybe, in data. So you could argue that you’re giving up that flexibility, but I’m not sure it was a good idea anyway. The other thing I would say is the assumption you’re making there is that data changes regularly, and that I’m not going to find everything up front.
In a couple of weeks’ time I’m going to realise I need something else, and I don’t want to have to wait another three months to get that. But in practice, it doesn’t change that often, from what I see. For example, I’ve been working at a payments company for seven years. The company has changed a lot.
It’s gone from a start-up to a scale-up and grown very fast. We’ve expanded into different markets, we’ve got different products; it’s changed a lot. But our core data models around payments are very similar to when I joined. It’s around what bank accounts the money goes from and to, and how much it was.
It’s got a few extra fields, the exchange rate used and things like that, because we’re taking money across borders, but it hasn’t changed that much. It changes when new features are added, and as part of the project to add a new feature, you say we also need to update the data model, and that’s just part of the requirements.
But otherwise it doesn’t change that much. So I think it’s an assumption that data changes a lot and that we have to react to it, that we have to be agile. I’m not sure if agile is the right word, but that we have to react to it as best we can. I think that’s another assumption we can challenge. And it’s not that data contracts can’t evolve; they can evolve, but you’re doing that more deliberately.
You’re not finding out the next day that something’s changed, and that field you were using before is now full of nulls because you’re using a new field. Those kind of changes you don’t need to worry about if you’re using some sort of abstraction at a higher level.
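One hedged sketch of what “evolving deliberately” could look like in tooling: a check, run before a contract change is merged, that flags removed fields or changed types as breaking while allowing additive changes. This is an assumed implementation, not a specific product’s behaviour:

```python
# Sketch: compatibility check between two versions of a contract's schema.
# Additive changes (new fields) pass; removals and type changes are breaking.
def breaking_changes(old: dict, new: dict) -> list[str]:
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type changed for {field}: {ftype} -> {new[field]}")
    return problems  # fields only present in `new` are additive, hence allowed

v1 = {"order_id": "string", "amount": "number"}
v2 = {"order_id": "string", "amount": "string", "channel": "string"}
print(breaking_changes(v1, v2))
# -> ['type changed for amount: number -> string']
```

Run in CI on the producer’s repo, a non-empty result would block the merge, or require publishing a new major version of the contract instead.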
Shane: From a technology point of view, it seems that there’s a bit of a race to create the de facto standard definition of what should be in a contract. I think there’s probably, I’ve never bothered to look, but two or three open standards now which look the same but aren’t. Is that what you’re seeing? That there’s this move towards trying to agree a minimum viable contract definition,
not the technology that’s going to validate it, but the definition of what should be in a contract, a starter-for-ten kind of thing. Is that what you’re seeing?
Andrew: Yeah, I’m involved with the Open Data Contract Standard; I’m part of the technical steering committee there. And that’s got quite a few people who have also published open standards, trying to work out how we can just have one. It’s backed by the Linux Foundation, so it should stay open; it’s got some good governance around it.
I can see the attraction of a standard. It’d be great if we could define a data contract in a standard format and then, for free, get some tooling that anonymises my data based on the categorisation I put in there, or some tooling that populates a catalog, or whatever it might be. It would be great to just get that for free if you use a standard that’s an interchange format between vendors, between suppliers, between tooling.
So I think if one of them becomes popular, it’s potentially very valuable. There are quite a few people involved with this particular standard, and there are people publishing their own standards. In a way, it doesn’t matter too much who wins, as long as we have one. But I think the Open Data Contract Standard, which I’m involved in, has got quite good traction; there are a lot of people involved: people from vendors, people from other standards, people from industry like myself. That’s got, I think, a good chance of succeeding. But it’s still quite early days; it still needs a lot more adoption before it can be really useful.
Shane: I’m a great fan of a standard. SQL is a great standard; it’s been adopted by lots of people. What tends to happen is vendors then tweak their version of the standard to differentiate themselves. As a startup, if we could have a standard that we just use, and we know we get interoperability with anybody else in the ecosystem, then it’s valuable to our customers,
and therefore it’s valuable to us. I think the main point from my view is: if you think you’re going to go into this space and start implementing the data contract pattern, go and have a look at that open standard under the Linux Foundation and grab it as your starter for ten.
Don’t reinvent the wheel. And then if you need to change it or augment it, do so, and then try and push it back, explaining why the bit you wanted was missing.
Andrew: In the meantime, what we end up doing is writing quite a lot of conversions from our data contract to other structures. And that’s not so bad; it’s not that hard to write. For example, we define the contract in a certain format, and we convert it to Protobuf so we can use it in Google Cloud Pub/Sub schemas, and, as you can with Kafka, do some schema validation there.
We convert it to JSON Schema to create some software libraries that we can give to our software engineering teams, and then we can do some validations there that match the data contract validations, things like email addresses or a regular expression or something like that. So we do find ourselves quite often converting the data contract into different formats, and some proprietary formats as well.
For example, we use data contracts to provision BigQuery tables; BigQuery has its own custom JSON structure that defines a table, so we convert to that. And it’s not hard to write these things, it’s quite simple. But imagine if we didn’t have to do that, if we could just use one format and integrate with any of this kind of tooling we want. That’d be a great situation to be in.
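As an illustration of those conversions, here’s a sketch that turns a generic contract schema into BigQuery’s JSON table-schema layout (name/type/mode). The input contract format is an assumption; the output follows BigQuery’s documented JSON schema style:

```python
# Sketch: converting one contract definition into a target-specific format,
# here a BigQuery-style schema (a list of field dicts). Mapping is illustrative.
TYPE_MAP = {"string": "STRING", "integer": "INT64", "number": "FLOAT64", "boolean": "BOOL"}

def to_bigquery_schema(contract_fields: dict) -> list[dict]:
    return [
        {
            "name": name,
            "type": TYPE_MAP[spec["type"]],
            "mode": "REQUIRED" if spec.get("required") else "NULLABLE",
        }
        for name, spec in contract_fields.items()
    ]

fields = {
    "payment_id": {"type": "string", "required": True},
    "amount": {"type": "number", "required": True},
    "notes": {"type": "string"},
}
print(to_bigquery_schema(fields))
```

The same contract could feed similar small converters for Protobuf, JSON Schema, or whatever else the platform needs, which is exactly the duplication a shared standard would remove.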
Shane: How often does that happen? How often would you see the data team writing the libraries or the API tooling and giving them to the software engineering teams, saying, hey, use this, it’ll make your life easier, and therefore you’ll create this contract that we can then agree to? Is that a better way of bringing the software engineers on the journey, rather than saying to them: I need you to agree to this contract.
And then I need you to create some tooling where you can validate on your side, and then I’m just going to sit back and wait for this beautiful stream of gorgeous data to come and not have to worry. Do you find that most data teams, to be successful, are now trying to help the software teams with the tooling around this?
Andrew: I think so. And more and more data teams have a data platform team, who may have a bit more of a DevOps or software engineering background as well as knowing about data. That’s kind of my background, and I led a data platform team.
And what they’re doing is providing platform capabilities to other data teams, who then do transformations or whatever else they’re doing on the data side. We provide them with orchestration software and tooling. And then we’re also providing tooling for software engineers to produce data through data contracts.
And we’re providing other tooling around data contracts: for example, data retention tooling, backups, access controls, all those kinds of things. And all of that tooling we’ve built on data contracts. So we’ve really built a whole data platform around data contracts. I mentioned at the start, data contracts are a simple idea: you just describe your data in a format that’s human- and machine-readable.
It just unlocks so much that you can do with it. We found that any platform capability we wanted to add, we could add on top of data contracts. Simple things like backups, and more complicated things like performing GDPR deletions, data retention, and role-based access control. All those kinds of things we built on data contracts, because the data contract contains the information we need in a machine-readable format.
So it says whether it’s personal data or not. Is it about a customer, or about an employee, or whatever it might be? How long should we be retaining it for? What should the backup policy be? What SLOs should it have? It describes those kinds of things. And then the tooling doesn’t really care what the data itself looks like.
It could be a very wide schema, it could be very deep; it doesn’t really matter. It could be one field, it could be a million fields. We can easily build tooling that uses the data contract to apply policies to that data. Over the last six years, any platform capability we’ve wanted to add, we’ve built on data contracts, and we haven’t found anything we can’t build on top of them in the platform.
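A sketch of that idea: platform tooling that reads only the contract’s metadata and plans policies, never touching the shape of the data itself. The metadata keys and actions are illustrative assumptions:

```python
# Sketch: platform tooling driven purely by contract metadata. It works the
# same whether the dataset has one field or a million.
contract_meta = {
    "dataset": "customer_profiles",
    "contains_personal_data": True,
    "retention_days": 365,
    "backup": "daily",
}

def plan_policies(meta: dict) -> list[str]:
    actions = [
        f"set retention on {meta['dataset']} to {meta['retention_days']} days",
        f"schedule {meta['backup']} backups for {meta['dataset']}",
    ]
    if meta["contains_personal_data"]:
        actions.append(f"register {meta['dataset']} for GDPR deletion requests")
    return actions

for action in plan_policies(contract_meta):
    print(action)
```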
Shane: Okay, but I think you just switched patterns there and used the same term for both. So let me explain why I say that. In our product, we are what we call config-driven. I used to use the word active metadata, but then I saw how everybody else defined it, and went, nah, shit, that’s not what we do. So the way I describe it is: everything we do is stored in config, and then the config is used to generate anything else we need.
So if we need code to transform data, the config’s read. The code is generated, it’s executed, and then we dispose of the code effectively. If I want a security policy, the policy is in config. When something interacts with that data, the policy is applied. So this idea of a set of rules, it’s how I think about it.
A set of rules that can then be used in various different ways to do various different actions. And I think that’s what you just described. You took me from a data contract being a human- and machine-readable definition and agreement between the data providers and the data collector or consumer, to make sure the data is fit for purpose,
and you’ve extended that config pattern out into various other things where it was highly useful. Is that what you’ve done? Is that the jump I think I just saw you do?
Andrew: It’s kind of a journey we went on with data contracts. Our initial thing was, we need something we can use to have people collaborate a lot more, and also to move us away from CDC being a big source of data to something where we create an interface. But once you define that, you think, well, how do we make it easy to create this interface?
So then you start thinking, well, let’s integrate it into the platform and have it as a platform capability. And then you’re thinking, well, now we’re asking software engineers to own this data, but they don’t want to learn about all the privacy regulation, and they don’t want to know how best to back up BigQuery data.
They just want that to work. It should just happen. Your governance policies should just be automated as part of the platform. So we use the data contract as part of the platform as well. It’s kind of config-driven, as you say, but I’d say it’s more intention-driven.
For example: this is the data, this is what it looks like, and it shouldn’t cross borders, or it shouldn’t be visible to people outside of Europe. I don’t really care how that happens. They’re not saying, okay, spin up a BigQuery table in this particular geo and put these access controls around it and this backup policy.
They’re not deciding all of that, not defining all of that in massive infrastructure as code, maybe Terraform config or something like that. They’re describing what they have, and we are applying policies to that based on what they tell us, and based on the internal policies. So there’s a slight difference there, I think. And we see this in platform engineering as well, in software platform engineering: moving away from this idea of having software engineers define every single thing about the platform, defining all the Kubernetes things they’re deploying and all the cloud stuff they need.
Instead they’re saying, I just want a database, I want it to be Postgres, and I want version 15. And then you get all this stuff for free: you get the backups, you get access control, all of that for free. You don’t really care about that; you don’t have to think about it. You’re focusing on, I’m using this database to power my application.
I don’t want to care about all the other stuff; make that happen for me. It’s the same for data and data contracts: this is data in your data warehouse that you’re owning. I don’t care about all the privacy stuff. Well, I do care, maybe, but I don’t have to think about how best to apply those policies to my data.
I want that just to be taken care of. It’s kind of two patterns, but they go hand in hand. You can’t ask people to take more responsibility for data and then expect them to learn about all the stuff it takes to look after data and be compliant with your governance.
You have to give them the tooling to do that.
Shane: Let’s just break that pattern down, and let’s use an example. So we’re extending away from data contracts being agreements between two parties about the structure, format, quality, SLA, and frequency of the data, and we’re extending the agreement, in code, out to a whole lot more use cases.
And one of the use cases is PII data. Tell me if I’ve got this wrong, but we have this policy, effectively. Let’s take a simple example: this part of the organisation can see people’s first names and last names; this part of the organisation can’t. A really, really simple rule.
How did we do it in the old days? We’d go and write some views, or we’d put some row-level filtering on with a where clause, or, holy shit, we’d hold two tables, one with the name and one without, depending on what technology we had and how many times we’d done it. What do we really want?
Well, what we really want is somebody to be able to apply a tag to that first name column and that last name column, and then the policy to be inherited regardless of who’s feeding it or who’s reading it. That policy is now enforced as code. But as the provider, what I’m going to do is flag it:
PII, that’s it. And then everything’s taken care of for me, because that’s a simple contract for me to adhere to. I know that data is PII, so I’m just going to tell you, and then the system should take care of it. Is that what you’re saying? That we’re then just extending that out to something like backups?
I can see then how you can take that policy-based pattern and apply it in many different places, over and above the providing of the data and the application of security. You could then apply it to many different things. Is that where you got to?
Andrew: Yeah, exactly that. And it goes back again to why the data generator has to own the data contract: because they know the most about the data, not someone central, maybe in a governance role or some sort of data steward kind of role. Those people don’t have the same context on the data as the person generating it.
So the generators are best placed to categorise the data and say, well, this is going to be personal data, or this is going to be a bank account, or this is going to be something sensitive. But they are not best placed to decide exactly how the company should be handling that data. That’s probably best decided centrally and implemented as rules in the platform,
so that everyone who is using the platform just gets the policies applied for free, in the right way, matching the company’s central policies.
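To illustrate that split of responsibilities, here’s a sketch where the producer only tags columns and a central policy decides what each tag means per role. The tags, roles, and masking rule are all assumed for the example:

```python
# Sketch: producer tags columns (e.g. "pii"); central rules decide what
# that tag means for each consuming role. Names are illustrative.
columns = {
    "first_name":  {"tags": ["pii"]},
    "last_name":   {"tags": ["pii"]},
    "order_total": {"tags": []},
}

# Decided centrally, once, by whoever owns the governance policy.
CENTRAL_POLICY = {"pii": {"marketing": "masked", "support": "visible"}}

def visible_value(column: str, value: str, role: str) -> str:
    for tag in columns[column]["tags"]:
        if CENTRAL_POLICY.get(tag, {}).get(role) == "masked":
            return "***"
    return value

print(visible_value("first_name", "Ada", "marketing"))  # -> ***
print(visible_value("first_name", "Ada", "support"))    # -> Ada
```

The producer never writes a view or a where clause; they declare the tag once, and the platform applies the policy everywhere that column flows.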
Shane: So why wouldn’t we just get them to define all the transformations we need for the data as a contract, and then have the system automatically generate it? Why wouldn’t we just push all that work back to the software engineering teams?
Andrew: Well, that’s a really good question, because I’ve been thinking a little bit the same way. Not completely getting rid of data engineering, that’s not what I want to do necessarily, but certainly you’ve moved away from building on top of upstream database schemas. You’re building on top of this abstracted interface, which can look completely different to the database and can hopefully look more useful to people in the business.
It’s a usable data product, for lack of a better term. It’s a usable data product that is well documented. It uses the terms of the business users, not of the software or the product. And because it’s hopefully a better quality dataset, you can hopefully do fewer transformations on it, which I think is great,
because you’re saving a lot of time, a lot of money; it’s typically quite expensive to do transformations in a data warehouse. But also, that logic I mentioned earlier, the logic you’re putting in the data warehouse, is quite brittle. You don’t really know how the upstream application might change over time, and it’s going to fall out of sync.
You can’t maintain that logic in two different places. Now it’s in only one place, the upstream application; you’ve shifted it left. So then you start thinking, well, what kind of logic do we need to do in data engineering? And maybe it’s where you’re joining these data products to create something new.
So maybe you’re joining a few data contracts together to create something a bit different. That’s still useful, maybe. I would like to think that in a few years’ time we won’t be building these complex, expensive data pipelines that have lots of business logic in them, because through this change we are producing better quality data at source.
I think that’s going to be a better outcome for the business and for data engineering as well.
Shane: So this takes us down to a struggle that my data warehouse brethren have. Those of us that have done this for a few decades know that a lot of it can be automated; we know that immutable data can be done. It’s the combining of the data together that’s the thing that always takes a lot of time.
And this is one of the areas where, when data mesh came out, it never answered that question. In my view, that was because it was written by somebody from a software background, not a data background, who didn’t quite have the war wounds of combining data. Let’s use a scenario: I’ve got an organisation that’s merged, and now I’ve got Salesforce with customer and I’ve got HubSpot with customer, and let’s just say both of those were managed by a software engineering team. I’ve still got two definitions of customer,
and at some stage I’ve got to put those together, and team Salesforce ain’t doing it and team HubSpot ain’t doing it. So that’s where the data team always came in, to do that work.
Shane: With the initial definition of a data contract, provider and receiver agree on the construct, have an agreement around it. Then potentially we can make the providers give us the same key, the same ID, for the same customer, potentially, and our job goes away. Nine times out of ten, that never happens. So apart from that, have you seen any other ways that this pattern of contracts or policy can help us in terms of combining that data from disparate systems?
Andrew: I’ve seen one other pattern that’s quite interesting, from another organisation implementing data contracts. But your example is a bit different, because you’re going to more third-party data, so again, you can’t work around all the problems there; there’s only so much you can do. Probably, in that case, the best you can do is have your Salesforce admin make sure that they give you a particular key that you can use and that’s not going to change, and your HubSpot admin give you a key that’s not going to change.
Then you can join them. That’s still probably a data engineering task, probably not a very complex pipeline, hopefully. But then your data engineering team is providing that as a product, the customers from our CRMs, and the rest of the business doesn’t care that we’ve got two CRMs, or however many CRMs we’ve got.
So I think that’s still viable, and that’s why I don’t think data engineering is going to disappear at all. But I have seen interesting patterns, more where you’ve got a software engineering team, and it’s like, well, can we provide libraries that ensure the keys are always right at the time the data is generated, at the source?
So what they do is provide a small software library so that if I’m producing customer data, it’s always going to have a customer ID in this format; if not, it will throw an error, and the software engineer is going to have to fix that error. You could potentially take it further and have that library interact with some sort of central ID service and get IDs from there, if you want to have a centralised ID generation service.
That’s where I think they’ll go in the future, particularly as they grow. It’s quite a small company at the moment, so they haven’t got the problems of loads of systems, but you can imagine one day, with mergers and stuff, they might have that, and that’s when they might start thinking about centralised ID services.
But you hide that from the data producer. The data producer is saying, I’m providing customer data, it looks like this, and these are the measures and attributes and fields about it. And the ID is in the right format, so we can then use it to join the data wherever it is. So rather than doing that downstream, in the sort of data engineering, MDM-style process, you’re doing it at the source, as far upstream as you can. That’s quite an interesting pattern, I think. It’ll be interesting to see how it develops over time.
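A sketch of the kind of small producer-side library being described: it refuses to emit data whose key breaks the agreed format, so the error surfaces at generation time rather than at join time. The ID format and names are invented for illustration:

```python
# Sketch: a producer-side library that enforces key format at the moment
# data is generated, so downstream joins can rely on it.
import re

CUSTOMER_ID = re.compile(r"^cust-[0-9a-f]{8}$")  # assumed agreed format

class InvalidKeyError(ValueError):
    pass

def emit_customer_event(customer_id: str, payload: dict) -> dict:
    """Refuse to emit data whose key violates the agreed format.

    The software engineer sees the error at write time, not the data
    team at join time, weeks later.
    """
    if not CUSTOMER_ID.match(customer_id):
        raise InvalidKeyError(f"customer_id {customer_id!r} violates the contract")
    return {"customer_id": customer_id, **payload}

emit_customer_event("cust-1a2b3c4d", {"status": "active"})  # fine
# emit_customer_event("12345", {"status": "active"})        # raises InvalidKeyError
```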
Shane: In that scenario of Salesforce and HubSpot, we don’t control it, but at least we know we have no agreements. If we say, ideally we want a single key that’s unique and consistent across every system, and the Salesforce admin goes, I can’t guarantee that,
And the HubSpot admin goes, I can’t guarantee that. At least you know where the shit’s coming from, you know, to deal with it. If you’ve got software engineers that are building the systems that you need the data from, then it becomes a political conversation about why can’t they give you consistent shared keys
because we’re in control of our destiny. So why is the organisation letting this complexity come in? Why can’t they actually have an agreement with us to do something? And then, by showing them what we do with it, they can go, oh actually, we get it; we wouldn’t want to do all that mergey-matchy stuff that you’re doing.
Because, like you say, software engineers, like data engineers, love to solve problems. Let’s think about data transformations. What do I do? I take something, I change it, and I put it somewhere. I might change the structure, I might change the values, I might do a create-table-as; I’m still putting it somewhere, even though it’s temporary.
I might loop that a couple of times in memory before I write it out, but I’m doing a bunch of what I call nodes and links. And on the other podcast we had a guest this week who was all about system design for organisations. So again, this idea of nodes and links: a circle that does something, and then a line to the next circle that does something. Think of a DAG, think of Airflow, think of all those node-and-link paradigms. And what they said is, when you go into an organisation for the first time, ask a person who they rely on and when, and who relies on them and when; this idea of handoffs. If we think about data transformation, it’s the same problem,
nodes and links. Why wouldn’t we implement contracts between every node? If I write a piece of code, why wouldn’t I write a contract so that the next piece of code that reads the output has certainty and agreement about what’s in there, and will know when it’s not being met or when it’s being changed?
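A minimal sketch of contracts at every node-to-node handoff: each step declares the columns it relies on and fails loudly when the previous node’s output no longer provides them. The column and node names are illustrative:

```python
# Sketch: each node in a pipeline declares its expectations of the previous
# node's output, so a broken handoff fails loudly instead of silently.
def check(produced_columns: set, expected: set, node: str) -> None:
    missing = expected - produced_columns
    if missing:
        raise RuntimeError(f"contract broken before {node}: missing {sorted(missing)}")

# Node A produces these columns...
output_a = {"order_id", "amount", "currency"}

# ...and each downstream node checks its own contract before running.
check(output_a, {"order_id", "amount"}, node="aggregate_revenue")  # passes

try:
    check(output_a, {"order_id", "customer_id"}, node="join_customers")
except RuntimeError as err:
    print(err)  # contract broken before join_customers: missing ['customer_id']
```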
Andrew: That’s a good question. I mean, you used the word handoffs there, and I think that’s where I see data contracts having the most value: when you’re handing off to a different owner, maybe a different team or a different process. Those boundaries of ownership are where data contracts are most useful.
For internal transformation pipelines, you could use data contracts there too, and we kind of do this as well. It’s interesting, because it wasn’t how I expected them to be used, but because we have built so much around data contracts, it’s a bit like you described: why wouldn’t you use it for internal things, so you get all this stuff for free, which makes the next bit of code easier to write?
So we’re doing that. It’s not quite how I imagined it from the start; I always imagined it being at that boundary of ownership, and that’s where it’s most useful. But because we built it as part of our platform capability, it then becomes a nice way to automate even internal transformations.
So I think, when you have built these kinds of capabilities, and you’ve built them well, and they’re easy to use, they give a lot of value. There is a little bit of friction: you are defining the contract, you are putting some restrictions on how you might be able to change it, in terms of, maybe you can’t make breaking changes without a new version, things like that.
So you are paying a bit for the contract, in terms of that friction; you’re putting a bit more effort in. But if you’re getting all this stuff for free, then maybe it does make sense to use it even for internal transformations, or transformations that are moving data from one place to another, not necessarily to another team. It’s still moving data around, and it’s valuable to have some of the things you get for free with a data contract as part of a transformation.
You could see that in the future. Who knows how far this can go, really, in terms of automating transformations, to a degree where maybe you don’t have to write loads of SQL and dbt. Maybe there’s another way of doing that in future, built around data contracts, more describing what you want to happen rather than detailing how you want to do it in SQL.
Shane: The reason I brought it up is it’s something that’s annoying the hell out of me right now, even in our product, where we’ve broken it down into the smallest chunks of nodes and links possible. You still have to take a series of steps to get to that metric, or that value, or the data from the machine learning model, whatever the valuable information product you need to produce is, or data product, if all you’re doing is producing data.
And then what happens is you make a change, and even though you can see the lineage, you know where it flows, you can trace it, you can track it, all that good stuff, it’s still not telling me exactly when a contract’s been broken. Now, if I have tests, or what we call trust rules, it tells me when I broke it, but it doesn’t tell me before it breaks.
And so that’s where I keep thinking, actually, this idea of a contract, a policy agreement, says that between this bit and this bit the agreement’s been broken, for whatever reason. So let’s go back to that data collection problem, the original use case of a data provider where we’re the data receiver.
I see in the data quality world, and especially in the event streaming world, this idea of dead letter queues; in the old days we used to call them error tables. The problem I see with that pattern is: okay, I get some data coming in that doesn’t meet the contract, or some data that doesn’t meet the schema, or some data that doesn’t pass the test, whichever pattern you’re using, and then I stick it somewhere else and I’m going to deal with it later.
But that gives me a whole lot of problems. Well, how do I insert it where it was meant to be? How do I tell the person at the end, who’s consuming the information product, that of the million records we got today, 5,000 of them didn’t turn up because they were wrong? There’s a whole lot of problems we get when we dead-letter-queue.
And so in my head, if we have a data contract, it’s immutable: if the agreement’s been broken, everything stops. Is that how you see it? Is that how you implemented it?
Andrew: So, we support both. We use the dead letter queue quite heavily. I think it really depends on what your users expect, or rather, what you define in the contracts and what they know to expect. So it’s about expectations again. If, as a user, you’d rather not see any data if it’s not all there, if you care most about completeness, then maybe you want that: the data is not meeting the contract, so let’s just pause things and we’ll fix it; provide no data rather than partial data.
Or you might say, actually, it’s more important that some of this data gets through, because we can then use it for a process that might, I don’t know, detect some fraud or something like that. It’s better to detect some fraud than no fraud, just because some of the values are wrong. So actually, we might then dead-letter-queue it.
Someone will be paged, maybe, if it calls for it, and we’ll go fix it. It might be a couple of hours, but eventually it’ll come through. In the meantime, all the other data is going through, and you are catching the fraud in this example. So it really depends on the requirements of your user, I think.
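A sketch of both behaviours driven by the contract, assuming a simple per-contract policy flag: halt the pipeline for completeness-sensitive consumers, or dead-letter bad records and keep the rest flowing for timeliness-sensitive ones. All names are illustrative:

```python
# Sketch: the contract declares how violations are handled, because users'
# expectations differ -- halt for completeness, dead-letter for timeliness.
def process(records, is_valid, on_violation="dead_letter"):
    good, dead_letter = [], []
    for record in records:
        if is_valid(record):
            good.append(record)
        elif on_violation == "halt":
            raise RuntimeError(f"contract violated, pipeline paused: {record}")
        else:
            dead_letter.append(record)  # park it, page someone, replay later
    return good, dead_letter

records = [{"amount": 10}, {"amount": -5}]
good, dlq = process(records, is_valid=lambda r: r["amount"] >= 0)
print(len(good), "passed,", len(dlq), "dead-lettered")  # -> 1 passed, 1 dead-lettered
```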
Shane: What you’re saying is, I have to understand the information product that’s going to be delivered, and how that user or that system is going to use that information product, the action it’s going to take and the outcome it’s going to achieve, before I can actually define the contract, because that affects the context of what’s actually in the contract.
Andrew: Exactly. You're providing this data for a use case, right? To meet some user's requirements. Maybe you've got many users and you have to compromise and trade off somewhere; you can't meet them all, maybe, and that's fine. But at least you know what they want and you're trying to provide what they need.
Then once you've agreed on that, maybe making trade-offs here and there, but there's some sort of agreement there, you codify that in the contract and that's what people can expect from it. Because they know what to expect, they can use it with confidence. It might not be exactly what they want.
In a perfect world, they might want it to be 100 percent reliable, with someone on call every weekend in case it goes wrong, but maybe that's not feasible for whatever reason. At least they know now, and they can build their own processes around that. So yeah, back to that people side again, back to that conversation and that agreement side.
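One way to codify that negotiated agreement is as plain data the consumer can read before depending on it; every value below is invented for illustration:

```python
# A sketch of a codified agreement: not a promise of perfection, but a
# record of the trade-offs both sides accepted. All values are made up.
AGREED_CONTRACT = {
    "dataset": "orders",
    "owner": "checkout-team",           # who to talk to when it breaks
    "schema_version": "2.1.0",
    "freshness_sla_hours": 24,          # daily was the agreed trade-off
    "completeness": "partial_allowed",  # consumers accepted partial over nothing
    "support": "business_hours_only",   # nobody is on call at the weekend
}

# Consumers read the agreement and build their own processes around it:
if AGREED_CONTRACT["support"] != "24x7":
    print("plan for weekend gaps rather than assuming perfect reliability")
```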
Shane,: In theory, every one of our dashboards should have a codified data contract bound to it.
Andrew: I think so, I think they should. If the dashboard is important enough, or if creating the contract is easy enough that you'd do it anyway, then why would you not? I see this sometimes. I was using a dashboard a few weeks back, and it was about cost information on our cloud.
What I was trying to do, one of my hats at the moment is around FinOps, was work out how to reduce our cloud costs, which a lot of people are doing at the moment in various companies for various reasons. I look at the dashboard, and I'm like, I don't know if it's correct, I'm not sure I trust it, I don't really understand what this field means, I don't know who to talk to, I don't really know who created this, and I'm sure it looked different last week. I couldn't use that data to decide whether to start investigating a particular service, whether that service's costs were increasing or not. I'm not really sure what happened there.
Maybe it's not the most important dashboard in the company, it's not revenue, it's not some fraud detection process, it's not that important in the grand scheme of things, but it also wasn't very useful to me. I couldn't use it. I ended up wasting a lot of time trying to work out if I could use it, then going to a different source completely, actually the Google Cloud Console, to find the data I needed. Less good data, but I had more confidence in it because it's better documented and I knew where it came from.
Even for the most simple dashboards, I think it's valuable to have a data contract around it if you want it to be used. Otherwise, why would someone want to use it?
Shane,: I haven't caught up on where the whole reverse ETL trend went, but there's this idea of activating data in other systems, where we're actually pushing data records back into the system of capture or system of control. Because the data's valuable, rather than give it to the humans who are going to take the action, give it to the system so it can automate that action.
Having a codified agreement between those two just makes sense. It's what software engineers would do, but as data people started taking over that integration, we seem to have lost that rigor; we always treat it as an ETL task, not a systems integration task.
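Treated as systems integration, the write-back side might validate against the receiving system's expected shape before pushing anything, as in this hedged sketch; the target shape and payload fields are hypothetical:

```python
# A sketch of "activation" as systems integration: check the outbound payload
# against the receiving system's expected shape, as at any API boundary.
import json

# The shape the receiving system expects; fields are hypothetical.
EXPECTED_BY_TARGET = {"customer_id": str, "churn_score": float}

def push_back(payload: dict) -> None:
    for field, dtype in EXPECTED_BY_TARGET.items():
        if not isinstance(payload.get(field), dtype):
            raise ValueError(f"Outbound contract broken on '{field}'; refusing to push")
    # In a real integration this would be an HTTP call to the system of capture.
    print("would POST:", json.dumps(payload))

push_back({"customer_id": "C42", "churn_score": 0.87})
```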
Andrew: Yeah, I think it's a bit of a maturity curve we're going up, and we're starting to apply more discipline to data engineering as a whole. That's been going on for a few years now. We're using source control a lot more than we used to, we're doing proper pull requests.
We've got nice patterns for deploying things, to make sure things don't break and we can roll back quickly. We're using incident management processes. All of this comes from software engineering, really, which itself is also maturing in the same sort of way, with people moving towards more strongly typed languages and things like that.
So it's a maturity curve we're going on, I think, and I think it's good. I think we're doing that because what we're producing now is of more value than it was before, and that could be various things. It could be AI, it could be whatever, it doesn't really matter what it is, but what we're producing now is more valuable than it was before, and therefore we need to apply more discipline to the production of that value.
Shane,: I think the key takeaway is that my perception of data contracts, that they sit solely between the data producer and the data receiver, always on the left hand side of the value stream, always the data coming into the data factory, that one's not quite true.
It's a core pattern: here's an agreement about what the data's going to look like, that two sides have agreed to, because there is a handoff. It may be a handoff between teams, it may be a handoff between systems, but I can't see any reason why it's not also a handoff between two blobs of code. As long as there's an expectation that when the data turns up it's going to look like something and behave like something, and these other things have already been done so I can trust them and don't have to do them myself, then it's valuable.
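That code-to-code handoff can be made explicit even inside one pipeline, as in this small sketch; the types and field names are illustrative:

```python
# A sketch of a contract between two blobs of code: the downstream step
# declares the shape it expects, and the boundary enforces it once.
from dataclasses import dataclass

@dataclass(frozen=True)
class CleanOrder:
    """The agreement: by this point, IDs are strings and amounts are parsed."""
    order_id: str
    amount: float

def standardise(raw: dict) -> CleanOrder:
    # Producer side of the internal handoff: enforce the shape once, here.
    return CleanOrder(order_id=str(raw["id"]), amount=float(raw["amt"]))

def aggregate(orders) -> float:
    # Consumer side: it can trust the shape, so it never re-validates.
    return sum(o.amount for o in orders)

print(aggregate([standardise({"id": 1, "amt": "9.99"})]))  # 9.99
```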
I definitely think it's a good fit for an organization where there are software engineering teams involved. Given the immaturity of the data contract patterns at the moment, and the technologies around them, I think those organizations are going to have a better chance of implementing it well.
If you've got 55 out-of-the-box systems and no control over them, you can still adopt the pattern, but it's going to be slightly harder, I think, to lead that path. I still think it's one of the few terms that's come up in the last couple of years that's actually got value and is going to survive, in my opinion.
Before we close it out and see where people can find you and learn more about this, is there anything else you want to cover around data contracts? Anything I haven't asked about or forgot to ask? Or you go, hey, this is the bit you really need.
Andrew: I think what I would say is, a lot of the time when I'm talking about data contracts I'm talking about this perfect world we can get to, and of course it's a journey. It's a journey that involves a bit of tech and a lot of communication, a lot of the people side. That's probably the harder part, but it's the necessary part.
And there are always going to be trade-offs. You've got to be pragmatic with this. Don't expect to get to data contracts overnight, to go from CDC to fully data contracts in a quarter. It's a journey you're going on. But it's fine if the journey takes some time, because you're starting to embed it into your culture.
And that's what we're seeing. I was in a meeting with some software engineers a few weeks back, and they were talking about how you make data available. They came up with a term that I'd never even used myself, and described it better than I could. They were like, oh, I see what we're doing here.
We're moving from passive data generation to active data generation. We're actively providing data to you because you need to use it downstream for these reasons. And I thought, yes, it's exactly that. This is someone I'd never spoken to before, never spoken to his team; he's quite a new hire.
But it's already part of our culture, so he kind of knew that, and explained it better than I did in a way, because it's part of our culture now, because we've been doing it for six years, and we've got the tooling there, and it's just how we do data now. And we still have CDC in the background, and we're still trying to move things over.
It's not perfect, but it's a journey. If you put the hard work into that journey, you'll find it becomes part of your culture. Then you just carry on that journey and you'll get there eventually.
Shane,: That's actually a good point. For us, I talk about four ways we get data: push, pull, streaming and drop. Our customers sometimes push the data to us. Sometimes we go and pull it from their environment or from the SaaS platform or whatever it is. Every now and again we stream, the data will be streamed to us, which is a form of push,
it's just more recent or more frequent. And then drop, there's a file drop or a file upload. And what we know is that file upload is the worst for us. Highly flexible, really fast, easy to prototype or ideate off, but it's not repeatable; the contract's constantly broken. Somebody changes the CSV, they change the Excel spreadsheet, every time.
There is no contract and it's never honored. And I hadn't thought about this idea that when they push it to us, one of the benefits is we have an agreed contract, to a degree: we know what the schema looks like, we know what the data looks like.
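Even a file drop can carry the cheapest possible contract, a header check before anything loads, so a reshuffled spreadsheet fails loudly instead of silently; the column names in this sketch are made up:

```python
# A sketch of the cheapest contract on a file drop: check the CSV header
# against what was agreed before loading anything. Columns are illustrative.
import csv
import io

AGREED_COLUMNS = ["order_id", "order_date", "amount"]

def check_drop(file_text: str) -> None:
    # Read only the header row and compare it to what was agreed.
    header = next(csv.reader(io.StringIO(file_text)))
    if header != AGREED_COLUMNS:
        raise ValueError(f"Drop broke the contract: got {header}, agreed {AGREED_COLUMNS}")

check_drop("order_id,order_date,amount\nA1,2024-06-01,9.99\n")  # passes silently
```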
We don't really talk about it yet as a contract, and we don't agree on those other things beyond the schema, which is what we're starting to do now. When we pull it, we tend to treat it as a CDC problem; even if we're hitting Salesforce APIs using Dataddo, it's still a pull, and we're still accountable for the contract.
And it's a really interesting idea to say to the organization, actually, you need to be in that conversation. Because like you said, they can go and add a field to Salesforce that breaks the API contract we're expecting, without us knowing. So even though it's well documented and lots of tools know how to use it, it can still break.
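Catching that kind of drift can be as simple as diffing the fields a pull actually returns against the fields in the contract, so an added or removed field starts a conversation rather than a silent break; this sketch uses invented field names and is not the Salesforce API itself:

```python
# A sketch of drift detection on a pulled source: compare observed fields
# against the contract's fields. Field names are illustrative.
CONTRACT_FIELDS = {"Id", "Name", "Amount"}

def detect_drift(observed_record: dict) -> dict:
    observed = set(observed_record)
    return {
        "added": sorted(observed - CONTRACT_FIELDS),    # producer changed something
        "removed": sorted(CONTRACT_FIELDS - observed),  # an expected field vanished
    }

print(detect_drift({"Id": "1", "Name": "Acme", "Amount": 100, "Region__c": "EU"}))
# -> {'added': ['Region__c'], 'removed': []}
```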
I'm going to think about which side of the fence the accountability sits on as part of that contract conversation, because that's the people and process part that intrigues me. And then definitely looking at it internally: if I'm moving something and there's a dependency, why doesn't it just become a contract?
Because once I have the contract pattern, I can write lots of them, quickly. They can be automated, they can be honored, they can be validated. I can keep using that pattern time and time again, because it has value to me. So if people want to learn more about data contracts, if they want to get hold of you, apart from coming to the Agile Data stand at Big Data London, stand Y760, top right when you walk in the door, how else can people find you and find out more about this data contract stuff?
Andrew: I have my book on data contracts I mentioned earlier, which will be given away at your stand at Big Data London, so I'm looking forward to that. To find out more about me, go to my website, andrewjustjones.com. I also have a white paper on data contracts called Data Contracts 101. It doesn't go into as much detail as we have today; it's more of a high-level view of the problems we want to solve and how data contracts can help you solve them.
To get that, you can go to dc101.io, that's DC as in data contracts, 101 in numbers, .io, and it will take you to the white paper, a free download from the website. So that's a good overview of data contracts and the problems they try to solve. Not as much detail as we've covered here, but if you'd like more detail, my website is a good place.
And from there, I've got a daily newsletter about data contracts as well, where I go into detail about how to roll out data contracts at your organization.
Shane,: Excellent. And you hang out on LinkedIn as well, so people can go find you there and sign up to your newsletter, because you post every day. Lots of little contracts, which is good. Excellent. Thanks for coming on the show.
That's changed my understanding of the data contract pattern quite a bit. Like I said, I was stuck in the land of it sitting on the left hand side of that data factory, between the data producer and the data receiver, as being the only place it played. So I'm going to have to go think about this one a bit more, which is all good.
I hope everybody has a simply magical day.