Analytical team topologies – Ashwin Kamath
Join Shane and guest Ashwin Kamath as they discuss his experience working with analytical teams and analytical team topologies.
Podcast Transcript
Read along you will
PODCAST INTRO: Welcome to the “AgileData” podcast where we talk about the merging of Agile and data ways of working in a simply magical way.
Shane Gibson: Welcome to the AgileData podcast. I’m Shane Gibson.
Ashwin Kamath: And I’m Ashwin.
Shane Gibson: Hi, Ashwin, thank you for coming on the show. I’ve been looking forward to this one. We had a bit of a chat beforehand, and we talked about lots of varied things. But before we get into that, why don’t you give a bit of background about yourself for the audience?
Ashwin Kamath: Yes, my name is Ashwin Kamath. I am the CEO and founder of Spectre, which is a new data platform that I’ve been working on for the last year. I’ve been working in the data space for about 10 years now, primarily in the FinTech world, first in a lending and underwriting setting, and more recently in a systematic and quantitative trading setting. Most recently, I started Spectre to really focus on improving the production-critical data space, where we start to look at more automated decision-making frameworks to optimize and improve different business functions for companies.
Shane Gibson: And before that you’ve worked for some interesting companies. Do you want to give us a bit of background about your journey from 10, 11 years ago, when you first started in this data world — the types of teams you’ve worked with, the types of work they were doing, and some of the interesting things you’ve done over the last 10 years?
Ashwin Kamath: Yes, so I’ve mostly been focused on the data engineering and data infrastructure side, with various projects on the data science side as well to get a bit of understanding. So rewinding back to 2015, I joined a company called Affirm, out in San Francisco, the buy now, pay later company started by the legendary Max Levchin from PayPal. I worked on the data engineering side for what we called the bank engineering team, which optimized and figured out how to take our several-billion-dollar loan portfolio and distribute it between the different capital markets partners we worked with. This was my first introduction to the big data space; we used a lot of Apache Spark to implement our solutions. It was essentially a very massive optimization problem: take all these small micro-loans and figure out how to distribute them between different capital markets partners, like Morgan Stanley and Jefferies, in a way that didn’t break the covenants established in legal documents. So we had to make sure mathematical constraints were in place, and we had to make sure that data quality was exceptionally high, because of course, if you sell the same loan twice to two different firms, you’re going to end up in a situation where you have a regulatory agency down your throat. So it was a very high-stakes environment, very interesting to see the problems we had to solve in that kind of setting. And then more recently, I was at Two Sigma, a quantitative hedge fund in New York, where I’m currently based. I worked on the alternative data platform, where we focused on ingesting data from thousands of different data vendors, preparing it for the trading and modeling teams, and doing all the cleaning, preparation, normalization, standardization, and feature engineering required to really put this data to use in a very automated setting. The challenges we faced at the scale of managing 50 to 100,000 data processes, while still wanting to move quickly on the next set of projects, were actually staggering. And there was a whole strategy around how to develop infrastructural tools that would allow us to move quickly.
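To make that allocation problem concrete, here is a toy sketch — not Affirm’s actual system, which ran on Apache Spark with far richer constraints. The partner names, capacities, and the per-state covenant here are all invented for illustration; the point is simply that each loan lands in at most one partner’s book and every placement is checked against a covenant first.

```python
# Toy sketch of allocating micro-loans to capital markets partners without
# selling a loan twice and without breaking a (hypothetical) covenant.
loans = [
    {"id": "L1", "amount": 1200, "state": "CA"},
    {"id": "L2", "amount": 800,  "state": "NY"},
    {"id": "L3", "amount": 1500, "state": "CA"},
    {"id": "L4", "amount": 600,  "state": "TX"},
]
partners = {"MorganStanley": 2500, "Jefferies": 2500}  # dollar capacity per partner

def violates_covenant(book, loan, state_cap=1600):
    """Hypothetical covenant: at most $1,600 of exposure to any one state."""
    state_amt = sum(l["amount"] for l in book if l["state"] == loan["state"])
    return state_amt + loan["amount"] > state_cap

books = {p: [] for p in partners}
for loan in sorted(loans, key=lambda l: -l["amount"]):  # place big loans first
    for partner, capacity in partners.items():
        book = books[partner]
        if (sum(l["amount"] for l in book) + loan["amount"] <= capacity
                and not violates_covenant(book, loan)):
            book.append(loan)   # each loan lands in at most one book,
            break               # so nothing is ever sold twice

for partner, book in books.items():
    print(partner, [l["id"] for l in book])
```

A real solver would treat this as a constrained optimization rather than a greedy pass, but the data quality stakes are the same: the constraint checks are only as good as the loan records feeding them.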
Shane Gibson: I started my journey in the data world pre big data; I was back in the old generation of what we’d probably call data warehousing, or OLAP. And what we saw back then was a real disconnect between what we called the data miners back then — so the data scientists of today, to a degree — and the ETL developers and the BI teams. There was always a disconnect between BI and ETL. The ETL developers would always create the code and create the data structures and hand them over the fence to the people that visualized it, but there was a dependency there. And the data mining teams tended to do it on their own; they might treat the data warehouse as a source, but often they wouldn’t, and they’d be in their own bubble, they wouldn’t really talk to anybody else. And what I saw with the big data wave coming through was a change of team topology, this idea of a data scientist turning up next to a data engineer. And I saw some organizations try to find that magic unicorn: the one data scientist that can do the data engineering, the facilitation to understand the business requirements, the statistical engineering to find the right answer for the model, and then the data ops engineering to be able to deploy the models and build the platforms and all that kind of stuff — and there are very few people in the world that can do all that. So we ended up with pods. A pod had one engineer, one scientist, maybe one platform-focused person, and they started working together. Is that what you saw when you were in there building out these big data platforms and using them within those organizations?
Ashwin Kamath: Almost exclusively, I would say — basically very similar to the idea of a pod. We would have these data science teams that really were the most connected; they almost acted as strategy folks for the C-level staff within the organization. And then we had data engineers, who acted more as support for the data science teams, and that factored in two ways. The support was on the front end, preparing the data and making sure it’s ready so that some sort of exploration and ad hoc analysis can be done. But then also, after that ad hoc analysis is done, having someone able to productionize the data science workflows that were developed within a notebook environment, and quickly put them into some sort of production environment where they can run in an ongoing fashion — that was actually a much larger part of what the data engineering teams I’ve been a part of have focused on. And then going even beyond that, in the hedge fund world we also had this concept of a data support team. The data engineering team was actually so overloaded with these productionization requests — for bringing in new data, productionizing data science workloads, pre-processing tasks, and delivering different types of production ML modeling workflows — that the data support team was designed to handle the triaging of alerts and data quality issues that might come up over time. This structure actually gave the data scientists more confidence that they could move on to another project, and six months later they could look back and say: my previous project is still healthy, it’s still producing good insights, I can comfortably keep working on my next set of projects without having to worry too much about maintenance and that sort of thing.
Shane Gibson: That team topology of build something and then hand it off to another pod, squad, team, whatever you want to call it, to productionize it and maintain it — that’s an anti-pattern compared to DevOps or DataOps these days. Because the DevOps, DataOps pattern is: you build it, you release it, you maintain it; it breaks, you fix it, it’s your baby. And you’re then focused on the things you’re building in that domain, and you’re consistent. Whereas the pattern back in those days was build it, throw it over, come back and retrain it. But my view was that was because of the scarcity of those data science resources — you couldn’t have multiple teams of data scientists that built things within a certain domain and then maintained them. Is that what you saw?
Ashwin Kamath: Yes, that’s pretty similar to what I’ve seen as well. It’s a good point you bring up, that the DevOps side of things has not really taken off within the data world. You’re starting to see it a little bit more, and actually some of the newer tooling that’s coming out usually has some DevOps side associated with it, so the deployment and infrastructural side of using those tools is getting easier and easier over time. But there’s going to have to be a transformation similar to what Vercel did for front end engineering with (inaudible 00:07:57) — you literally can push code to a Git repo, and they have their own CI/CD system already set up that will automatically deploy new versions of your front end application. We have yet to see that really happen within the data world, and that’s probably what I look forward to the most.
Shane Gibson: So there’s been an interesting team topology change as we jumped from the big data wave to the modern data stack wave: we’ve actually gone away from pods, we’ve gone back to teams of one. An analytics engineer is able to spin up a Snowflake, they’re able to spin up a dbt on their own, and then they effectively work on the end-to-end process amongst themselves. They’ve effectively become unicorns for the collection of data, the transforming of it, and then pushing it through. Maybe they don’t do the presentation — maybe the visualization is still done by another team, and there’s a handoff at the back, like the old ETL/BI days. But we’re starting to see the team of one again, and that’s where the analytics engineering role seems to have gone, and that’s been powered by technology. But it’s interesting that we’ve seen in the past, when we have teams of one, chaos reigns. People build code, they deploy it, it has value, there’s nobody to hand it off to to maintain it. So they keep maintaining it, then they leave the organization and their knowledge goes; or they’re doing such a great job they just get pumped with more and more requirements, build more, and they end up with 3,000 bundles of code — one they wrote 18 months ago, and they’ve got no clue what they wrote. They go look at it and go: why did I apply that business rule? Why am I excluding all our customers that have shoes that are blue? I can’t remember why we did that. So do you see that transition in the market right now, from the idea of small pods of people that work together to teams of individuals, or do you see something different?
Ashwin Kamath: I think I see it more at the top of the stack, at the data science level, and less so at the data engineering level. Within data science, I do see that most data scientists operate more autonomously and take on individual projects. There’s still collaboration on the datasets that need to be maintained, which might feed multiple different data science workflows. But absolutely, you’re seeing that individual data scientists and data engineers are just so much more efficient today than they were even five years ago, and that’s really good for proving out this concept. But it also comes with a lot of baggage. When you have a codebase that has 3,000 different Airflow DAGs or dbt models, and you don’t even know which ones are being used, which ones are not being used, which ones are required for the highest-level SLAs, it’s very hard to keep adding to that stack and that foundation without taking a step back and saying: okay, we need to layer on some kind of change management system, make it easier to move quickly and know which alerts we can ignore, which tables are most important, which ones are less important.
Shane Gibson: And we see that a lot. We’ve seen that with the data mesh buzzword eventually. That fits my waves of technology and team topologies — I like waves: the data warehouse wave, the big data wave, the modern data stack wave and the data mesh wave. And what I see is that as data people, we love to focus on technology; we don’t tend to focus on team topology, organizational structures, our ways of working, our processes, what we do, because we go straight to a technology solution. And I’m seeing that again. One of the areas that’s been quite vocal in the Twittersphere and the LinkedIn sphere at the moment is the loss of data modelling. It’s interesting, because it made me go back and say: what is data modelling? People say nobody models, but they do — if you write your own piece of code, you’re effectively modeling, because you’re structuring a piece of data; that is a physical model, or at least technically a model. But people were talking about this loss of logical modeling, this loss of understanding the business context and doing that design work up front before we get into code. In the data warehousing wave, we were strongly modeling. We were good at it — well, we did good models, but we weren’t fast enough. The way I describe it, there was often a person stuck in a room for six months, scratching the chin or the beard, coming out with this most beautiful model before we could start developing anything. That wasn’t a great process. But then we went into the big data wave, and we seem to have lost modeling altogether. I’m not sure that’s true, because I’m sure people were modeling. What did you see — was there any step in the process back then where people were modeling the business process, or modeling the data, before they ripped into writing Spark, Scala, Python code to build the cool feature factories and stuff like that?
Ashwin Kamath: Yes, it’s an interesting dilemma that’s been going around the space recently, and it even goes into the whole ETL versus ELT debate as well. If I take a step back, looking way into the past to where I started my engineering career — which was actually not in the data space, but more in application development — there were these ideas of ER diagrams, where you model out your database and show the links and foreign keys and whatnot. It’s almost like that has come back in full force into the data space. And I will say most application engineers have internalized that; it’s very core to how they develop and deliver. Usually the technical design docs will actually lay out: okay, here’s our database, tables, schemas and whatnot, here’s what our APIs are going to look like. I see that as what this whole data modeling space turns into within the data world as well. The idea being that there’s more than one type of entity relationship diagram, ERD, here: there’s one way of thinking about it at the service level, where you’re using an API to provide a service to an end customer, an end user; and then there’s using it in a more analytic setting, where it’s actually an internal representation of how the company thinks about the world — the product, the environment, the competitive environment, what have you. Being able to separate those two concepts, and have some sort of layer that actually translates from one to the other, is going to become very key. Whether that transformation happens in the E, the L, or the T phase of the pipeline is really just an implementation detail. But whenever some production analytical workflow needs to depend on some type of data table, it’s very important to have that layout properly specified and schematized, so that it doesn’t change on you as those data outputs get used.
Shane Gibson: Yes, and for me, it’s about the patterns you’re going to use. So you need to think about your architecture very lightly at the beginning — the patterns you’re going to adopt — and ideally sketch out a diagram, and what you see is something really simple. Typically, if you get a technologist to deploy a bunch of services, they’re going to draw on a piece of paper boxes for each of the services. They’re going to go: so you want some kind of data repository service? That could be a cloud relational database like MySQL, or it could be a NoSQL database, or it could be a graph store. We don’t yet know what that box is going to be, but there’s a box here called “store some data”. And you want to talk to it through some kind of API layer, GraphQL maybe. These are technology choices, but these are the ways you’re going to talk to the data, so there’s a box here. So they draw out those boxes so they can have a conversation about how many of those boxes you need and what they look like. In doing that work, they are modeling their environment, and everybody does it — but sometimes we do it lightly and badly, and sometimes we do it lightly and well. It’s really interesting; I’m becoming micro-focused on terminology now, because what I find is we use words and they mean different things to different people. And I’m looking at — so there’s been an announcement; this will be out of date in podcast terms in a year or two’s time. Snowflake and Salesforce have just announced a partnership where it looks like Salesforce is going to deploy Snowflake instances next to your Salesforce tenancy. What that means, in theory, is a lot more money for Salesforce and Snowflake, whoever’s got the paywall. It’s almost like data virtualization: it means we could actually push our queries down to that Snowflake instance, or zero-copy clone it anyway, and grab that Salesforce data without having to use an EL tool, without having to extract and load using Fivetran or Stitch or Airbyte or any of those ones. So we’re effectively getting the ability to query that operational system via code, not extract and load it in. Which, funnily enough, in the data warehousing days we had as things like reporting replicas — near-real-time replicas of the production database that stopped us taking production out with our queries. Now, what’s interesting about that is, if that pattern sticks, it means we actually end up with TEL, because we’re going to transform it, and then we’re going to extract it and load it. Or transform it and copy it — so is it going to be TCT, transform, copy, transform? Same pattern: we’re going to transform the data to make it useful, make it fit our business model or how we think, and then we’re going to copy it or move it or make it available to somebody else to use. So it’s an interesting world around that. Is that the postmodern data stack? I don’t know.
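As a speculative sketch of what that TEL / transform-then-copy idea could look like in practice — assuming a vendor-provisioned share named salesforce_share exists, which is purely hypothetical, as are all the object names below — the transform runs where the data already lives, and the “copy” is a zero-copy clone:

```python
# Speculative sketch: transform in place on a shared Snowflake instance,
# then "copy" the result via zero-copy clone. No extract-and-load tool.
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account", user="me", password="...",  # placeholder credentials
)
cur = conn.cursor()

# T: transform directly against the vendor-provisioned share - no EL step.
cur.execute("""
    create or replace table analytics.staged.won_opportunities as
    select account_id, sum(amount) as total_won
    from salesforce_share.sfdc.opportunity
    where stage_name = 'Closed Won'
    group by account_id
""")

# C: the "copy" is a zero-copy clone - metadata only, no data movement.
cur.execute("""
    create or replace table analytics.marts.won_opportunities_snapshot
    clone analytics.staged.won_opportunities
""")
```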
Ashwin Kamath: There are people already trying to take that term.
Shane Gibson: I think I’m one of them, because I’m not a fan of “the modern data stack”. So, one of the really interesting things around that analytics space, that data science space, was this idea of feature engineering. For me, there are a bunch of patterns embedded in that term. One pattern is this idea that we create a series of columns that hold flags. So we go and say: we want to run a model, some form of statistical machine learning model, and we know that those models like big wide tables, they don’t like doing joins, and we know that they like indicators of yes or no. So, high-value customer: a number between 10 and 20, and more than $1,000 — a bunch of flags that tell us a feature of a person, which is an interesting idea. And then the models run against that. But within the feature engineering space there’s more to it than that, isn’t there? Because there seems to be a whole category around those types of capabilities now. Back in the big data wave, I could see the engineers and the data scientists creating those feature sets, creating those tables of flags and refreshing them before they got to the model. What are you seeing in that space? Back when you were doing it that way, was there a lot of work creating those feature tables? And what are you seeing now?
Ashwin Kamath: I find feature engineering to be a positive thing, I would say, in establishing an intermediate step between the more raw versions of the data — which might have come from your ETL tool, dumped into your data warehouse, after which you normalize it — and then a set of features, data points, flags, what have you, that identify: here’s what’s valuable to our organization. So it factors into the whole data modeling space a little bit more. It reduces the dimensionality significantly, because instead of looking at a more scaled number, you look at a zero or a one, which almost simplifies what the next set of things is going to do in the machine learning space, for whatever kind of machine learning model you might train on top of those feature files. And it actually allows you to separate the development of these feature sets and feature files from the actual training and modeling stage. What I’ve actually seen in some companies now is that these are two completely separate teams: one that just delivers on these feature sets — they’ll literally store them in a data warehouse project that they call their feature store — and then you scale up a machine learning team that reads from that store and can be relatively autonomous. Sure, they can make requests to the feature engineering team, but they can also take what’s already there and move forward with that.
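A minimal sketch of that flag-style feature engineering, assuming a hypothetical orders table — the column names and the “between 10 and 20, more than $1,000” rule are just the illustration from the conversation, not anyone’s production logic:

```python
# Collapse raw events into the wide, join-free table of 0/1 flags that
# downstream models consume. The `orders` data here is made up.
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_value": [400.0, 700.0, 80.0, 250.0, 900.0, 120.0],
})

# Aggregate raw events up to one row per customer.
per_customer = orders.groupby("customer_id").agg(
    order_count=("order_value", "count"),
    total_spend=("order_value", "sum"),
)

# Reduce scaled numbers to yes/no indicators.
features = pd.DataFrame(index=per_customer.index)
features["is_high_value"] = (
    per_customer["order_count"].between(10, 20)
    & (per_customer["total_spend"] > 1000)
).astype(int)
features["is_repeat_buyer"] = (per_customer["order_count"] > 1).astype(int)

print(features)  # this table is what would land in the "feature store"
```

Separating this step from training is the organizational point: the feature table is a stable artifact one team owns, and the modeling team reads it without rejoining raw data.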
Shane Gibson: Yes. And so again, if we look at team topology, we start to see that as we scale, we break down into separate teams with handoffs between them. And we’ve seen that in the data engineering world. Back in the big data wave, the data engineer did everything: in my view, they did the collection of the data, they did the transformation of the data, and they worked with the scientists to denormalize the data into these feature stores. They did all the work end to end, very focused on that specific product. We always have to have boundaries, otherwise we’re boiling the ocean, as I like to say, and people can’t handle that. So our domain boundary wasn’t the work we were doing; it was the data we were working on, or the people we were working with, or the types of questions we were answering — something to give us a boundary that made us able to focus. And then what we’ve seen, though, with the analytics engineer is hyper-specialization. The data engineer now seems to be relegated just to the extract and load. What a boring job: we’re going to take data from there and make it turn up over here — where’s the craft in that? And the analytics engineer seems to get all the fun of transforming, understanding the business logic and that kind of stuff. So we see these team topologies where people do the end-to-end role, use their brains and do the whole craft, and then we hyper-specialize as a way of scaling. So that’s what you saw back then, right? There were people that created the feature sets, who were the engineers, and then there were people that used the feature sets, the data scientists who built the models on top of them, and there was a handoff between them. So how did that handoff work? How did they have the conversation of: I need six more flags, two of them based on age, three of them based on shoe size? How did that conversation happen?
Ashwin Kamath: You’re not going to be happy with the answer: a JIRA ticket. That is actually the state of the art I’ve seen. It’s subpar — it’s basically communicating through some sort of ticketing portal.
Shane Gibson: Good, and thank you for saying that. Because (inaudible 00:21:09) is not an Agile management tool, it does not enable Agile ways of working; it is a ticket management tool for a service desk, where we can fire and forget, where we can throw those requirements over the fence and make them somebody else’s problem. And nine times out of 10 that’s what happens. I listened to a podcast ages ago, and they had the best JIRA management process I’ve ever heard. When you raised a JIRA ticket, for the person who got allocated that ticket, it automatically booked a meeting in both of your diaries for half an hour. And if you did not turn up and have a conversation with the person who picked up the ticket, that ticket got closed — because you weren’t committing to actually helping them understand your problem, you were just firing and forgetting. That is the coolest JIRA management process I’ve ever heard. I love that one.
Ashwin Kamath: It’s an interesting problem, because there’s almost a misaligned incentive, in a way, where the team that wants the new feature files to be created often doesn’t have the capability to do so — they literally don’t have the permissions to push data into the feature store. And so you get this scenario where there has to be some sort of communication handoff, if you design your organization this way, to be able to say: this is what our team needs, can you please prioritize these tasks? But then the whole prioritization becomes a big thing around: okay, we’ve got this endless queue of tasks that we’re never going to get through, what do we prioritize? And then that starts to factor into other methods of scoring which tickets go up and down, and that sort of thing, which I have not really seen work well in practice. It’s definitely an area that I think companies will need to improve on over time.
Shane Gibson: Yes, and again, it depends on the scale and the size of the organization. I had Sean McGuire on the podcast a while ago — I’ve known him for many years, we’ve worked together in previous lives, and he’s based out of the UK now. He was presenting at a conference around some analytics stuff, and he attended one of the sessions — I can’t remember whether the example used was Uber or Airbnb, one of those big tech companies presenting their way of working and their analytics workflow. And Sean talks about how he asked the person who was doing the modeling up there: how do you deal with the problem of getting access to the data? Because it’s always hard getting access to new data, it’s just a beast to solve. And the person turned around and said to him: it’s easy, I just log a request and 15 minutes later the data turns up. And Sean’s like, how does that work? And it’s like: well, there’s a team of 1,000 data engineers just sitting there, waiting for the requests. They have a capacity problem, but not the one we see in small organizations. So for me, it’s always about the context of your organization. How many people do you have? How much work do you have? What’s the best way of optimizing the workflow for the team? If you’re big and large, yes, that hyper-specialization, fire-and-forget might work — I’m still not a fan of it. Did you find that you started to really focus on the way that JIRA ticket was written, the way the data scientists described the features they needed, so that the engineer could understand what to build and get it right, or was it kind of loose?
Ashwin Kamath: We started tightening that over time, to the point where the JIRA ticket actually became more of a form that needed to be filled in: here are 10 questions that need to be answered. How often does this thing need to run? What are the downstream dependencies that are going to be leveraging this feature? That gives us a sense of what priority or SLA should be applied to it. Oftentimes the data scientists would actually provide some sample code too, and that was probably the biggest accelerator to moving forward, because at the end of the day, they know what they want. It’s not that they don’t have the capability to create the feature — a lot of the time they’ve done the initial modeling of the feature, which is: here’s a SQL query that I need to run. And then there’s all the other stuff around it: how do I productionize this? Where am I going to run this thing on an ongoing basis? What sort of schedule am I going to run it on? Does this need to run after some upstream dependencies? If I have three chains of dependencies, and the data three links up is coming from a data vendor, I want to make sure that when my transform runs, every step above it was healthy. What do data quality and health even mean in the first place? It’s really very contextual to what’s happening with that data. Being able to understand those things actually gives the data engineer a lot more to work with, to be able to deliver quickly on these feature sets and feature flags.
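Those “10 questions” lend themselves to a structured request rather than free text. A sketch of what that form could look like as code — every field name here is hypothetical, distilled from the questions Ashwin lists rather than taken from any real system:

```python
# Hypothetical productionization-request form as a structured object.
from dataclasses import dataclass, field

@dataclass
class FeatureRequest:
    feature_name: str
    sample_sql: str                 # the data scientist's prototype query
    schedule: str                   # how often it needs to run, e.g. a cron spec
    upstream_dependencies: list[str] = field(default_factory=list)
    downstream_consumers: list[str] = field(default_factory=list)
    sla_hours: int = 24             # how stale the output may get before alerting
    priority: str = "normal"        # derived from who depends on it

request = FeatureRequest(
    feature_name="is_high_value_customer",
    sample_sql="select customer_id, ... from orders group by customer_id",
    schedule="0 6 * * *",
    upstream_dependencies=["orders", "vendor_feed.transactions"],
    downstream_consumers=["churn_model", "weekly_revenue_report"],
    sla_hours=4,
)
print(request)
```

The value of structuring it this way is that the same answers that triage the ticket (SLA, consumers, dependencies) are exactly the metadata the eventual scheduler and alerting need.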
Shane Gibson: One of the interesting things in there for me is that when I coach teams, they often want to jump into automation on day one, because that’s what we’re trying to do — we’re like, (inaudible 00:25:29) get this boring job out of my face. But what I say to them is: you can’t automate something that isn’t predictable and understood. If you don’t actually know how it’s going to work, then automating it early is going to cause you a pain in the bum. So go through stages: maybe do it manually, then maybe create a bunch of forms that people fill out, and then once you’ve got that, automate it. So that sounds like the right process you went through: it started off as just a conversation — what do you want — then it was fill out a form, because here are the things I know I need to know, like how do you want to schedule it. And then potentially we could automate that scheduling, so you can self-serve, you can actually figure out where in the schedule to slot it, and it gives you a safety net where it goes: okay, you do realize this is going to run before the data turns up, so you might want to rethink that. For me, that is how we should approach this idea of automation — it has to have a level of maturity and sustainability before we can automate, rather than just asking for it.
Ashwin Kamath: Automation is dangerous — you lose touch with what’s actually happening under the hood, so you’d better be sure it’s correct. And I would say this applies even beyond data and data engineering, even in terms of what type of internal processes you want to focus on. Especially within the startup world, we only have so much capacity to do things, and being able to get a touch and feel of something, and whether it’s working, is almost more valuable feedback and information than having something automated and self-service that you don’t really know is a good thing to have.
Shane Gibson: That’s where we come to this idea of platform as a service, or platform as a product. Again, as we’re scaling teams out, we will often split off a pod of people who work on the platform capability that other people can use. So we’re effectively becoming internal software engineers again, building platforms that the engineers or the analytics engineers or the data scientists can use. And what I often see when they start to do that is that they don’t adopt that product thinking. What I mean by that is all of the practices around product management, roadmaps — all of the product thinking that’s really useful for a team. The key thing they don’t understand is who the customer is. And so what the platform team starts doing is building features they want, features they think might be useful — not features that their actual customers, the software engineers or the data scientists or the data engineers, are crying out for. As teams are scaling, is that what you saw? As pods became more specialized, where they should have been creating something that serves other pods, did they actually focus on what those other pods wanted? Or did they drink the Kool-Aid, build what they thought was cool and everybody would use, and never actually monitor whether anybody used it? Did we create value, or did we just spend some time building something we wanted to build?
Ashwin Kamath: Yes, I would say when companies have excess capacity in terms of human resources and headcount, especially within the engineering domain, you start to see more of that “let’s build this because it’s cool”. It has a very technologically advanced feel — how cool would it be if this existed — without really thinking through the actual implications and the actual value proposition to the end user, the end customer. The companies I’ve worked at have always been very constrained, and so the types of work we prioritized were almost entirely based on pain points that we ourselves were feeling. So I think your mileage may vary depending on the company you’re at, and how much they’ve poured people into growing a certain team without thinking through: okay, do we actually need all of them?
Shane Gibson: And that company context is key. I see a lot of large corporates trying to get on the data mesh bandwagon, and when I look at them, my question to myself is: are they adopting the idea of decentralizing the data work down to a software engineering team? I look at the organization and I go: they don’t have software engineering teams — they’ve implemented Salesforce and SAP and those kinds of things. So they’re not democratizing the data work down to the software engineering team, which is one of the goals. Because we know that software engineering teams treat data as a source: they create the applications that people need, data is generated as part of that, and then the job is done. They don’t care how the data is used after the fact. And if I was them, right now I wouldn’t either — I have a bunch of features or actions I need to deliver in the product to the customer, and the data to support that; it’s not the key thing.
Ashwin Kamath: I almost feel like data mesh is something you graduate to once you’ve gone through the centralized data story. You start with your data silos, you centralize, and then you’re like: wow, this is slowing me down too much. So then you migrate to a data mesh kind of architecture. If you try to skip ahead, it doesn’t work, from what I’ve seen — you just create more data silos.
Shane Gibson: Yes, I agree. If you’re a small company, start centralized. Then as you grow, you’re going to have a scaling problem: you’re going to have a centralized team that can’t handle the work that’s coming, so you’re going to add more people. And we know that more than eight or nine people working in a pod becomes inefficient, so you’re going to split the pods up. Now comes the scaling problem of how you have three teams of eight that are working independently but still managing the conflict when they’re working on the same data or the same platform — that whole balance of collaboration versus trying to keep them separate. Data mesh has some ideas around that; a lot of them come out of the Team Topologies book, and there are a lot of scaling frameworks from Agile. There’s the way Spotify described how they scaled their teams, the so-called Spotify model; there’s Jurgen Appelo’s work around unFIX, which takes some of the Team Topologies and Spotify patterns and articulates them in a way that I understand a lot more — it’s just the way he describes them that I find really useful. So really, what we’re saying is: we don’t want a team of 27, we want three teams of nine, and therefore how are we going to make that work? Our natural behavior is that one of those pods becomes a platform pod, and they serve the other teams. The downside is that what we’re now saying to those teams is: you have to use the platform somebody else is building, you have to wait for that centralized platform and that centralized team to give you the features, you can’t go and build them yourself. So we’re going to slow you down. As long as you’re happy with that trade-off, that’s okay. So that idea of scaling, from a data mesh point of view, I’m definitely on board with. But this idea of passing it to the software engineering team — that’s where nirvana is, because then as data people we don’t need to care anymore: they’re giving us data that’s well formed, high quality, observable. But I don’t see many people getting to that level, because the tools and techniques we have right now are based around data people and our data skills, not around software engineering people and their engineering skills — two different domains and two different behaviors. Is that what you see?
Ashwin Kamath: Yes, I would say that is how I have seen a lot of organizations mature, the way you’re describing it. You break off this platform team and say: okay, their goal is to accelerate the other teams by building a platform. Except the trade-off there is that, by pushing the agenda onto them, if they’re not moving quickly enough — which inevitably happens at scale — you end up with almost a slowdown. Now, the counterpoint is that you get more standardization and consistency, which makes it easier to maintain long term. Having five different data pipelines written in five different languages, using five different tools, is arguably much more difficult to deal with when the five people who wrote them leave the company and you have a new set of five who you just pass ownership on to. It’s an area I think about a lot. It only really affects bigger companies, I would say, where you have people leave the company, and then you have to figure out who’s going to own this, how you’re going to distribute all the work they did — and there is so much tribal knowledge and domain knowledge required to understand what happens in a data pipeline. I can’t even remember what a data pipeline I wrote six months ago was supposed to do; documentation isn’t necessarily going to solve that issue. I see it both ways, I see the pros and cons. In general, standardization is a good thing at a certain scale, just to make sure that you’re actually able to function and you don’t get completely burdened by tech debt. But at the same time, you still want to be able to move quickly and still get all the bells and whistles that are coming out every other month. It’s a trade-off decision.
Shane Gibson: Like you said, you’re trading: standardization gets you some benefits in the future, but the thing you’re trading off is your speed to market, your ability for a team to iterate fast. So you’ve just got to make that call: what are you going to standardize, and what are you going to leave to the tribe so they can go faster?
Ashwin Kamath: I almost like to think the exploratory aspect of data science and the production aspect of data science are two completely separate things. Once you get into the production side of things, where something has signal and you’re trying to apply more automation — that’s the point where you should start thinking about standardization. Those are the things that need to live on longer than any individual data scientist within the organization. Because what I see is, if you lose the ability to iterate quickly at the exploratory stage, it’s actually very difficult to keep innovating, to keep bringing value to a company.
Shane Gibson: I’m going to agree and disagree, and it’s around the patterns that we use. I agree that you need the ability to innovate and explore and do new things quickly, and you need the ability to standardize and productionize, to make things run in an automated and trusted way. And the pattern that everybody seems to follow is to break those two things up: they have an exploratory team go and do the cool stuff and the innovation, and then they hand it over the fence to some poor team that productionizes it. There are people that tend to separate it that way — a pod of people that innovate, a pod of people that productionize, and the pattern is to hand over the fence: I’ve gone and created some cool stuff, and now I’m going to hand it over to somebody to productionize it. I’ve seen that fail every time, because you’re just creating friction: either you put too many guardrails on what’s handed over, so the innovation team are effectively writing the productionized code anyway, or they’ve written some stuff that’s cool, handed it over, and then the team that gets it just gets swamped trying to rewrite it. The way I’ve had the most success with teams is that the pod that’s innovating, doing the first cut, exploring — they’re like pioneers going out, finding the fertile ground where they could build. They’re doing some testing: putting a tent out there, digging some stuff, planting some seeds, hunting some animals. And then they don’t just leave the camp and go off again, waiting for somebody else to come in and go: here’s a camp, cool, no idea what that tent’s doing, let’s pull it down and build a big building. What happens is the second team actually goes and observes what the first team has done, and then they find the products they could build that automate some of that work. They’re looking for things where, if they automated them, somebody else would get value out of that. And then the third part is really the test: if another squad, pod, team picks up that piece of product and reuses it, then you’ve nailed it — somebody comes and lives in the house that you built. They saw some value there, because they’ve gone to the effort of actually moving in. So for me, that’s the process. As people explore, we have to be able to lose our babies: you have to go, well, that team built some stuff, and it had no value, so we’re not going to do anything with it, it’s going to go away. It was a good test, but there was no value, so don’t keep investing. That pattern — explore; somebody else looks and sees value, thinks it’s a marketable product, invests a little bit more; another squad actually picks it up and uses it — that cycle is one that’s always worked for teams.
Ashwin Kamath: So what I’m hearing from you is essentially that we should treat our data engineering teams not as a handoff of “here’s my custom data pipeline, I want you to do the productionalization”, but rather “here is my custom data pipeline, I want you to give me tools to productionize this instantly, and here’s how I need it productionized”. And that is exactly what I think companies need to start aspiring to when they break off this team. You’re right — in practice, I have not seen it done well. But I think part of it is that the space is still immature when it comes to what production data really looks like. Every company is trying to design its own production data stack, in its own completely separate way, with its own completely separate tools. And most of them aren’t even really hitting the mark of: okay, we capture 90% of data quality issues, or we have an SLA of two hours after a data pipeline fails to understand what happened and remedy the issue — sometimes there are several-day outages. Another question we have to ask is: whose responsibility is it when there are issues? Does it go back to the original data scientist who created it? Does it go back to the data engineering team — which is effectively where things do go today, and which puts so much burden on the data engineering team that they can never really innovate their way out of those issues? Or is there a separate data support team, where it’s almost like we’re going to outsource talent to just triage issues?
Shane Gibson: And what I see a lot is that people focus on the tools. They sometimes focus on the techniques and patterns, but they don’t focus on the team topology or the way of working. They don’t focus on their handoffs and where they’re at now, and what happens when they’re going to scale. What’s the plan? What do they think they’re going to do when they add another 10 people, when they’re successful? So when I go in and help on a consulting gig — when I go and help a customer with technology choices, or data blueprints, or some kind of strategic thinking around where they could go — the first thing I ask them to do is benchmark their current team topology. How do your teams currently work together? And let’s benchmark what you think you’re going to do: what are you going to decentralize, what are you going to centralize? Once we understand that, then we can find tools, techniques, patterns that may work for you. But if you don’t understand that, you’re really just firing into the wind, being ad hoc. So I agree with you — you have to understand that “how we make toast” view, the nodes and graphs of how work is going to happen. And that really is bringing in a lot of Lean thinking; if you look at Lean from Agile, that’s what it is. Observe the flow, figure out where the bottleneck is, go: that bottleneck’s causing us problems, what can we change to remove it, experiment with it, and then do that. But data teams don’t do that — they don’t look at their own processes and way of working as a system; it’s just tribal behavior.
Ashwin Kamath: I think it’s going to change out of necessity as well. The hard truth is that people leave companies, and today, in this environment, people leave companies more than they did 10, 15 years ago, when people would actually stay with a company for more than five years on average. So you almost have to design everything about your business and your organization around this idea that people might leave, and tribal knowledge needs to transfer hands before they do.
Shane Gibson: And especially if you look at technology acceleration — we’re getting new technologies out faster and faster every day — what we’re also seeing is team topology acceleration. The example I use is the way the funding worked for a lot of companies over the last couple of years: we saw masses of people being hired into the data space, into a company, pretty much overnight. It’s a nightmare. How do you onboard 100 people into your data engineering practice if it’s not well formed?
Ashwin Kamath: The analytics engineering title is probably only about two years old.
Shane Gibson: Then we see the funding dry up, and we see massive cuts — and I feel really sorry for those people — 10, 20% cuts of people in an organization overnight. So you think about the tribal knowledge of those people who have been doing the work, who understand the work they were doing: they’ve just walked out the door, or been pushed out the door. How do we survive that?
Ashwin Kamath: It is terrifying to think that every single person who knows about these 500 tables in my data warehouse is gone, and so I don’t even know if I can turn them off. My Snowflake just continues to rack up bills, but I don’t even know what’s being used and what’s not, and I don’t even know who I could ask to figure that out.
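One hedged way to start answering that “which of these tables is still used” question, in Snowflake specifically: the ACCOUNT_USAGE views record which objects queries touched (on Enterprise edition and up), so you can anti-join your tables against recent access. A sketch, with placeholder credentials and an arbitrary 90-day window:

```python
# Sketch: find tables nothing has queried in 90 days via ACCOUNT_USAGE.
import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholders
)

UNUSED_TABLES_SQL = """
with accessed as (
    -- every fully qualified table name touched by a query in the window
    select distinct obj.value:objectName::string as full_name
    from snowflake.account_usage.access_history a,
         lateral flatten(input => a.base_objects_accessed) obj
    where a.query_start_time > dateadd(day, -90, current_timestamp())
)
select t.table_catalog || '.' || t.table_schema || '.' || t.table_name
from snowflake.account_usage.tables t
where t.deleted is null
  and t.table_catalog || '.' || t.table_schema || '.' || t.table_name
      not in (select full_name from accessed)
"""

for (table_name,) in conn.cursor().execute(UNUSED_TABLES_SQL):
    print(table_name)  # candidates to archive or drop
```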
Shane Gibson: And we’ve definitely seen — somebody coined a term a while ago, and I’ve picked it up because I like the concept — that a lot of features in a product become categories. So take data lineage. I liken data lineage to a seatbelt on a car: I’m not going to buy a car without a seatbelt, because it’s just table stakes in a car — there has to be a seatbelt there, it keeps me safe. Yet now you can get a tool that moves data around and has no lineage. How is that not table stakes? You’ve got to tell me what you’re doing with the data: if I’m moving data, you’ve got to give me a picture of what I’ve done. How could you not do that? So I think we’ll see these categories become features again, in consolidation, as we always see in the market as we go through different waves.
Ashwin Kamath: When an ETL tool and a reverse ETL tool are completely separate systems, it’s actually very hard to even understand lineage, even if each one is giving its own separate version of lineage.
Shane Gibson: I’m not a great fan of the term reverse ETL, but then we’ve started to see the terms data activation or operational analytics. And it brings me back to that big data wave you were part of, because of one of the changes I saw. When we were in the data warehouse wave, we were internally focused: a lot of the time it was internal reporting, it was KPIs, balanced scorecards, and there was a bit of operational reporting in there. But it wasn’t really using data to effect hardcore process changes in our organization or targeting of our customers — like I said, the data mining team were typically separate. In the big data wave I saw a change: I saw this volume of data being used to create models that invoked change, either by recommending to a human inside the organization what actions they should take next, or by codifying that action. A lot of the time, the share trading models were saying buy, sell, buy, sell — actually invoking the action of buying and selling. So that was a change in how we used that data, in the action we took. And then we lost it again. For me, the reverse ETL, the data activation, is trying to go back to that, where we use the data we have to actually execute an action, or recommend an action to be taken, for the customer or the person or the employee internally: what should they do next? Is that what you saw back in the big data wave — that it really was around operationalizing the action that should be taken from that data?
Ashwin Kamath: Yes, I think so. The difference with how I see things applied today versus before is that before, it was oftentimes used for single decision-making points: I want to understand something for this next ad campaign, so I’m going to do some analysis, maybe crunch a bunch of data, and then take the CSV output and manually do what reverse ETL does now. Whereas now you see a lot more automation around how this data flows, how it moves in and out of the data warehouse. In the FinTech space there are applications on the front office side of things, where you’re underwriting a customer who just came to your website, within two seconds. Then overnight, during the times when people aren’t really using the website as much, you’re retraining all the models and pushing them back into your fraud and underwriting systems. And at the very back end you’ve got your reporting data stack, which is basically crunching all the new data that’s shown up during the day, creating loan servicing tapes, loan origination tapes, and sending those out to your banking partners, your regulatory agencies and whatnot. Being able to manage that data flow — it’s actually just one very large system around your product and your service that needs to cover such a vast breadth of different types of analysis and data crunching and processing that it’s arguably more important to make sure the automation is in place, because humans make more mistakes than even data pipelines do. And you have to make sure there’s enough quality control at the intermediate stages of each of these processes, because if you’re sending data out to a regulatory agency, you don’t want them to come back 15 days later and say: we noticed an issue with your data, now you need to replay the last 15 days for us so that we can have an accurate statement. That introduces such a large burden — I equate it almost to recalls in a manufacturing setting, where we’ve already sent out the product and now we have to bring it all back in and resend a new version of it. The burden of doing that is so much higher than just catching the issue in the factory.
Shane Gibson: I like that recall analogy, that’s great — because in the data world it’s a virtual product: we don’t have to recall it, we just change it. And as you’re talking about that FinTech side of things and financial services, it does remind me of 20, 30 years ago, when I went to get a credit card. It would take days: I had to fill out paper forms that went to a human, and the credit risk team spent ages on it. Now we’re into that immediacy, where I can log onto a website, I can make a request, it does a whole lot of work in the background in theory, and the credit score comes back and says I’m low risk: away you go, you have a credit card. So that automation using data has been valuable to organizations and valuable to consumers, and that’s what we should be focused on.
Ashwin Kamath: Actually, Quicken Loans, for example, and most of these really quick underwriting systems that will give you a credit card or loan from the website — they actually make their decision in maybe under half a second, and then they’ll put up this little spinner and make you think they’re taking time. Usually they’ll take about 10 to 20 seconds to make you feel like they’re doing work, because if they gave you a response that quickly, it’d be like: well, something’s off here.
Shane Gibson: But it’s pretty cool. I wonder what the model is doing — I wonder if it’s actually doing a lot of scoring, or if it’s just going: you’re asking for a 5k limit, we don’t care; you’re asking for a 50k limit, we care, we’re actually going to run a model against you. I wonder if there’s a little bit of theatre there, around when there is actually something happening and when there’s not.
Ashwin Kamath: The first thing is, usually there’s model scoring happening even before you’ve clicked submit on the form. The way that you move your mouse, or whether you copy-paste into the form — these things actually factor into whether or not you’re a fraudster and how creditworthy you are. And then there’s a second aspect, which is probably the bigger chunk of it, which is that a lot of this stuff happens offline. Most of the online processing is a very highly trained model that has been picked out because you match the profile of what that model is trying to score. And how they determine that that model is the one to be used is actually a much larger offline modeling system designed around this concept.
Shane Gibson: And that’s a pattern people often don’t understand, this idea of real time. People are like: I want real time. Okay, what part of real time do you want? Do you want a real-time data feed? Do you want a real-time model, where it’s actually going to train the model on the fly in real time? Or do you just want to use a score that’s been pre-calculated, in real time, to give a response? I didn’t even think about that example of the way you fill out the form — that interaction behavior infers a lot about your risk, and that’s a real-time thing.
Ashwin Kamath: As I type these things in, it’s scoring my behavior against the model to say yea or nay.
Shane Gibson: But my credit score is probably pre-calculated. There’s probably a model in the background that’s taken some attributes about me and pre-scored me, to say I look like this type of person, therefore I’m in the safe bucket or I’m not. The model itself isn’t real time, but the use of that data is real time.
Ashwin Kamath: Exactly. And there’s a whole area of engineering that I’ve been finding very interesting more recently, because a lot of our customers adopt this methodology of: okay, we have offline data processing — those are the big data workflows — and then an online serving layer, which is usually just Elasticsearch or DynamoDB or something very quick. You actually run an ETL process from your data warehouse into the serving layer, and then put an API on top of the serving layer. I’m seeing this as a more and more common pattern for delivering on these kinds of real-time data workflows.
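A minimal sketch of that offline/online split — a plain dict stands in for the DynamoDB or Elasticsearch serving layer, and the scoring rule and field names are invented for illustration:

```python
# Offline batch job precomputes scores; the online path is just a key lookup.
serving_layer: dict[str, float] = {}  # stand-in for DynamoDB / Elasticsearch

def offline_batch_job(warehouse_rows: list[dict]) -> None:
    """Big-data side: recompute every customer's score, then bulk-load the
    results into the serving layer (the 'ETL into serving' step)."""
    for row in warehouse_rows:
        score = 0.7 * row["repayment_rate"] + 0.3 * (1 - row["utilisation"])
        serving_layer[row["customer_id"]] = round(score, 3)

def score_api(customer_id: str) -> float | None:
    """Online side: no model runs while the user waits - just a lookup."""
    return serving_layer.get(customer_id)

offline_batch_job([
    {"customer_id": "c1", "repayment_rate": 0.98, "utilisation": 0.20},
    {"customer_id": "c2", "repayment_rate": 0.61, "utilisation": 0.95},
])
print(score_api("c1"))  # 0.926 - precomputed offline, served instantly
print(score_api("c2"))  # 0.442
```

That split is exactly the pre-calculated credit score Shane describes: the expensive modeling happens on the batch side, and “real time” is only the retrieval.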
Shane Gibson: And it comes back to that concept of which pattern needs to be a real-time pattern and which doesn’t, and what the whole supply chain of your data is. Where are the choke points? Where are the things you need to improve on? That understanding is really important. The other one, from an analytics point of view or a data science point of view, that I find people get confused about is the difference between core business processes and admin processes. I’ll give you an example. I use a methodology called BEAM, from Lawrence Corr, a lot when we’re working with organizations to understand the data requirements. It works on core business processes: customer orders product, customer pays for product, store ships product, customer returns product — that flow, almost the customer journey. Those are core business processes. And as we run those workshops, we always get taken down to administration processes. So if customer orders product, somebody reviews the order, somebody approves the order when it’s not automated — those are admin tasks, and they’re valuable, but we need to differentiate an admin process from a core business process. And I see the same in analytical modeling: some of the feature flags are driven off admin processes, and some of them are driven off core business processes. We have to be really clear which ones we’re talking about — which part of the data workflow, which part of that process we’re focused on to achieve the task we’ve got. Do you see that? Do you see confusion often between those two things?
Ashwin Kamath: I wouldn't say confusion, I think it's more a question of investment by the company, and it's obviously pretty contextual to the domain you're looking at. Within the financial services sector, you need to make sure your reporting is really good. You're dealing with a much bigger headache from bad data in a reporting stack, potentially, than from giving out a few fraudulent loans, just because that's how the financial sector works: you've got regulatory agencies that are double-checking everything you do. In other applications it's less about that and more about being able to come back and say, from an admin perspective, I understand everything that can happen and I need to review this thing. It almost depends on the ROI associated with a bad decision. If that's very high, you probably want to invest more time there and build something that is very robust and has significant quality control around it. For the most part, though, you see the most attention going to the consumer side, the actual user-facing side. If this is a recommendation system that actually affects whether or not someone's going to click an item on my e-commerce site and buy, then you're probably going to put more effort into that, and back off on the effort around reporting on the recommendation system itself.
Shane Gibson: That's a really good concept. I have a bunch of patterns around this idea of an information product, and it's really just a way of creating a boundary of requirements. It's a way of saying, here's a small set of requirements that requires some data, some code, and a delivery mechanism, a visualization or a service, and we're trying to get down to a size that we can deliver in an iteration, three weeks. We're trying to get a product that is small enough that we can produce it and give it to our stakeholder for feedback and value: give them some value, get some feedback. It's a challenge to break things down into those smaller chunks we can do in a smaller amount of time. As part of that, we've got the information product canvas, which asks them a bunch of questions, a bit like your JIRA form. Ideally we go and say: what's the outcome you want to achieve with this product? What's the action you're going to take if this information turns up? What questions do you want to answer? What core business processes does it cover? There's a bunch of things that give us context around what they're asking. And what we learned running those sessions is that we used to start by asking, what's the outcome? And what we got was an admin task: I need a list of customers. So now we typically start by asking what core business questions they want answered, and it's, how many customers do we have? Once you understand that: what action are you going to take? Oh, I'm going to run a sales campaign to get some more, because the numbers are below our target. And what's the outcome of that action? We're going to have more customers, more revenue. So we went backwards. On the data quality side, though, we have a lot of conversations in the market around observability, data quality, data trust, all of this, and they're very technical conversations: is the frequency of the data as it claimed? But taking your point, the conversation we don't have is: if that data is wrong, what's the impact to the organization? If it's just on an internal dashboard that somebody reviews at the end of the day, the impact is not as large as if that number is given to a regulatory body, because then we've lied. So we don't ask them, what's the blast radius of that data being wrong for this information product? And that's really interesting; we should. It's definitely a pattern we should be picking up.
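One rough sketch of how such a canvas could be captured as structured data, working backwards from outcome to action to questions, with blast radius included as a field. The schema here is illustrative, not the actual AgileData canvas:

```python
# Sketch: an information product canvas as a structured record, so the
# outcome, action, questions, and blast radius travel with the request.
from dataclasses import dataclass

@dataclass
class InformationProductCanvas:
    name: str
    outcome: str                       # what the org gets if this works
    actions: list[str]                 # what someone will do with it
    questions: list[str]               # core business questions answered
    core_business_processes: list[str]
    delivery_mechanism: str            # dashboard, API, file, etc.
    blast_radius: str = "internal"     # internal / customer-facing / regulatory

campaign_list = InformationProductCanvas(
    name="Customer acquisition tracker",
    outcome="More customers, more revenue",
    actions=["Run a sales campaign when numbers fall below target"],
    questions=["How many customers do we have?"],
    core_business_processes=["Customer orders product"],
    delivery_mechanism="dashboard",
    blast_radius="internal",
)
```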
Ashwin Kamath: It's almost a side conversation relative to the Agile concept and moving quickly, but it's one of those things where, once you understand that, you can prioritize things much better. If two alerts go off, which one am I supposed to take care of first? And being able to say, okay, in some sense this specific alert is noise, I don't need to worry about it right now. Maybe I even want to go ahead and turn it off for the future, because literally no one cares about it.
Shane Gibson: I think in today's world, we haven't really got that far.
Ashwin Kamath: I've seen this at numerous companies, where it's: let's just throw a bunch of data quality checks at it, but we're not going to invest the time to tune and calibrate them to what the use case actually is. Then when an issue is detected, it's actually a non-trivial engineering burden, or data science burden, to go in and figure out what went wrong. Is there something I have to fix? Do I have to write a cleaning pipeline to remove these bad data points? Or am I going to go manually remove the bad data points, and then, depending on how the upstream is syncing, it might literally just keep adding those bad data points back. I think understanding the impact and the use cases matters a lot. Even going back to what you were saying, just understanding the specific use of the data that's being handed over is very important. Someone might say, I want a list of customers, but they don't tell you they're going to be running an email campaign off that list. You might not include the email column; there's no guarantee that the data you pull has the field they care about.
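A small sketch of the distinction Ashwin draws here: a one-off manual delete doesn't survive the next upstream sync, whereas a cleaning step applied on every load removes the bad points each time they reappear. The validity rule below is invented for illustration:

```python
# Sketch: a cleaning filter applied on every sync, so re-synced bad rows
# are removed again, unlike a one-off manual delete.
def clean(rows):
    """Drop rows that fail a (hypothetical) validity rule."""
    return [r for r in rows if r.get("email") and r.get("amount", 0) >= 0]

def on_upstream_sync(raw_rows, table):
    # Runs on every sync: even if the upstream keeps re-adding the bad
    # data points, they never reach the downstream table.
    table.extend(clean(raw_rows))

customers = []
on_upstream_sync(
    [{"email": "a@example.com", "amount": 10}, {"email": None, "amount": -5}],
    customers,
)
assert len(customers) == 1  # the bad row is filtered on every pass
```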
Shane Gibson: It's a really interesting point you make. We think about flowing the data all the way through to an action, an outcome, and then a consequence; we think about it as a flow diagram. We always focus on the left: we focus on the data and the technology, because that's what we do. We sometimes start to focus on the action or the outcome. We never really thought about focusing on the consequence. There's been a whole thing on LinkedIn for a little while now around this idea of data SLOs; I can't remember who raised it first. But the idea is: that last mile, that thing we put in front of our customer, our information consumer, why don't we set the SLA at that, and then reverse engineer what everything needs to do to get that data turning up at that time. That's something we've been thinking about, and it's the same with the data quality stuff. In our product, we've got data trust rules and we've got notifications, but we haven't solved the notification problem, because what we see is exactly what you describe: there's just a noise of notifications about things going wrong, with no understanding of impact. So we want to quiet them, we want to turn them off, but we don't know which ones we can put on mute and which ones we can't. That idea of yours, of saying here's the impact at the end of the supply chain of something going wrong, should inform what we make quiet. You can't make that one quiet, because the blast radius to the organization is so high; this one, nobody really cares, there's no real impact, so it doesn't even need to get through.
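One way to sketch that blast-radius idea: tag each data trust rule with the impact at the end of its supply chain, and let that decide what pages someone, what gets triaged, and what is muted. The tiers, thresholds, and rule names below are invented:

```python
# Sketch: route data-quality alerts by the blast radius of the information
# product they feed, rather than treating every failure as equally loud.
from enum import IntEnum

class BlastRadius(IntEnum):
    INTERNAL_DASHBOARD = 1   # someone glances at it at the end of the day
    CUSTOMER_FACING = 2      # affects what a customer sees or buys
    REGULATORY = 3           # a wrong number goes to a regulator

def route_alert(rule_name: str, blast_radius: BlastRadius) -> str:
    if blast_radius >= BlastRadius.REGULATORY:
        return f"PAGE on-call now: {rule_name}"
    if blast_radius >= BlastRadius.CUSTOMER_FACING:
        return f"Open ticket for triage today: {rule_name}"
    return f"Log only (muted): {rule_name}"

print(route_alert("loan_sold_twice", BlastRadius.REGULATORY))
print(route_alert("dashboard_row_count_drift", BlastRadius.INTERNAL_DASHBOARD))
```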
Ashwin Kamath: You have to work backwards from this, because otherwise everyone asks for the highest SLA. I've seen this everywhere: no one wants an SLA that is not the highest priority, because as an individual data scientist in a thousand-person company, I don't know what else is going on. I feel like whatever work I'm doing is the most important, and it's hard for me to balance that against everyone else's work. So I haven't seen this solved in any (inaudible 00:56:08). I have some ideas around how to think about SLAs, usually in terms of: if I don't get this report by 5pm, bad stuff happens. So you set 5pm as your target time, and if 4pm comes around and a bunch of your data pipelines are backed up upstream, you probably know you're going to miss it, and you can quickly send that notification. But even then, too many notifications is actually as bad as no notifications.
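A sketch of that early-warning idea: set the SLA at the consumer's deadline, estimate whether the work still queued upstream can make it, and notify as soon as a miss looks likely rather than after it happens. The deadline, runtimes, and names below are made up:

```python
# Sketch: predict an SLA miss ahead of the deadline instead of reporting
# it afterwards. Estimated runtimes would come from historical run data.
from datetime import datetime, timedelta

REPORT_DEADLINE = datetime(2024, 1, 15, 17, 0)  # "report by 5pm"

def predicted_finish(now: datetime, backlog: list[timedelta]) -> datetime:
    """Estimate the finish time from pipelines still queued upstream."""
    return now + sum(backlog, timedelta())

def check_sla(now: datetime, backlog: list[timedelta]) -> None:
    eta = predicted_finish(now, backlog)
    if eta > REPORT_DEADLINE:
        # It's 4pm and we already know 5pm is gone: notify now.
        print(f"SLA at risk: ETA {eta:%H:%M} vs deadline {REPORT_DEADLINE:%H:%M}")

check_sla(
    now=datetime(2024, 1, 15, 16, 0),
    backlog=[timedelta(minutes=45), timedelta(minutes=30)],
)
```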
Shane Gibson: And observability. We often say that buyers are liars, so let's monitor that they actually are grabbing that information at five o'clock, because if they don't, then like you said, everybody's information is the most important. Observability should be two ways: it's the observability of what we've done to the data and whether it's working or not, and it's observability of whether that data has been used for the action. Because if they're not using it, we've just delivered low value; we've wasted our time.
Ashwin Kamath: As I said, in the hedge fund world there's actually an easy paradigm to go by, which is that when the market opens, the data needs to be ready. So we didn't have too many problems around understanding whether data was being used. But when models were turned off, because, say, they weren't performing as well, it was unclear what could be turned off upstream. Lineage, first of all, is not used as a way to deprecate things; lineage is actually used more as a debugging tool. I think companies also need to get a better understanding of the full lifecycle of their data, down to the depth of a data pipeline: turning it off, deleting the code, or at least commenting the code out, so it's not continuously running where it doesn't need to be.
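A toy sketch of using lineage for deprecation rather than only debugging: walk the dependency graph from a retired model and flag upstream pipelines that no longer feed any live consumer. The graph is hand-built and hypothetical:

```python
# Sketch: find upstream pipelines that can be turned off once a model is
# retired, by checking whether they still feed any live consumer.
lineage = {  # downstream -> upstream dependencies (hypothetical DAG)
    "model_a": ["features_x", "features_y"],
    "model_b": ["features_y"],
    "features_x": ["raw_trades"],
    "features_y": ["raw_trades"],
}
live = {"model_b"}  # model_a has just been turned off

def still_needed(node: str) -> bool:
    """A node is needed if some live consumer reaches it through lineage."""
    if node in live:
        return True
    consumers = [d for d, ups in lineage.items() if node in ups]
    return any(still_needed(c) for c in consumers)

for pipeline in ["features_x", "features_y", "raw_trades"]:
    if not still_needed(pipeline):
        print(f"{pipeline}: no live consumers, candidate to comment out")
# -> only features_x is flagged; raw_trades still feeds model_b via features_y
```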
Shane Gibson: And you're right, we never turn data pipelines off, because in theory there's a low cost to keeping them running. And even when we can see the lineage, we don't know what impacts the user or the decisions made. What we need is a data Chaos Monkey. Netflix used Chaos Monkey to turn servers off to see what happened. So we need a data Chaos Monkey that just stops jobs running, and we'll see who screams.
Ashwin Kamath: Actually, in a lot of companies we do that by default: things just don't run. What's interesting is that Chaos Monkey is meant to make sure your servers are fault tolerant, so a service engineer will actually design their service in a way that outperforms Chaos Monkey, and will spin up new containers when their containers get killed. There is no real equivalent in the data world. If the data doesn't show up, that actually is going to have a materially bad impact, but the cost of finding out what the bad impact is might be greater than the value of knowing it.
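For what a data Chaos Monkey might look like in its simplest form, here is a sketch that pauses a random pipeline and records who complains, so the blast radius gets measured empirically rather than guessed. The pipeline names and complaint channel are entirely hypothetical:

```python
# Sketch: a "data chaos monkey" that pauses a random pipeline and logs
# complaints, to discover empirically which outputs anyone actually uses.
import random
from datetime import date

PIPELINES = ["daily_sales_rollup", "legacy_churn_scores", "exec_dashboard_feed"]
complaints: dict[str, list[str]] = {}

def pause_random_pipeline() -> str:
    victim = random.choice(PIPELINES)
    complaints[victim] = []
    print(f"{date.today()}: paused {victim}, waiting for screams")
    return victim

def record_complaint(pipeline: str, who: str) -> None:
    """Called whenever a consumer reports the missing output."""
    complaints.setdefault(pipeline, []).append(who)

victim = pause_random_pipeline()
# ...a week later: an empty complaints list suggests the pipeline (and its
# upstream chain) is a candidate for decommissioning.
if not complaints[victim]:
    print(f"{victim}: nobody screamed, candidate to turn off for good")
```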
Shane Gibson: Yes. But instead, run the experiment: turn it off and see if anybody even notices. If nobody notices, it stays off. As I said, an hour goes pretty quick. So just before we close out, is there anything else you wanted to cover from the Agile and data way of working, something you've been passionate about or experienced?
Ashwin Kamath: Yes, I'm very passionate about the intersection of data orchestration with data quality, and being able to quickly iterate: I have a data pipeline, I know it's producing value for the company, I want to push this into my production system, and then I want to wrap it with some quality control. That set of steps is, in my opinion, completely ignored in the industry right now, and I think it's going to get more and more attention over time. It's very important for this concept of being able to move in an Agile fashion and move on to your next set of projects, without this level of anxiety about whether your old data pipelines are working, or your old ML models are still performing, that sort of thing. I would urge everyone to pay more attention to what their production automation handoff looks like. Don't put the burden on data engineers to just do it manually; find ways to automate it and keep improving on that stack.
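A sketch of what that handoff step could look like: promotion to production automatically wraps a pipeline with quality checks, instead of a data engineer bolting them on manually later. The checks and the pipeline below are stand-ins:

```python
# Sketch: wrap a pipeline with quality control at the production handoff,
# so promotion attaches the checks rather than leaving them as a manual chore.
import functools

def with_quality_checks(*checks):
    """Decorator: run each check on the pipeline's output before release."""
    def wrap(pipeline):
        @functools.wraps(pipeline)
        def run(*args, **kwargs):
            output = pipeline(*args, **kwargs)
            for check in checks:
                if not check(output):
                    raise ValueError(f"{pipeline.__name__} failed {check.__name__}")
            return output
        return run
    return wrap

def non_empty(rows): return len(rows) > 0
def no_null_keys(rows): return all(r.get("id") is not None for r in rows)

@with_quality_checks(non_empty, no_null_keys)
def build_customer_list():
    return [{"id": 1, "email": "a@example.com"}]

rows = build_customer_list()  # raises before release if either check fails
```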
Shane Gibson: All right, great. Bring in that Lean thinking, where you actually understand the system, you identify the bottlenecks or the risk areas, and then you do something about iterating the way you work or your system to see if you can automate it, reduce it, fix that problem, and just rinse and repeat, constantly updating the system so that it's better. That's what we should be doing. This has been great, thank you. We went all over the place, as we often do, but for me the key takeaway really is that blast radius: understanding the impact of something happening or not happening, and how it's used. And from a data quality point of view, if it's bad data, do we really care, and what bad things happen? Sometimes we do and sometimes we don't, and we should prioritize based on that. If people wanted to get a hold of you, what's the best way of connecting with you in the modern world?
Ashwin Kamath: I'm available on LinkedIn; that's probably the best way. Our website is spectredata.com, and you can always reach out to us through the Contact Us form there, but LinkedIn is probably the best way. It's been an absolute pleasure to be here, it was a great conversation. Thank you so much for having me.