A DataOps and Data Science Journey

Join Shane as he chats to Hamish and Liam in a pub in the middle of the North Island about their journey to adopting a way of working that combines DataOps with Data Science.

Guests

Hamish Gray
Liam Cole
Shane Gibson

Resources

Recommended Books

Podcast Transcript

Read along you will

PODCAST INTRO: Welcome to the “AgileBI” podcast, where we chat with guests or sometimes just to ourselves about being Agile with teams, who are delivering data, analytics and visualization.

Shane Gibson: Welcome to the AgileBI podcast. I'm Shane Gibson.

Hamish: I'm Hamish.

Liam: And I'm Liam.

Shane Gibson: Welcome, chaps. How's it going? Well, we're not recording from Wellington today, and we're recording in a pub, so that's a first for me on both counts. I think we'll start off with Hamish: give us a bit of background about your journey into this world of Agile.

Hamish: So my story in data analytics starts back in probably 2000 in the UK, where I fell into data warehousing through Access databases, and then into a few BI reporting tools. I lived in Australia for seven years, and then came back to New Zealand five or six years ago. And the Agile journey started probably in January, when we met Shane, who came in to coach our team where we work.

Liam: I've only been in the workplace for nearly two years now, straight out of university. I'm a data scientist, and my time in the workplace has been with a team that traditionally did reporting and now does predicting. And then there are some of the buzzwords around, like AI and machine learning and predicting and models and all sorts. So I came in to help try and connect the dots on some of those things, and to help us be a team that does more things.

Hamish: More interesting things than just reporting. I think before we met Shane, we were stuck in a BAU support model. And we got the opportunity to create a new cloud-based platform, and with that came an opportunity to do things differently and work better as well. Liam's job has probably been more on the spiky end of that, with all the maths and things like that, rather than the crunching of rocks, which is more my data engineering department.

Liam: And I guess we've now gone down this Agile route for nearly a year, from our first training to running a couple of sprints. So it's actually been quite an interesting journey, and we've learnt quite a lot along the way.

Shane Gibson: So today we're going to talk about that journey. And like you said, it's been almost a year. Although I think we went in there lightly to begin with, going Kanban, before we doubled down. And that's interesting, I suppose. I was just telling the people on the course today that often when I work with a new team, they want to go Kanban, because it feels comfortable. It's like the chaos that we normally run. In the past, I used to say, "No, you have to go Scrum," because I'm more comfortable with that, and it puts unnatural constraints on the team that help you learn some of the Agile behaviors faster. But with your team, it was: give it a go and see what happens. And then if it doesn't seem to work for us, we can bring in some more of the Scrum behaviors until we learn those, and then move back to the Kanban flow-based stuff. So how did you guys find that? How did you find trying to use a flow-based model on day one, before we flipped and became more constraints based?

Hamish: I think you're right, it was fairly comfortable. Everyone's worked in that way before, and even in other teams we've had tickets, maybe not deliberately Kanban, but on a wall where you'd move them between statuses. So that worked fairly well to start. The thing that helped the team most was the renewed focus we got out of it, and being able to ignore other people a bit more. You gave us permission to focus on ourselves, which we did fairly well to start, and that had interesting side effects that we didn't realize it would cause: other people would look in at us and wonder what we were doing, while we were just trying to get on with what we were doing. That was our assumption, at least. So Kanban at the start definitely helped, but we knew it wasn't going to last for everything.

Liam: It also took us a couple of attempts. We thought we might implement this relatively quickly, and you said it might take a while. And then we thought we should be able to have a product owner who doesn't need to give us 50% of their time. And then we went along that journey and figured out pretty quickly.

Hamish: We're still having trouble. We've had good ones, but we understand that they've probably got the hardest job. And that's what I tell them.

Shane Gibson: I agree, it's one of the hardest things, finding good, dedicated, committed product owners that are right for the team. It comes back to that waterfall behavior, where you can throw a set of requirements over a fence and wait, and then come back right at UAT time to tell the team that it's not what you wanted, versus spending the time to engage. You went through quite an interesting journey, where you started off with quite a strong product owner. Then for a couple of the sprints you got one that was a little bit less available. And then you were back to quite a strong product owner again with the next person. So how did you find that see-sawing between somebody who was committed and dedicated, then somebody who was merely involved, and then back to another one that was committed and dedicated again?

Hamish: I think it was knowing what decisions to ask them to make. We got into the habit of making decisions for ourselves. And then it became a motto: let's ask the product owner. When we had a product owner who was there and available to go to, we actually would start to engage with them constantly, and it did work. Whereas if they're absent, one of us would just take the lead and decide, which has probably been our worst habit, I think. It is down to the product owner giving us direction, but it depends how the information products come to us: some are forced from up on high and some are generated from an actual need. Having a person who understands what it is that needs to happen, versus just thinking this is something that someone's told them to do, makes the difference; the latter doesn't always work. But definitely, the time with the engaged product owner was the best we've had. Unfortunately, the priorities changed. We're not on that at the minute, we're on something else. So that's another story.

Liam: The product owner thing is something that we've had to work at, because they don't always know what the role of a product owner is, so we've had to coach them a little bit on that as well as engage them in a sprint scenario. And the other thing is that we don't have the luxury of having that as a prerequisite, as in, okay, this person roughly knows what a product owner does before we actually commence a sprint. Usually we'll commit to the sprint and then teach them: okay, can you please do this, and make sure you keep an eye on the tickets, and things like that. So I guess we're slowly refining the product owner onboarding as we get further down the track. And there was only one way to learn that, and that was through doing it, which is what we've heard multiple times.

Shane Gibson: One of the things I find interesting is that, if we're looking at what we call the traditional Agile approach, where we're doing software development, we will typically have a product owner for the lifecycle of that product. But when we do data and analytics, because we deliver something to a certain part of the business, or for a certain business process or a certain outcome, we seem to switch product owners in and out more often. So we don't have a dedicated product owner that takes us through that journey. We have a product owner come in for that outcome, and they exit and then come back in later. That makes it difficult, because we are recycling them. I've started to realize there is this role called a product owner coach, which is helping those product owners onboard early. From a definition-of-ready point of view, we try to groom the tasks before they hit the next iteration, so we should probably be grooming the product owner too (inaudible 00:09:21 - 00:09:29). But then if we're struggling to get their investment for the length of the iteration where we're doing work for them, getting their effort early, when they are learning how to do that role, is going to be challenging as well.

Hamish: With the successful product owner we did have, we spoke to their manager and made sure they were aware that the product owner would be coming down to sit with us 50% of the time, and they actually did that, and it went well, whereas with other ones, not so much. So that was probably the difference we made. And then we were very clear: "By the way, your role is to make sure you talk to the stakeholders. I'm not going to talk to them and make sure that they're aware of what we're doing. You need to keep them all on board." And they knew that's what they were doing. The problem we had a while back was that we had stakeholders come to the demo day, and then they were like, "Well, what use is this? It's no good if you do it every day, it needs to be done more often than that." We had assumed that the product owner had been talking to the other stakeholders. And we're like, okay, we could do that, but the product owner never knew that their role was to talk to stakeholders and make sure that their requirements were met too. Which comes back to product owner training, I think.

Shane Gibson: And that's critical, especially as we recycle them in and out. Not recycle as in put them in the crusher and bring them back.

Hamish: Well, we do have a little bit of documentation around that, because everyone uses the same sort of terms, like done, in progress, and so forth. We had to make our own little definitions, our own flavor of it, because we learned quite quickly that the Agile definitions in different parts of the organization were quite different. So we had to make sure that we wrote that down in a bit of documentation and showed it to them. Having that is part of our onboarding.

Shane Gibson: A funny one was another team, which has a stakeholder committee, a governance group, which is something they can't not do at the moment. So there's Agile delivery for the team, but not so much business agility. They were having a steering committee, and the steering committee had actually had some Agile training. The team were presenting the information products and what was next in terms of priority, and the stakeholders said, "Well, what's that? Is that an epic, a feature, a user story?" And we hadn't quite hardened the terminology up. You can have a bunch of epics in an information product if you want to, and you can have a bunch of features within that. So as part of that, we hardened it up to say an epic is an information product. You may not want to have that one to one, but actually, by having it one to one, it's really clear. So having that terminology is really important also.

Liam: We quickly found that we ended up having epics and tasks, and everything else in between we didn't think about. We didn't think about features; it was more acceptance criteria. And we kept it down to that. So we've only really got two things: an epic is an information product, and then there are just the tickets we need to do to get the thing done. Having the rest of it, we found, was too much effort for the size of team. We've got a small team, so we didn't need to have all these different things. But having success criteria, not lots of user stories but enough to make it usable, essentially "what does success look like?", that worked well. We have those things and then we can work to them. We probably don't refer to them as much as we should. But having them set up front allows the product owner to see what they're going to get.

Shane Gibson: That idea of acceptance criteria. How are you finding setting those because I tend to struggle getting good acceptance criteria, especially with data and analytics.

Hamish: There was the one we did with the good product owner. Liam and I pretty much had to run the planning session, and by the end of the day we were knackered, because we had to draw out all the things we needed to know. It ended up getting a little bit technical, but we've got things we can point to and say that's done or not done. It's hard work and it takes longer than you think. Probably the big thing we learned is that the planning and the retros and all those sit-around sessions just take ages, and at the start so much time is spent doing it, but it's useful. Even if it's a day and a half planning and then three days executing, it's worthwhile, rather than the opposite of that, which is probably how it used to be.

Shane Gibson: An interesting question I got asked by somebody the other day is: of those five ceremonies that we do for Scrum, which one would you lose? If you had to lose one, which one wouldn't you do?

Hamish: Backlog refinement, to be honest with you. Backlog grooming, we rarely do that. The rest are better to keep because we've got someone remote, and the retros give us a good time to download.

Shane Gibson: Do you think that not doing the refinement as often as we normally would is a symptom of not having a really good IP backlog? There's not really a backlog of IPs that have been prioritized, and which one comes next seems to be slightly more random in terms of priority. Which is okay. From an Agile point of view, being told just in time what you're working on is okay, but it's not great. You want to have a bit more of a roadmap, and the ability to groom and refine the stories before you hit planning.

Hamish: I think with us, there was a wider company Agile thing kicking off at the same time. I'd go to bigger planning sessions and say, "Well, we can work on that in two weeks." And they'd say, "How can you work on that in two weeks?" "Because our current sprint ends in two weeks." And everyone was just, "You can't say that." It's like, well, we can; we'll work on whatever's big next. But you're right, beyond whatever the current sprint is, we never have a view of what's next. We haven't, because what's next changed three times last week. So it's not ideal. But that's what's been happening to us lately.

Shane Gibson: And we did a bit of portfolio planning. I remember we had two or three goes where they tried to give us a roadmap of what next would look like. And then as soon as we walked into the next iteration, priorities had changed. So we decided that it wasn't worth the effort. We were putting all our effort into something that was giving us minimal value at the time, for a whole raft of reasons.

Liam: And I think our transition from a reporting team to a pure analytics team has been harder and longer than we would have hoped, due to other factors that we can't control. So we're stuck with a lot of BAU things that are not pure analytics, and that mostly lands on the people who have been there the longest. We always talk about BAU bleed, and we've never really got a good solution for it. So we're just learning to accept it now and just do it. We time box it, which doesn't always work, but that's one of the approaches we've been taking.

Hamish: Around acceptance criteria: when I did a little bit of Scrum mastering for a small dev team of two or three people, the acceptance criteria were liberating, they were like the lifeblood. In one of the sprints we didn't quite prioritize. We set the acceptance criteria as bullet points and marked three or four of these tasks as high priority, but then we actually looked at, okay, which one of these high priorities has the higher priority? So over time we've also managed to prioritize the acceptance criteria. And I guess the next step, if it's something we look into, is whether or not we link some of the tickets to the acceptance criteria, because we don't actually have a direct link. We sort of do, but there isn't necessarily anything showing it.

Shane Gibson: So your acceptance criteria are on sprint goals, right?

Hamish: Yes.

Shane Gibson: What a successful sprint looks like. And then as a team, you're mature enough to decompose those goals down to a series of tasks. But I'll come back to your point about not really doing user stories or features. That's probably because we're walking into Sprint Planning with a product owner and a goal we haven't refined, so we're going straight to the acceptance criteria for the sprint and then to how we apply tasks to get there. And the team's mature enough that that works.

Hamish: Well, in some cases, though, we end up with a lot of off-ticket tasks, where you're always working on one task that is really a combination of five other tickets. What we often found is that the task creation comes at the end of the day, everyone's tired, so we just smashed them out. We've stopped doing that; now we'll do them all again the next morning when we're all fresh. So it ends up taking a day and a half to do the planning, and the tickets are getting more task related. But because we're working in new technologies and we're learning constantly, we end up with more than one way to do things. Sometimes even within a day we change our mind on how we're going to do it, and then "what ticket are you working on?" is actually all five of them, all at the same time. How can that be? And then they all move. So that's where backlog grooming would help us. We also add tickets during the sprint, when something ends up being a ticket because it is a thing. So it means our estimation and velocity are always out the door.

Liam: Really, it's the story points per ticket and the burndown chart that we've been looking at as well. It's the story point thing. We're still working hard to try and get things correctly sized.

Hamish: But what would have taken us 20 points in January is now seriously three to five points. So we've evolved our platform to the point where we can do stuff quickly. But then there's the little cloud-based stuff. There are always cool little updates they do, so we've got little things like that which catch us out all the time. And I like shiny things. So I try to avoid doing upgrades mid-sprint, but it doesn't always work.

Shane Gibson: But again, you've been building the airplane while flying it. You've had emerging architecture and emerging design, actually building the platform as you deliver value to the business users, versus what we would call iteration zero, where you're given three to six months to build something that meets everybody's needs. If you had to go back, which way would you go?

Liam: The way we did it, to be honest with you. Where we talked today, someone asked, "Well, why'd you do it that way?" We're not going to build anything in advance that we don't need to use, and we shouldn't assume we're going to do it a particular way. So we've built some things that we have changed. But all the things that we thought we might have to change, and never did change, ended up being the right decision. So it's better to make a decision than to try and figure out which method was better. We've got three ways to do one thing, but we'll slowly migrate to one, or we won't, as and when we need it. A good example of that is our onboarding documentation, which is non-existent. So new people find it hard, but we never had to onboard anyone until last week. Now we've got people, now we'll build it; we'll do it just in time. And to me, that makes it fun. It makes it interesting. You've got to think on your feet.

Shane Gibson: But how much refactoring do you think you've had to do over the last few years?

Liam: We haven't refactored much. We did actually take about a week and a half for refactoring at one point, which was a good decision. But other than that, there's not been much. There are now two methods for doing the same thing, which we need to merge into one. But the idea of being able to build everything, burn it down and build it all up from the beginning is still the idea. We can't do it tomorrow, but that's where we want to get to, where if we had to refactor it all and change all the metadata columns because someone used CamelCase, or because we wanted to change our naming conventions, we could do it and it all should just work. But it's not there yet.
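The naming-convention change Liam mentions can be illustrated with a small sketch. This is not the team's actual tooling: the function names `to_snake_case` and `rename_columns` and the sample column names are hypothetical, and the sketch only assumes that metadata columns are plain strings that should move from CamelCase to snake_case in one repeatable step.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a CamelCase or camelCase column name to snake_case."""
    # Put an underscore between a lowercase letter or digit and a capital.
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Split a run of capitals from a following capitalised word (ETLLoad -> ETL_Load).
    step = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", step)
    return step.lower()


def rename_columns(columns):
    """Build an old-name to new-name mapping for a list of column names."""
    return {column: to_snake_case(column) for column in columns}


# Hypothetical metadata columns, for illustration only.
mapping = rename_columns(["LoadDate", "sourceSystemId", "ETLLoadDate"])
```

Keeping the rule in one function is what makes a "burn it down and build it up" rebuild feasible: the same mapping can be regenerated and applied every time the platform is recreated.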

Shane Gibson: But I guess using your aeroplane example, it's kind of boarding. Well, it's in the air, but I guess it's a user of it. I see it as, sometimes when you go to land, you quickly refuel and you're up in the air again and onto another new sprint. So it's the trade-off between staying up in the air or staying down, because we've been doing sprints back to back. We've been lucky enough or smart enough that it hasn't crashed yet, either. So the analogy is not so much that you're building the airplane; that's not good enough. The analogy we're using is that we call it a sprint, but actually it's a marathon, where we sprint, and then sprint, and then sprint, and we never actually stop. So there is no respite. From delivery of value to the stakeholders, there's no gap. And I was reading something the other day, or a podcast or something, where they were actually saying that constant work, even when we get good at making it 40 hours a week or whatever a standard week is, still burns us out, and that we actually need to inject downtime. So they were talking about taking a week out every couple of sprints, where you don't actually do anything. You always have stuff to do, but you're not actually in the sprint cycle. So that's an interesting thing. The one I like is using hackathons or innovation sprints, where you're still doing something, but it's more fun. One of the teams I work with does a Dragon's Den. They were a lot bigger, and someone from there was on here talking about that a while ago. They had the whole team, 20 or 30 people, and they split them up into teams of five. Each team got to pick something that they wanted to build, and then they all just spent a week building it. And then they presented to the senior executive as a Dragon's Den to get funding, to be able to then put it into an iteration and deliver it into production. So they were still doing work-ish, but it gave them something different, a little bit of variability, and that seemed to make a difference for them. And I think that's one of the key things that you need to recognize for you guys, because your team is tiny. So, how many people have you got on your squad on average?

Hamish: Four.

Shane Gibson: So, that’s the enviable team size. Very small team for building an entire platform and delivering value.

Hamish: In some ways we enjoy that, because everyone's always got something to do. There's never a dull moment, and it makes it interesting. Some of our deployments have been pretty good fun, even though we're doing them on Thursday when Demo Day is on Friday. But obviously, lots of high fives. Why should we be surprised that it works? One of the things we put in at the start is having tests for things. So we've got pretty good at getting tests in, though we do slack off every now and then. We've had some stuff lately, a classic ETL process, where we haven't got a test for the actual result. We need to go and put in tests to make sure that, when we've got shared components and we change them, the ETL we built two months ago still does what it says it should do, because we do want to change these things. So we're slowly building up a little bit of, not debt, but features we need to add to make sure that everything keeps running smoothly. We've got two or three things in production now. One of them is publicly facing, and one of them is getting consumed daily by a team of 10 to 15 people. So we're building that critical mass now where we just can't destroy things; we can't afford to and couldn't get away with it.
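The kind of test Hamish describes, one that pins down the actual result of an ETL step so that a later change to a shared component cannot silently alter it, might look like the following minimal sketch. The component `clean_customer_rows`, its rules, and the sample rows are all hypothetical and are not taken from the team's platform.

```python
def clean_customer_rows(rows):
    """Hypothetical shared component: tidy names and drop rows without an id."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # rows without a key cannot be loaded downstream
        cleaned.append({"id": row["id"], "name": row["name"].strip().title()})
    return cleaned


def test_clean_customer_rows_keeps_expected_result():
    """Regression test: fixed input must always give the same known output."""
    source = [
        {"id": 1, "name": "  ada lovelace "},
        {"id": None, "name": "ghost"},
        {"id": 2, "name": "GRACE HOPPER"},
    ]
    expected = [
        {"id": 1, "name": "Ada Lovelace"},
        {"id": 2, "name": "Grace Hopper"},
    ]
    assert clean_customer_rows(source) == expected
```

A test runner such as pytest would pick up the `test_` function automatically; the point is only that the expected output is written down once, so anyone changing the shared component finds out immediately if an older pipeline's result would change.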

Shane Gibson: Now I think you have a very strong testing mentality. And you've also automated the testing a lot more than I normally see. Why do you think that is? Most data analytics teams I work with struggle with automating testing.

Hamish: I think we set out to do it. And we've been lucky enough to work with a few people that have got a good background in different testing technologies. I've worked in warehousing long enough to know that not having tests kills you. The good projects I have done were the ones where we did have ways to burn it all down, stand it all up and build tests. And now they're all code based, and we're using different frameworks. That's experience: you don't want to be having to get up at six in the morning to check that something's completed, you want it to just work. Having done that lots of times, I won't do that anymore. And that's probably where it comes from.

Shane Gibson: And also, being that small team, you can only count on it for a while, being the person who gets the 2am wake-up and fixes it.

Hamish: And I think we've built redundancy into things too. Some of the things we've done can go stale for three or four days before it actually makes a difference. We've thought about that in advance, so we give ourselves time to fix it. It's daily loading, it's whatever, it's got time. It's not streaming data, which we're lucky about. But if it were, we would up the level of testing to make sure we can handle that stuff.

Shane Gibson: So it's still unusual to have an engineering team that's building a platform together with the science team. Normally, the engineers build something and the scientists either build something on the side or they wait. How do you find being part of the same team? You're trying to build data and a platform at the same time as you're trying to use some analytical capability against it to see what the data can do for you.

Liam: Yes, that's ongoing. That's been hard. Before we start a sprint, we say what sort of time we have available. So when I do some sprint work, it may not necessarily be data science related; it could just be visualization assistance. Also, anything in data science is hard to productionize as well. It's usually a lot of one-off, ad hoc stuff. So trying to do any of that in a sprint fashion is something we're still working through. I don't know if data science and Agile work that well together. I don't know, I haven't been around long enough. But the data science has definitely been a bit of a road to go along. And then there's the machine learning stuff that we're starting to dabble with as well, like the use cases for it. That's not at the highest priority for the full team, because we've only got one team, working on whatever it is working on down the line. So we work on that stuff outside of the traditional team sprint.

Shane Gibson: So, you're involved in the sprint, and sometimes when you have capacity, basically you're helping out. But by being involved in the sprint, you're understanding the data that's getting landed, what it's being used for, and how much you can trust it. And then as you find analytical problems that you may want to work on, at least you have an understanding of where the data is and what's available. That's versus being an integral part of the team delivering a model in three weeks, at the same time as the platform is being built, the data is being landed, quality issues are being solved, and the business value is being delivered. Which would have been a bit of a stretch, I would have thought.

Liam: And learning how to get the data, and the transformations that go along with that, has been huge. So has learning some of the techniques of the other data engineers.

Shane Gibson: Do you think that being involved stopped you going off and building your own analytical platform while you waited?

Liam: Yes.

Shane Gibson: I see a lot of projects that start off as an analytics project, because AI is the new cool. And then everybody goes, well, that's great, but actually we don't have any scalable infrastructure, and we don't have any data. I have seen customers where they hire data scientists, a data scientist turns up, and they're like, "Well, seriously, you've hired me, and there's nothing for me to run code on, and there's no data for me to use. So you hired me because?" That's one scenario I've seen that hasn't worked so well. The other one is where we just do data and traditional reporting, and we never explore the analytical side.

Liam: So I guess learning the team culture and team behavior has helped as well, because we'll go out and test some things that are probably in beta on some of these cloud analytics platforms. So we've learned how to approach the team members in a way that suits them, so we're not exactly hassling them so much, but saying, "Look, we're keen to try this sort of thing," and logging it in a way which is easy for them and easy for us. So that has been a great benefit as well, if that makes sense.

Shane Gibson: One of the other things I saw that I found quite interesting was this. Typically, from a data point of view, we're taught to automate everything: automate our collection, automate our landing, and our ingestion should all be perfectly automated and bulletproof. But you built a file drop, as I call it; you may have other unique names for it. It's a way of grabbing some data and landing it safely on the platform so you can then use it early. From an analytics point of view, that seemed to allow you to do some other things early. When you needed a piece of data, you could basically go and collect it yourself, drop it onto the platform in a more repeatable way, and then explore it, versus having to wait for somebody else to do it for you, or ask permission, or log a ticket and wait for the next sprint. Has that approach, of having a robust way of acquiring semi-manual, semi-structured data, been valuable?

Liam: Yes. To be honest, what came with a cloud analytics platform was the third-party apps and the ability to process large amounts of data. To be perfectly honest, some of the stuff that we've been doing on the data science side hasn't been particularly complicated as such. So although it's been good to have a robust thing, I wouldn't say we necessarily needed the cloud analytics platform. But we have big data sets coming up shortly, especially sensitive ones, which will be perfect to give it the full test. But we did end up building two ingestion platforms: one that puts the file drop, SQL Server or whatever else into prod, and also a mini experimental zone uploader, which can load Excel files and all sorts of stuff straight to our database, which these guys can use. And the trick is for them to just use it by default, rather than slipping back into "how am I going to do this?" Just load the thing. The whole idea was to build a place where they could experiment and do things and then lift and shift the code. It's not going to be a lift and shift, it never was, but it's almost: we know your source, we know your preferred target, then we can push it into this. But without the UKI, it makes life pretty simple for getting new data sources in, whether it's small ones, big ones, or hundreds of thousands of files. It doesn't really matter. So it's not out,

Shane Gibson: I think what you showed us is that in the experimental zone, any tables we build get demolished after, what, two weeks? Wait, 15 days. So we have to think smartly about how we create things, so that in a couple of weeks' time we can implement them and get them created again. That sort of mentality has been really helpful in the way that we think about things right from the get-go. One of the things I see people talking about is this idea that you can experiment and then lift and shift the code. That never worked, in my view and in my experience. What does work is the idea that you can experiment and then refactor. And actually, if you experiment somewhere closer to the way the code needs to be refactored, then the refactoring is less.
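
A minimal sketch of the expiry rule mentioned here, assuming a 15-day retention window measured from table creation. How the real platform tracks and drops tables is not described in the conversation; the table names, dates and the `expired_tables` helper below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention window for the experimental zone, as mentioned in the conversation.
RETENTION = timedelta(days=15)


def expired_tables(created_at, now):
    """Return the names of tables whose age has reached the retention window."""
    return sorted(
        name for name, created in created_at.items()
        if now - created >= RETENTION
    )


# Hypothetical catalogue of experimental tables and their creation times.
catalogue = {
    "exp_customer_sample": datetime(2020, 3, 1, tzinfo=timezone.utc),
    "exp_meter_readings": datetime(2020, 3, 10, tzinfo=timezone.utc),
    "exp_ref_regions": datetime(2020, 3, 14, tzinfo=timezone.utc),
}

to_drop = expired_tables(catalogue, now=datetime(2020, 3, 20, tzinfo=timezone.utc))
```

Because anything in the zone can disappear, the code that creates a table has to be kept somewhere repeatable, which is what nudges the experiment closer to its eventual refactored form.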

Hamish: Yes, that was how we saw it: lift and shift. It was never going to be that, but at least it's in the same technology and on the same platform. If they built views, they'll more or less work, and it tells you what you're trying to do. So it was always going to be close, but it was never going to be pure lift and shift. And as we learned, there are so many ways to do things. It was just: get the data in there, let them play and start something. Even small reference data has always been the problem. How do you load that temporary table? How do you store that? That was what we wanted to achieve early on, so we didn't have issues getting small things or big things in.

Liam: And I think the biggest thing as well, in this onboarding, doing some stuff with the scrum team, was that we data scientists got to see how they worked, what tools they used, how they actually ran virtual environments and things like that. So a very simple thing, like knowing how they worked, and knowing even just the tools to be able to analyze some of these data sets, is where we have seen a lot of the value. So now, as a team, we're all using the same tools, and we're all working efficiently in the same way, so we can all run the same piece of code. As data scientists, we were doing everything in our own environment, very much personalized. So what I'm looking at is the sharing of stuff: the dev team were sharing stuff perfectly, but as data scientists we weren't sharing in quite the same way, or as efficiently.

Shane Gibson: And then you probably also started adopting some DataOps behavior, checking code into repos rather than saving it in your local personal directory?

Liam: So these ad hoc analyses are now in repos, as opposed to some directory buried at the bottom of somewhere. So now obviously when we put things in repos, they have to have free, like they have things like that. So it's all little steps.

Shane Gibson: You're not a unicorn data scientist, you're just a human who actually wants to do good analytics with these tools and techniques. And that adoption is really interesting. So if I'm looking at DevOps and DataOps, what I'm seeing is that a lot of agile data and analytics teams now seem to be deploying their own infrastructure. Traditionally, it was IT's job. And even when the data and analytics team sits in IT, they seem to do it without the traditional infrastructure team. I've seen that a lot. One of the customers I'm working with at the moment has a very traditional release process and infrastructure process. So even though they've gone cloud, they still believe that the infrastructure team should be creating all the scripts to stand up and burn down infrastructure. But I've worked with multiple teams now where the data and analytics teams are actually doing that for both their code and their infrastructure. And I know that's something that you did, so why was that? Why was it something that you built? Was it because you had to? You wanted to?

Hamish: Had to, wanted to. We wanted to know what was going on, so we could be masters of our own destiny. I don't want to say we don't trust anyone else, but if we knew what was going on, then we could shape it how we wanted. And I guess with the cloud platform we chose, when you look at it, we were probably the most educated people on what we needed to do, and data analytics was maybe a speciality in it. It wasn't spinning up compute. It was all serverless; we were using things as a service. We knew what we wanted to do, but we just had to do it in a new way. If we had to wait for someone else to do it, we'd still be waiting. So we just got on and did it, I guess. And we wanted to do it too; we like learning new things. And it's worked well. We've built a platform now, and within the platform, we've built our own little area. When the time comes, we can hand the platform over to the responsible team, and then we can manage our little area. But by doing it that way, we can at least shape it as well and get what we want. So that was probably something you've taught us, Shane: don't control what you can’t control. And if you have to rely on other people, you're in trouble.

Shane Gibson: You will be in trouble. But it's going to happen, in my experience. So self-organizing means controlling your destiny. And the other thing for me is that code's code. We're no longer buying hardware, shipping it from Singapore, racking it in a cupboard in our office, and then getting our CDs out and putting them in. Code's code. So as long as the platforms we choose are code, our servers are code, our data's code, it's all just code, and we treat it as such, then why not?

Hamish: Everything's in code, and 98% of it's on a build server; we've got a little bit that isn't. But apart from that, we've done pretty well. We've had to refactor, and we had to move regions as well: at one point, we moved from Australia to America. We did that all in code, and it all works. So being able to make those decisions quickly and act on them has been key. We would not be where we are, if we had to have probably.

Shane Gibson: I know one of the challenges we have with the stuff that we're building at the moment in our startup. I think I stole one of your techniques, actually: I started getting alerts from the cloud provider into my Slack channel about what's changing. And so every morning I'm waking up and it's like, holy, shiny, cool. That's part of the value. But the problem is actually figuring out how to manage that rate of change. There are things where you go, "damn, built that last month, wish I had known", versus "that would make us go a lot faster, but probably not this iteration". It's a challenge.

Hamish: There are a lot of things that are still in beta, probably to our own detriment as well. So that's just how it is. I didn't realize how much of it there was, but we need them, so we use them. That's just the decision we've taken. If we have to refactor, we will.

Shane Gibson: But you may not. I've got a question for Liam, because this is one of my favorite ones. I cheekily replied to somebody's LinkedIn post the other day, when they were talking about the top analytical routines that most data scientists run. And I said to him, you forgot to mention number one: group by. So, coming back to the point you made slightly earlier about the maturity of what you're doing: machine learning, vision analytics and natural language are great and high value, but at the beginning, using some data with some simple techniques to start that journey is actually probably what you're doing. Now, that's a paraphrase; did I get it right?

Liam: Yes. There are group bys, counts, nulls, all that sort of stuff. Obviously, it's about understanding the data first. And I was shown that pandas profiling thing: just initially looking at the data first, to see what's missing and what's not there. That was obviously a big part of it. And there are all these little services as well; you don't need to know a large amount of data science to be able to use some of Google's natural language stuff. But being able to understand some of the bias in it, and understanding where its pitfalls can be as well, is some of the stuff that we've been investigating. But I guess, yes, there are group bys and counts.
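
As a small illustration of that "group by, count, check what's missing" starting point, here is a minimal sketch using only the Python standard library rather than pandas. The records, field names and helper functions are invented for the example.

```python
from collections import Counter

# Made-up sample records; None marks a missing value.
records = [
    {"region": "north", "channel": "web", "amount": 120.0},
    {"region": "north", "channel": None, "amount": 80.0},
    {"region": "south", "channel": "phone", "amount": None},
    {"region": "south", "channel": "web", "amount": 45.5},
    {"region": None, "channel": "web", "amount": 10.0},
]


def count_by(rows, key):
    """Equivalent of SELECT key, COUNT(*) ... GROUP BY key."""
    return Counter(row[key] for row in rows)


def missing_counts(rows):
    """Count missing (None) values per field, a very basic profile."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
    return dict(missing)


by_region = count_by(records, "region")
gaps = missing_counts(records)
```

Even on a toy data set, the two summaries answer the first questions worth asking: how the rows are distributed, and where the gaps are.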

Shane Gibson: Start off with the easy stuff and then build up to the complexity where there's value in it. But the buzzwords are out there. Now big data's dead, and MapR has gone bankrupt and been bought out. So Hadoop, big data, has died. But we're now in the world of AI, AI for everything.

Liam: So when possible, I try to avoid the words machine learning and AI, because they are those buzzwords that get confused and misinterpreted, like some of the HR words. I try to avoid them and use detail when possible. That’s the answer to the question.

Shane Gibson: It's a good answer. All right, well, we might close it out here. So thank you, and thanks for letting me record in the pub. I apologize to our listeners for the choice of music. It was not mine; it was Hamish's and Liam's choice of background music. We'll catch you later. Thanks.