AgileBI – Raphael Branger
Join Shane and guest Raphael Branger as they discuss combining agile ways of working with the world of data and business intelligence (BI).
Read along while you listen.
PODCAST INTRO: Welcome to the “AgileData” podcast where we talk about the merging of Agile and data ways of working in a simply magical way.
Shane Gibson: Welcome to the AgileData podcast. I’m Shane Gibson.
Raphael Branger: And I’m Raphael Branger.
Shane Gibson: I’ve been following you on LinkedIn for a long time, it’s typically hard for me to find people that are working in the world of agile and Data BI, so it’s good to finally have you on the podcast. The reason I asked you to come along is, you’re writing a book called AgileBI. So I thought it would be great to have a chat about, what’s going to be in the book, how you see agile and data blending together. But before we get into that, why don’t you give us a little bit of a background about yourself, so the listeners know how you got into this world of agile and data?
Raphael Branger: Yes, so thanks. It’s a great pleasure to be here, and thanks for inviting me. So I have been in BI and data for nearly 20 years. When I was 19, I started my career at IT-Logix, the same company I’m still working for. And back then, I started really with creating reports, with a tool that is now a dinosaur called Crystal Reports, maybe you know that. And basically, from there it took me everywhere: to system engineering, and then into bigger data warehousing projects as a developer. And then I recognized that requirements engineering is a crucial topic in business intelligence projects, of course. Especially coming from a waterfall world in my first few years of practicing business intelligence, I noticed that there are several disadvantages in gathering requirements up front, before the project, because typically you know the least at the beginning of a project. And this was basically what brought us to agile: thinking about how we can adapt agile methods from the software engineering world into the business intelligence world. And then we started to practice that, first internally at the company, IT-Logix, where I work. And after a few internal learning cycles, we went out to first customer projects. So basically, it’s now about 8 to 10 years that we have been working in this agile space. And this was basically the reason for us to decide that we want to write a book and consolidate all our findings until today. But it’s an intermediate point, what we know today, and we are still on a journey.
Shane Gibson: Well, that’s the agile way, isn’t it: inspect, then experiment, and then adapt. So if we ever get to a stage where we think we’re done, I’ve got a funny feeling we’re not following the playbook. That’s really interesting, because it’s almost the same as my journey, slightly different in some ways. I started off with forests and trees, 30 years ago. But that whole idea of the time we took to get those requirements up front in the old way of working — they were guesses at best. And we spent too much time upfront, and then we had bugger all time to be able to change it at the end, let alone test it. And like you, the problem I was trying to solve was that I was frustrated in the data modelling and ETL phase of the development back then. That’s the one where I found, if you had a person who was an expert, it was beautiful. It worked. They knew what they were doing, they’d done it for years, and we were pretty much always successful. But if you had anybody who hadn’t done it before, who was relatively new, it was a nightmare. And I struggled to find any patterns in that space at the time. So I ended up focusing on requirements as well as the first thing to work on in an agile way, and same thing: worked internally in my company, got some ideas and practice, and then was lucky enough to be able to experiment with a customer, and it took off from there. So, it’s a similar journey. So talking about requirements and requirements engineering, you’re scaring me there. There’s a whole hyper-specialization happening in the market at the moment: data engineer, analytics engineer, MLOps engineer — but we don’t yet have a requirements engineer. But if we think about requirements as engineering, how does it work when you’re working with a customer?
Raphael Branger: So the term engineering was chosen very specifically, because we wanted to express that it’s not something ad hoc; it’s something which follows certain rules. And this is what brought us to frame a categorization, or a defined framework, where we said: we don’t want to think again and again about what the typical requirements are, learning it by heart. So we started with having four major areas, for example. We start with asking people about requirements from what we call the environment. What is the overall purpose of a project, for example? What are the business goals and processes to be supported? So really, what you typically would write into a project charter, as well as some limitations you might have — legal aspects, for instance. I wouldn’t say that’s a requirement, because it’s nothing which someone inside the project can choose to follow; it’s something which is imposed on you from outside. So that’s the first thing. Then the second thing we tackle is the question of the requirements towards the organization and processes of your upcoming BI or analytics system. The idea is to already think about who will operate the system, for example, or to what degree the customer wants to be involved in the development process — because that is actually a requirement: to say, we want to be involved in the process of developing the system, and as well to have the requirement to learn together with an external service provider, as we are at IT-Logix. So these are the first two categories. And then we have the two others: one is about data requirements, and the other one is BI application requirements — so basically, what you see or what you get as reports, etc.
And we do actually vary, depending of course on the overall scope of a project, whether we need to tackle data requirements and in what level of detail. Because it’s a difference, of course, whether you are building a few new reports on top of an existing data warehouse, or whether you are building the data warehouse itself — and then maybe you don’t even know yet what concrete reports you want to build, after all. So those are basically the four main categories we have in terms of requirements, but we don’t gather them all upfront at the beginning of the project. It’s a structure which we can reuse all the time. Of course, at the beginning of a project, the first two categories — the environment, and processes and organization — are more center stage. And the further we get into the development, the more detailed questions come up in terms of the data requirements and the BI application requirements. And there are many subcategories as well, to guide us in the project to ask the right questions, and also to give the organization some inspiration as to what they actually could want.
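A reusable framework like this can be sketched as a simple checklist structure. The four category names follow the framework described above; the prompt questions are illustrative placeholders, not IT-Logix’s actual templates:

```python
# A reusable requirements checklist: four top-level categories, each with
# example prompt questions. The questions here are invented examples.
REQUIREMENT_CATEGORIES = {
    "environment": [
        "What is the overall purpose of the project?",
        "Which business goals and processes should be supported?",
        "Which legal or regulatory constraints are imposed?",
    ],
    "organization_and_processes": [
        "Who will operate the system?",
        "How involved does the customer want to be in development?",
    ],
    "data": [
        "Which source systems are in scope?",
        "What level of history and granularity is needed?",
    ],
    "bi_application": [
        "Which reports or dashboards are needed?",
        "Who are the consumers, and on which devices?",
    ],
}

def open_questions(categories):
    """Flatten the checklist into (category, question) pairs to review."""
    return [(cat, q) for cat, qs in categories.items() for q in qs]
```

The point of the structure is reuse: the same categories are revisited in every iteration, with early iterations weighted toward the first two and later ones toward the data and BI application questions.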
Shane Gibson: So do you find that you tend to start with a customer when they’re at the beginning of their journey — a greenfields type environment where you’re working to do the data and the visualization — or do you find at the moment you tend to be coming in when the data is in place to a degree, and then you’re focusing more on the last mile: the visualization, the dashboards, the BI stuff?
Raphael Branger: No, currently we are facing a lot of what we would call data warehouse modernization situations, because we are now at the stage where the typical first-generation systems, which are maybe between eight to ten years old, are coming to end of life. Because of hardware and software deadlines, where they would have to renew hardware and software licenses, that basically leads to cloud adoption — a lot of our customers are now moving the BI stack into the cloud. And on the other hand we see, especially in small and medium-sized businesses, a move into a more enterprise business intelligence way of working, I would say. They started out maybe with some individual reporting and Excel spreadsheets and so on, and now they see that their business is growing, and that they are coming to an upper limit of what is feasible for them in the manual, or more ad hoc, way of working. And therefore, even though there are new buzzwords all the time, like data mesh, which I came across in the last few weeks where we had a few discussions — basically, at least for the average organization, it’s still the classical data warehouse, bringing together data from various sources and having pretty standardized reports on top of it, that gives them a good start.
Shane Gibson: One of the things we do in the IT domain, which we should be embarrassed about, is “new lamps for old”: we take a technique or a pattern that’s been around for years, and then we bring out a slightly new variation of technology and vendor-wash it, to make it the new cool thing. Or, even worse, we find a set of technology that has a really good fit in a specific use case, and then try to broadly apply it. And the one that gets my goat the most on that is real-time streaming of data and dashboards. It’s required for some specific use cases, but to try and apply that pattern across every information product or data product is ridiculous.
Raphael Branger: And one thought about the technology: even though it’s constantly evolving, and it certainly helps to solve some problems more elegantly or more quickly — in the end, when we have struggles in our projects, it’s typically not about the technology. It’s really about understanding your business domain, the business processes, and how these processes are depicted in your source systems. And there are a lot of gaps between what people think the data looks like in the systems, or how the processes are implemented in the systems, and what it is in reality. So this is where the agile thought comes into play once more, because you start with an assumption at the beginning, both from a content or requirements perspective, as well as from a solution architecture perspective. And you need to get access to the data as soon as possible to validate this hypothesis. And from there, it starts iterating. Because once you see the data for the first time, you recognize all the differences compared to your assumptions, and then you can narrow it down step by step to the concrete result, or what you will actually deliver.
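One lightweight way to treat such assumptions as testable hypotheses is to profile the data as soon as you get access. A minimal sketch — the column names and sample rows are hypothetical, invented for illustration:

```python
# Profile one column of a list of row dicts to test two common assumptions
# about a source field: "this column is never null" and "this column is
# unique" (i.e. usable as a key).
def profile_column(rows, column):
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "is_unique": len(set(non_null)) == len(non_null),
    }

# Hypothesis to test: customer_id is a mandatory, unique key.
rows = [
    {"customer_id": 1, "org": "Acme"},
    {"customer_id": 2, "org": None},
    {"customer_id": 2, "org": "Acme"},  # duplicate key -- hypothesis fails
]
stats = profile_column(rows, "customer_id")
```

Running this against real source extracts on day one is exactly the kind of fast validation that turns “the field is always populated” from an urban legend into a checked fact.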
Shane Gibson: I think you used the term “way of working” earlier, which I love. Because at the beginning, for me, I was focused on methodologies — on strict frameworks and structures, on a set of tasks that were repeatable and we could follow — because I came from a waterfall background, that’s how we were meant to have done it. That’s how all the big consulting firms pretend to do it. But after a while, I worked out that, for me, there’s a set of patterns that you apply. And those patterns may be technical patterns, around technology, but there are also patterns around ways of working: how do the teams work? And that’s dependent on how many people are in the team and how many teams you’ve got. Also, I love that idea of a hypothesis. The number of times we’ve been told how the data behaves, or how the business process is executed — and in the old days we’d believe it, and we’d start designing everything around what we were told, then find out they’re urban legends. Nobody updates that field. That field doesn’t actually hold an identity for a person; it happens to hold organizations as well, because there was a change two years ago and they [inaudible 14:43]. So that idea that you start off with everything as a hypothesis, and you prove it as quickly as you can, and then you move on — I love that. And the third thing I picked up there was this idea around domain, or domain knowledge. Like you, I’m a little bit sick of new terms, and I agree data mesh will be the new big data. But under the covers, data mesh has some good principles, absolutely. The idea of being domain focused: having some domain expertise in the team makes the team better — it’s easier, quicker, faster for them, and more fun for them to do the delivery.
Raphael Branger: But we see there is an interesting — I call it a paradox — which we see on a large scale with data mesh, but as well on a micro scale in a regular, smaller-scale environment: namely, that the more flexibility or agility you want to achieve, the more standardization and governance you have to put on it. And I think it’s very interesting to understand that sometimes you need to first limit the options you have, in order to achieve higher flexibility in the area where you actually want it. And that’s basically what I took away from the whole data mesh discussion: you still need some kind of governance to coordinate the different elements in the data mesh, in order to make them somehow compatible with each other.
Shane Gibson: In my experience, when I get invited in to help a team start adopting an agile way of working with data, sometimes there’s an expectation that the team is going to be more efficient, faster, deliver more on day one. And we have the conversation really early that actually we’re doing a complete change, and change has implications. And the implication when you adopt this new way of working is: you’re slower at the beginning. You fail more often and you learn from it. Do you find that? Do you find that people expect agile to make the data process better on day one, or do you find most people are open to the idea that actually they’ve got some learning to do, and it’ll take a while before the team gets to a level of maturity where they’re rocking?
Raphael Branger: What helps me here is the analogy of, for example, a car factory. In the end, you want to get out thousands of cars per week, for example, even with a high variability of things which you can choose from, etc. But until you get to the stage where you can build so many cars each week, you need to invest quite a bit into your factory and build up the factory. There is some initial effort. And that’s one lesson learned from the past ten years: there is no red button to increase agility, where you push it and then agility is here. Basically, it’s quite a challenging process, which starts with professionalization. That means you need to have people working in a professional way. They need to know data modelling, or certain skills — not only technical, but about requirements engineering as well. So it’s about bringing together a good team which has a certain level in what they do. That’s the start. Of course you can have an iterative process like Scrum, with sprints etc., but if you don’t have the good people in the team, then you are failing faster, but never succeeding. And the other thing you see there: if you start working with professional people, then something follows which I call professional laziness, because you don’t want to reinvent the wheel all the time. So you’re looking for standardization, which is the next step on your way to agility. And you start to standardize things: design standards in your data model, technical patterns, but also the requirements engineering process — this is why we use a framework which is always the same, always the same categories. And in order to go through it quickly, we have checklists and templates, etc. So you’re faster when working with standardized elements in your project.
And then a third step comes into play: when you have standardized elements in your project, you can start to automate them. And only automation brings you to the point where you can deliver, or work, in shorter iterations. And with short, I really mean one week, two weeks at the maximum, where we can build what we call vertical increments, or end-to-end increments. Only if you have these short iterations do you actually have a chance to deliver more frequently — every four weeks, every six weeks and so on. And this is the journey where we say: building up this professional way of working, establishing certain standards, implementing automation patterns — this is building the factory. And for smaller companies, or smaller projects with a small scope, simpler data sources, etc., it’s not that big a thing to build such a factory, because again, you can do it in the same way all the time — the tools are there nowadays, and a lot of standards are there anyway as well. But of course, if you have a very large organization, building the factory can be more challenging or more time consuming — think about whether you want fully automated deployment pipelines, continuous integration, and so on. So you can invest quite a bit into your factory if you want. But the more you invest into the factory, the more powerful the outputs you can create, in even less time.
Shane Gibson: We’ve taken that concept of automation in the data world and called it DevOps, or DataOps — it’s a new word. But that idea of factorization is one that intrigues me. If I look at the way a data team typically works, for me a flow-based model, or flow-based pattern from agile, tends to fit better. But what I find when I start working with a new team is, if we start off with some flow patterns, they look so close to the pipelining and handoff process that they normally use that we end up with a BA who does requirements and hands it over to a modeller to model. So I find they don’t iterate as much, there’s not as much change, and therefore they become professionally lazy — not in a bad way, but it’s comfortable for them. So what I find is, if we move to a Scrum pattern, if we break things up into iterations, that change in the way they work is actually enough to trigger a massive amount of iteration in terms of the focus on that way of working. And then they tend to find ways of automation. And when they automate, they come back into that flow-based factory process. Is that what you see?
Raphael Branger: To a certain degree, yes. It’s true that we typically start with an iteration-based process as well, because it gives you more safety in the beginning. In a waterfall world, having a plan and a thoroughly thought-through concept at the beginning gives you safety as well: you can be assured that you actually thought about everything, and you can simply start developing. Of course, the issue is that the requirements might change while you are developing, and that you misunderstood certain requirements — but except for this little detail, you are in a quite safe world, because you really have a good concept. Now in the agile world, where you do not have that much upfront design, it’s much more scary, I would say, on an emotional level, but on an objective level as well: you always need to think, can we change this now while still producing a sustainable solution? Because that’s a risk you are facing: if you continuously evolve your data model, you could end up in a mess, and not a consistent data model. And here, having the iterations — especially with iteration rituals such as iteration planning and then a dedicated iteration review — gives not only the developers but also the stakeholders, who are probably new to agile, a certain safety, or feeling of safety: they see what’s going on, they have a fixed schedule, etc. So it’s much easier for them. And for us, then, typically a project consists of what we call releases. A release starts with some preparation work — again, agile doesn’t mean that you jump into development straight ahead. You do some minimal upfront design, and then you go into the development phase, or what we call the construction phase. And then we have a handover phase to bring something into operation, which we usually call the transition phase. And this first release typically takes between two to three months.
There is nothing yet in production, especially if it’s a greenfield approach for a new system, and there the iteration-based working is absolutely fine. Once we are getting close to production, or have the first release out, we move slightly away from that — not necessarily from the iteration rhythm, but we are more flexible in terms of the planning. I don’t then expect a team to commit to what they want to achieve in the next two weeks in a very dogmatic way, because we cannot predict how many bug reports we will get, or other operational things. Because typically, as you mentioned before with the DevOps or DataOps thing, the developers also do the operation of the system, at least in the beginning. And especially in this situation, where you must be more flexible to react, the flow-based process is more convenient. But it needs a certain discipline as well, and a maturity level — for example, of a product owner, to constantly keep writing their stories and keeping the backlog in shape. And if you don’t have these rituals of, say, “now I have the deadline of the iteration workshop the day after tomorrow”, then the flow-based thing can be dangerous, for new teams especially.
Shane Gibson: So for me, it’s about a level of maturity. As the team gets more confident, and they’ve started to build some of that automation that makes them more efficient and safer, then change the way you work to be more flow-based, if it fits you. And the same thing: originally, we effectively had a development team and a BAU team — or squads, or whatever you want to call them — because that’s what tended to happen in the waterfall organizations. And there was a whole lot of unplanned features that came through: does the BAU team pick them up and fix them, or do they go back to the development team in the next iteration, when they’re focusing on a new domain? So this DataOps idea, or DevOps idea, of you build it, you release it, you maintain it — I think that’s a much better model. But it brings a whole lot more uncertainty into the planning, because there are always smaller bits of work that are going to turn up outside a typical iteration cycle, and the stakeholders don’t want to wait two, three, four weeks for that small change. So on that, I’ve experimented with customers on the size of the iteration time period. I found that three weeks tends to be the best way of starting, so a three-week duration. On day one, it’s really unlikely that the team will be able to go from ideation — we need a new information product — to production in three weeks, so they end up pipelining: they break it down, as we should, or it takes multiple iterations before it’s done. But they’re striving to get that cycle time down into three weeks where possible. But everybody in the world always uses a two-week iteration: agile became Scrum, Scrum said two-week iterations, and that’s now the de facto standard. I’ve tried four weeks; what I found is that the last week was wasted for some reason — the team tended to waste that week. And going shorter was a little bit too fast: we did one day for an experiment, one day for a laugh, and that was funny.
But getting data and transforming it and visualizing it in a day with a team is pretty hard. So what do you find? Are there typical iteration lengths that you start off with?
Raphael Branger: My default is actually two weeks as well. But I have to distinguish here whether the factory is in place already. If the factory is in place, I tend to work in one-week iterations. That’s challenging, but if you have really automated the main processes — getting data into a persistent stage, and then from there into a core data warehouse model — and if you can really focus on the actual requirements and on the modeling, because that’s what you can’t automate, then it’s absolutely feasible. The main challenge we face is the availability of the product owner. Typically we go to two weeks because the product owner, in the typical organization in our customer base, does not have a full FTE for the project. They are typically available two days a week, and that’s why we then try to find a good combination: what is their availability, and what is the amount of development capacity that makes sense. And then it’s experimenting as well. Sometimes we start with a one-week iteration, and then we find it’s too sporty, so we better do it in two weeks. In some other cases, we took a decision as a team to go temporarily to a three-week cycle. But it really depends. And I really encourage you, and our listeners as well, to really experiment and not follow some strict guideline — really experiment and then take a decision on what works best. And what works best today is perhaps not what works best in three months. So it really depends on the situation as well.
Shane Gibson: Also, when you’re working on the iterative model, or an iterative pattern — in Scrum teams we use the term sprint, but for me it’s a marathon, because it’s continuous, it’s constant. There is no downtime for that team. So for me, mixing it up with a one-day or one-week iteration — a research one, or a fixed one with the team focusing on something that’s annoying them about the platform or the automation, and fixing it — changing their cadence from their standard iteration cycle to something different gives them a change, and that change is treated as a break. So when we go into greenfields, we’ve got to build the platform. Because as much as people talk about SaaS cloud data warehouses, the reality at the moment is we have some great cloud analytical databases, but they’re not a data warehouse; they don’t have everything we need. And so, especially with the current modern data platform paradigm, we’ve got to go and grab a bunch of technologies and cobble them together to give us the capability we want. And that takes time and effort and expertise. So again, in the original days, we used to look at iteration zeros; we used to look at a bunch of iterations that were platform builds. And what we found was, we weren’t particularly good at guessing. Some of the core features were obvious: you typically need some data storage, some way of running code, you need to transform the data, you need to visualize it — that was a no-brainer. But the details, the tricky stuff in the middle, were where the real gold was. And so we moved more to building the plane as you’re flying it, which is hard. But what it means for me is, you’re only building something when you need it, and so you’re testing it straight away, to prove that it’s giving you the value. One downside: it’s hard to build something when you actually need it on the day.
And it makes it longer — it’s perceived as longer to get an information product out the door, because of all that platform engineering that needs to happen. What about you? Are you still seeing a lot of iteration zeros, or are you seeing organizations now building it as they need it?
Raphael Branger: Again, it’s both. I would say I would go with the 80/20 rule: we invest 20% of the time, in our inception or iteration zero, trying to build 80% of the needed platform functionality. This we can typically do with a data warehouse automation toolset — for example WhereScape, originally a New Zealand-based tool. It already brings you quite a bit of foundational functionality which you need everywhere, again and again, and it provides you as well with basic patterns for various platforms, etc. So that’s the first thing. Now, what we did as a company or service provider: we built certain patterns on top, which help us to start very quickly with a customer. For example, we have a pattern for creating a persistent staging area, because it’s always the same thing: getting the data from a source system, modelled in a one-to-one fashion, maybe a few data type mappings, adding historization so that you have an archive — and we want to have physical delete detection as well. This is always the same for whatever table you take, at least if we stay in the realm of table-based data. So if we already have this after half a day of configuration, then of course you can already start loading the first data in the second half of that first day. And of course there are some troubles connecting to the data source, but that’s where you actually solve a concrete issue or problem. So: loading the data into this persistent stage, for example — that’s it. And then we bring along a few patterns to generate the whole data load, for example for dimension or fact tables, where we need to build in the transformation rules to map the data from the source or PSA system to the fields you want to have in the core system. So you can already start modeling from day one, because that’s part of your requirements analysis.
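The persistent-staging pattern described here — a one-to-one copy of each source table plus standard historization columns — lends itself to generation. A simplified sketch of the idea, not WhereScape’s actual output; the table, column, and audit-column names are invented:

```python
# Generate DDL for a persistent staging (history) table: the source columns
# one-to-one, plus standard historization/audit columns appended, so every
# load is archived and physical deletes can be detected later.
HISTORY_COLUMNS = [
    ("dss_load_ts", "TIMESTAMP"),     # when the row arrived in the PSA
    ("dss_deleted_flag", "BOOLEAN"),  # set by the delete-detection job
]

def psa_ddl(source_table, columns):
    """columns: list of (name, datatype) pairs taken from the source."""
    all_cols = list(columns) + HISTORY_COLUMNS
    col_sql = ",\n  ".join(f"{name} {dtype}" for name, dtype in all_cols)
    return f"CREATE TABLE psa_{source_table} (\n  {col_sql}\n);"

ddl = psa_ddl("customer", [("customer_id", "INTEGER"), ("name", "VARCHAR(100)")])
```

Because the shape never varies, generating this for 5, 50, or 500 tables is the same loop — which is exactly what makes the half-day-to-first-load claim plausible.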
And this is where we work with the BEAM methodology of Lawrence Corr, to work really collaboratively together with business people on creating the data model. And having the machinery already in place means we can generate the structural elements in the data warehouse on day one. The only piece which is missing is, again, the mapping between source and target, and this can usually be done in a few days. So having this automation already in place, both from a platform perspective and from the data warehouse automation provider, together with some predefined content — then you can actually achieve this speed. And of course, if you don’t have this, then maybe you will invest one or two weeks to build these basic patterns on demand, based on what your current need is.
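The remaining hand-written piece — the source-to-target mapping — can then drive generation of the load statement itself. A hedged sketch of that idea, with invented table and column names:

```python
# Given a source-to-target column mapping, generate the statement that loads
# a core dimension from the persistent staging area. The mapping dict is the
# one hand-written piece; the SQL shape is the reusable pattern.
def dim_load_sql(target_table, source_table, mapping):
    """mapping: {target_column: source_column}."""
    select_list = ",\n  ".join(
        f"{src} AS {tgt}" for tgt, src in mapping.items()
    )
    return (
        f"INSERT INTO {target_table}\n"
        f"SELECT\n  {select_list}\nFROM {source_table};"
    )

sql = dim_load_sql(
    "dim_customer",
    "psa_customer",
    {"customer_key": "customer_id", "customer_name": "name"},
)
```

In a real toolset the generated load would also handle surrogate keys and slowly changing dimensions, but the division of labour is the same: humans write the mapping, the factory writes the code.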
Shane Gibson: As I said in the beginning, finding people that apply agile and data together is difficult for me. There are lots of people that do agile. And one of the things I look for is this idea of an AgileData coach, and for me, that’s somebody who brings patterns with them that may be reusable. So you’ve named a bunch of patterns that are core to a data team when they’re working. And you mentioned Lawrence Corr and the BEAM approach — that, for me, is a pattern that most data teams should adopt for understanding the core business processes and the data that supports them. It’s a pattern that’s worked for me for 10 years, and I love it. It’s fun as well — anybody who hasn’t used it, go use it. So, at risk of starting a religious war: the other pattern that everybody tends to argue about is the modeling pattern. Which modeling paradigm do you tend to see customers using the most, and which pattern do you tend to bring with you the most?
Raphael Branger: Now, I’m a bit biased, because we propose dimensional modeling to our customers, and therefore we do not have that many Data Vault customers — we are not that famous for having the expertise in that model. One reason for that is we are still convinced that tackling the BI challenge from a business perspective, which is tangible for people and basically understandable, is the biggest benefit you can achieve: that you have a data model which is understandable by business people. And in most cases we are working on — again, this depends on the size of the company and the number of source systems, but for the typical company we are serving, where we are talking about maybe between 5 to 10 source systems, typically in one platform — a dimensional model is definitely a good fit, where you don’t need the additional layer of a Data Vault. And we also try to mitigate certain risks which come along with a dimensional model, by always providing a persistent staging area as an archive, where you can always reload the core if it’s absolutely necessary. Of course we try to avoid that, but we have this safety net. And why is this possible now? Because earlier on, it was too much effort to build data flows into the persistent stage all the time, for all the tables, because you would have to create them manually for every table. Now that we have all the patterns in place, it doesn’t matter whether we have 5 tables, 50 tables or 500 tables: you drag them into the tool and say, now generate, and that’s it. And then you go and grab a coffee and wait until the generation process and the deployment process finish — everything can be automated, as long as there is no business logic in there.
Again, this agility brings us the flexibility that we can really work with a model which makes sense from a business perspective, even if it has some downsides from a technical perspective. But the overall package definitely works fine.
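The "drag them into the tool, generate, and grab a coffee" workflow Raphael describes is essentially metadata-driven code generation. A minimal sketch of the idea, assuming a simple append-only persistent staging area (all table and column names here are invented for illustration, not taken from any specific product):

```python
# Minimal sketch of metadata-driven code generation for a persistent
# staging area (PSA). All table and column names are illustrative.

def generate_psa_load(table: str, columns: list[str]) -> str:
    """Render one append-only load statement for a PSA table.

    Every run appends the full source rows stamped with a load
    timestamp, so the PSA acts as an archive the core warehouse
    can be rebuilt from if necessary.
    """
    col_list = ", ".join(columns)
    return (
        f"INSERT INTO psa.{table} ({col_list}, load_ts)\n"
        f"SELECT {col_list}, CURRENT_TIMESTAMP\n"
        f"FROM src.{table};"
    )

# One metadata catalogue drives 5, 50 or 500 tables the same way.
catalogue = {
    "customer": ["customer_id", "name", "country"],
    "orders": ["order_id", "customer_id", "amount"],
}
statements = [generate_psa_load(t, cols) for t, cols in catalogue.items()]
```

The point is that the per-table effort collapses to a catalogue entry: whether the catalogue holds 5 or 500 tables, the generation loop is the same, which is what makes "no business logic" layers cheap to automate.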
Shane Gibson: I'm more of a Data Vault bigot than a dimensional one. But I take your point on complexity: when you use Data Vault, you'd never expose the Data Vault models to the end consumers; you always need a consume layer, or presentation layer, or whatever you want to call it. And that may be dimensional, maybe denormalized, because under the covers the model is horrendous to look at, but incredibly easy to automate. And moving on to PSA, or what we call a history layer: I'm a bit of a heretic in the Data Vault world, because in theory you're meant to use the vault modeling technique for the raw layer. But we don't; we recommend using persistent staging, which mirrors the source system and historicizes the tables, for the same reason: you can reload from scratch, which you can do with a vault. It's a pattern that's worked for me a lot, and therefore it's a pattern that we tend not to change. As your experience is in dimensional, and it works, you get the value out of it. So use the patterns that work for you.
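The "history layer" both speakers converge on is a persistent stage that mirrors the source and historicizes each table, keeping every version of a row so downstream layers can always be rebuilt. A toy sketch of the historization step, with an assumed schema and change detection by hashing (neither is prescribed by any particular method):

```python
# Sketch of a historised persistent staging ("history layer") table:
# mirror a source table and append a new version of a row only when
# its content changed. Schema and hashing choice are assumptions.
import hashlib
from datetime import datetime, timezone


def row_hash(row: dict) -> str:
    """Stable hash over all attributes, used for change detection."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(payload.encode()).hexdigest()


def historise(history: list[dict], source_rows: list[dict], key: str) -> None:
    """Append rows whose content differs from the latest stored version."""
    latest = {}
    for rec in history:  # last write wins per business key
        latest[rec[key]] = rec["_hash"]
    now = datetime.now(timezone.utc).isoformat()
    for row in source_rows:
        h = row_hash(row)
        if latest.get(row[key]) != h:
            history.append({**row, "_hash": h, "_load_ts": now})


history: list[dict] = []
historise(history, [{"id": 1, "name": "Ada"}], key="id")
historise(history, [{"id": 1, "name": "Ada"}], key="id")     # unchanged: no new row
historise(history, [{"id": 1, "name": "Ada B."}], key="id")  # changed: new version
# history now holds two versions of customer 1
```

Because nothing is ever updated or deleted, the core warehouse (vault or dimensional) stays reloadable from scratch, which is the safety net both speakers describe.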
Raphael Branger: One thing to add there: it really depends on the customer organization, whether they have people with the necessary skills to operate, understand and maintain a Data Vault. This is where we see that, especially for very large organizations, it typically works, as they have enough internal staff to do so. But for somewhat smaller companies, where BI is mainly business driven, you run into some limitations in terms of understanding a highly normalized model like a Data Vault.
Shane Gibson: Whenever there are these decisions on patterns, there are always very strong opinions. I remember the old Inmon versus Kimball arguments of many years ago. So it's up to the organization and the teams we're coaching to pick the patterns; what we do is encourage them where we've seen value in ways of working from other teams, and that's something we bring. I have had teams that actually don't model. Now, I don't recommend it. My recommendation, based on my experience, is always pick a modeling technique; there is value in that. But I have had teams where, the way they work, they don't model; they are more deploy and destroy. That's their paradigm and it works for them, but I don't see it often. And it has to be a conscious choice: the team have had to have experimented and know it works, rather than not modeling because it seems easier. So, when we talk about that factory and that automation, we've got the framework or the pattern for loading new data sources into the persistent stage. We do the horrible roundabout of how do we connect, we don't have credentials, but we finally get access to the beast of a source system, we start grabbing the data over, we've identified the keys on the tables, we're rocking, the persistent staging area is hydrated. If we're moving into a new domain, what I find is the team come in and they're on the back foot on day one, because unless there's a domain expert on the team, they don't understand the core business processes. They don't understand the source data. They don't understand the data profile. They've got a whole lot of work before they can start applying the normal patterns. And there's a real temptation to do that work early, to pipeline it, where another team does all that work and hands it over. And so there's this balance against the problem we have where a team walk into iteration planning on day one, it's Monday, and they have no background information, and they're trying to plan their work.
So they're guessing, with a whole lot of uncertainty. That's one end of the spectrum. And there's the other end of the spectrum, where a BA goes away, does a whole lot of analysis and discovery, and writes up a requirements document. But all that knowledge is in one person. The work's been done up front, but it's not been done by the team, so it's not as consumable. And so, for me, it's a hybrid model, a balance between the work done up front and the team turning up with no insight; it's something that we focus on a lot, and it's a difficult problem. How do you deal with it?
Raphael Branger: This is where the so-called inception phase, with some minimalistic upfront design, comes into play. Because we often have the situation that we really start from scratch, on a greenfield, where not even the customer knows exactly what they want and how things work. And usually it doesn't take that much time to get some first insights and do some first experiments. So typically, maybe two to three days of hackathon, where you look at, first of all, the goals you want to achieve, then having some source system access in place, where you can have a direct look into the tables: looking at what's there, asking questions, doing some modeling canvas work together. So basically, it doesn't take that much until you get a first idea of how things work and how things relate to each other. And one important thing here is establishing a common language. So modeling, for me, is not only having a data model at the end; modeling is as well defining this common vocabulary, so that everybody understands what a customer is, what a product is, what a service is, whatever entities you have in your organization that finally end up in the data model. You can solve a lot of things within only two to three days. And it's worth taking this effort to get a new team accustomed to each other, but as well to the domain and the topic they should work in, and then establish the plan after having these first few workshops. Then start with, maybe, if it's a really sophisticated new business domain, why not do a proof of concept or a little pilot, where you don't even need to build the fully fledged data warehouse. What we often do is build the persistent stage with a few tables, because, again, the automation is already there.
And then we do either a view layer, or go directly into, for example, Power BI, and do some virtual data modeling, where we have a look at how the skeleton could look, and then already show some first data. It's not about having a report which is production ready, but that you can show the first concrete results even after the first three to four days. And this is typically as well where you can identify some misunderstandings, in terms of "I didn't mean it like this" from a customer perspective or a business perspective. So that's a good step in between: you do this virtual data model, where you have a first idea of how this could look. And once you have figured out the main issues during this experiment, then you can already tackle these issues in your backlog. For example, say we detect certain data quality issues: you already know, here we need to plan a little bit more time during the sprint, or an upcoming sprint, to address them. Things like this.
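The "virtual data modeling" step can be as simple as a layer of views over the persistent stage that exposes the target dimensional skeleton with real data, without building the warehouse. A small sketch, with every object name invented for the example:

```python
# Sketch of a virtual dimensional model: views over PSA tables that
# show the target skeleton with real data, long before the warehouse
# is production-ready. All object names here are invented.

def dim_view(name: str, psa_table: str, columns: dict[str, str]) -> str:
    """Render a dimension view; columns maps target name -> source column."""
    select = ",\n  ".join(f"{src} AS {tgt}" for tgt, src in columns.items())
    return (
        f"CREATE OR REPLACE VIEW mart.dim_{name} AS\n"
        f"SELECT\n  {select}\nFROM psa.{psa_table};"
    )

ddl = dim_view(
    "customer",
    "crm_customer",
    {"customer_key": "customer_id", "customer_name": "name"},
)
print(ddl)
```

Because views carry no load logic, the skeleton can be thrown away or reshaped after feedback at almost no cost, which is what makes it a good vehicle for surfacing misunderstandings early.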
Shane Gibson: Exactly the same technique and pattern I recommend. So the idea of a research spike, the idea of time boxing it, and again, it's really important to time box it, because as data people we love to go into the detail. So when the team find out they'll need to match customer records from multiple systems, that statement is enough to know the complexity, but they always want to deep dive into how the matching algorithm is going to work. I'm a great fan of canvases. I find that canvases are a good way of getting enough information quickly that the team can look at it and get understanding, but they're not going to write a whole lot of detail, because at this time it has no value, or minimal value. I definitely agree that the best way of getting feedback is to put something in front of the customer. So quick prototypes up front, but being very clear that they can never go to production. They don't meet our definition of done. So as professionals we will not push it to production, because we have standards we hold ourselves to, and hacking up a dashboard and making it live for 1000 users is not what we do as professionals. So, two things I want to cover off before we end: a quick conversation on scaling, which is never a quick conversation, because it's one of the hardest problems, and a second conversation around the state of the BI and data tooling market for agile ways of working. So first, the scaling one. What I find is that working with a small team is not easy, but it's easier. They're dedicated, they're focused, they're iterating their ways of working, they're self-contained, self-organizing, the lines of communication are small enough, and they tend to rock. It's amazing to watch. As soon as we scale to 10, 20, 30, 40 people, we now have to break things up; we know that a team of 40 working on the same thing is not efficient. So we need to scale them, and we may scale them on a domain.
Heaven forbid we scale them on part of the process, as that's not my favorite pattern. We may do it where we scale them to just pick up the next thing off the backlog. But now we want to reuse patterns, we want to set standards across them and have a light federated governance model, those kinds of things. So what's your view? Have you got any magic tips for when you want to scale from a team of five to a team of 30?
Raphael Branger: It's not magic, but the first thing I do is typically split those who build the factory and those who use the factory. Because, from a skills perspective, those who build the factory typically need to have much more insight in terms of what standards they want to see implemented; they need to think in a more abstract way, because they always need to find a generic solution, which can then be turned back again into concrete solutions. So that's the first thing, and depending on how big your factory should become, you can put quite a lot of people into the factory, which is much more similar to working on a software engineering project, even though it's still in the data world. And then, again, the factory is a highly important aspect in order to have multiple teams working on different domains but always using the same patterns. And you can't achieve that without having a factory powered by automation, because automation, as I said before, is always based on standardization; otherwise you can't automate. And that's the foundation. Once you have this machinery in place, it's much easier to have multiple teams on top, working in always the same way, because they don't have to care about naming conventions, they don't have to care about how certain code is implemented; they can purely focus on the business requirements, the modeling, and the transformation logic to map the source data to the target model. So that's the main direction I would use in order to scale up the teams. The other angle we could think of is what we talked about with end to end increments. It's always a question: what does end to end mean? Certain authors in the literature say you always have to go from the source system up to the fully fledged dashboard, and only this is AgileBI. And then, of course, there are other authors who say, no, everything needs to be split up.
So nearly every layer in the data warehouse: the persistent stage is one layer and this is a user story, then you do a user story for your core layer, and so on. We think that it's somewhere in between. We typically go end to end up to the core warehouse, where we say the core warehouse is largely independent of specific business complexity. Of course, there is some business logic, for example when it comes to data integration of various source systems, but usually the level of a data mart and the final reporting is driven by a specific use case. And this is as well where we could split teams: one team is responsible for really building the data foundation and doing the stuff which is rather generic in the context of the organization, and then you have individual, either project or use case driven, teams which are then working on specific soft business rules and the implementation of the data model and reporting layer. So you can as well scale based on the separation of layers.
Shane Gibson: I agree with you, scaling is hard: pick a pattern, experiment with it, adapt it where it doesn't work. A lot of the time, the teams I've worked with have moved to what I call platform as a product. So we have a squad or a team that are focused on the platform, and we tend to introduce the idea of a Data Platform Manager, whose role is to engage with the other teams or squads to understand what's coming, so they can serve them as the customer. One downside I've found with that is that some of the fun stuff, some of the platform engineering, some of that hardcore automation, gets taken out of the squads and given to another team that tends to do all the cool stuff, and the squads end up being more a factory of move-the-data-and-visualize-it. So one of the teams I worked with experimented with the squads doing the initial cut of a piece of automation as part of their information product delivery. So they solve their own problem, with some help from the platform squad; we might have parachuted a platform squad member into that team for a while to help them. And then, once it had some value, it got moved to the platform squad to automate it and make it available as a product to the other teams, so they could use it. And, as you said, professional laziness. Spotify talked about the same model, using different terms: when something's repeatable and available and automated, a team will typically pick it up and use it because it's easy for them. So you don't need to force them, you need to encourage them. So you can do it via domain, you can do it via stage of the development cycle, but the key thing is experiment and adapt. If it's not working for you, change the model. But scaling is hard. So, last one: the state of the data tooling market to support agile ways of working. What's your view on it at the moment?
Raphael Branger: Definitely better than 10 years ago. When we started 10 years ago, I read about things like data warehouse automation; there were definitely some vendors back then, but it was still on a rather rudimentary basis, I would say. Nowadays we have quite a vivid market of tool vendors in the area of automation. What I prefer are tools where you still have a certain flexibility, where you can influence what code is generated. Even though out of the box stuff is great to start very quickly, you will always end up with certain customer or organization specific things which you need to adapt. And this is something I would look for: that you don't lose the flexibility to control what is actually generated, so you are not that dependent on a specific vendor. The second thing I would like to mention is data warehouse specific testing tools, which is something that hasn't been on the market that long, I would say, or at least there I see an interesting development as well. So that you can, first of all, automate the creation of test cases and the running of the test cases. And what's as well a very interesting combination is when you manage to combine the data warehouse automation tool with the test automation tool, so that the test automation can use the metadata of the automation tool to derive which tables need to be tested, having the full lineage, for example, to know: if this is my PSA table, the source table in the source system is this and this table. And this makes it very powerful in terms of making sure that the data quality is there and that you don't have regression issues. And the third thing I would like to mention is that this agility or tooling story is not only there in the data back end, but as well in the front end, because the same is true in terms of standardization. So maybe you have heard of IBCS, the International Business Communication Standards.
Using such a standard, we call it a notation standard, you predefine how charts should look and how tables should look. There are as well tool providers, typically with add-ins for existing tools such as Power BI, so that you can apply this standard automatically, or pretty much out of the box. And this again speeds up the whole process, starting from requirements engineering, because you don't need to rediscuss whether I would like to have this chart in blue, yellow, orange or whatever color; it's predefined to a certain degree. And as well, the development is very quick, because where I used to invest multiple days, if not weeks, to create a fancy chart, you now get much more out of the box, again based on the standard. And I think that's why I said it's much better than 10 years ago, through the whole stack, from really getting the data to testing the data; we have a lot more automation available nowadays. And of course, as well, the whole deployment side: pipelines, DevOps tools. So we've made good progress, but we are still behind the software industry, the software engineering world. They are still 10 years ahead of us.
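The combination Raphael describes, test automation driven by the automation tool's lineage metadata, can be sketched as follows. The lineage entries and the reconciliation check are illustrative assumptions, not the API of any specific tool (and for an append-only PSA you would compare distinct business keys of the latest snapshot rather than raw row counts):

```python
# Sketch of metadata-driven regression testing: lineage metadata from
# the automation tool ("this PSA table comes from that source table")
# drives which reconciliation checks get generated. All names invented.

lineage = [
    {"psa": "psa.customer", "source": "src.crm_customer"},
    {"psa": "psa.orders", "source": "src.erp_orders"},
]


def generate_count_checks(lineage: list[dict]) -> list[str]:
    """One row-count reconciliation query per lineage edge."""
    return [
        f"SELECT (SELECT COUNT(*) FROM {e['source']}) = "
        f"(SELECT COUNT(*) FROM {e['psa']}) AS counts_match;"
        for e in lineage
    ]


checks = generate_count_checks(lineage)
```

Because the checks are derived from the same metadata that generated the loads, every new table automatically gets a regression test, with no manual test authoring per table.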
Shane Gibson: Yes, definitely. Especially when we talk about testing, automated testing of data: it was sparse, but it's getting better. And we see waves and patterns in the technology space. The first tool you used, Crystal Reports, got bought out, consumed into a big beast and completely lost. We're going to see a wave now of all the other little niche tools that add value to our lives getting merged into big behemoths, until the next wave comes. So, to close out: I started my journey reading a couple of books. I read some of the agile data and BI books from Ralph Hughes and Ken Collier, and they really illuminated the journey for me. And then there were a bunch of other books that came out, Lawrence Corr's book around BEAM, Hans's book around Data Vault, that were more specialized; they were dealing with one small part. And yes, they used the word agile, but they weren't really talking about the end to end process and how it fitted together; they were talking about some really good patterns that you could use. And then we had a gap where not a lot came out. And then we got a bunch of what I would call whitewash books: books that had the words agile and data on them, but really they were technology books. They taught step by step how to implement SQL Server Analysis Services, or a bunch of other tools and technologies. And they were valuable if you hadn't done your platform build before, but they were all focused on technology, not ways of working. So, you're writing a book. I know books take a while to write, but obviously, based on what we're talking about today, I'm assuming your book is going to be about the end to end process, about that way of working and the patterns you can leverage. Is that true, and when can I start reading it?
Raphael Branger: You can already start reading it now, because we are, of course, writing the book in an agile way. So we are doing iterations, chapter by chapter, and we are publishing beta releases of the book. Maybe we can post the link in the show notes, but you can easily register there and become a beta reader. Currently, we have the first six chapters already available, and you can as well give feedback if you want: what you would like to see in addition, or where something didn't make sense for you, for example. So that's the current process. We plan to have a first draft ready somewhere this spring, and then, hopefully, the book can be published somewhere in 2022 already. Now, in terms of how we want to be different from already existing books: first of all, we are addressing non IT people in this book, because the main stakeholders we are currently working with are typically controllers, for example, or marketing analysts, really business people. And there we see that they don't want to read too much of a technical book; they are more interested in the overall process. That's why the book is designed as a city guide. We depicted a city called AgileBI City, and we are taking our readers on a sightseeing tour through the city, where we show them around various districts. We start, for example, with a district called Inception Beach, and at Inception Beach you learn how to do this minimal upfront design. Then we go further to a district which is more about what role a BI strategy plays in enabling agility in BI. Then we go to a district where it's all about gathering requirements, then one about agile contracting, then to a district called Technology Land. So we are really walking the reader through the various steps, the various building blocks you typically need.
And of course, you do not have to visit all the different areas and then adopt everything one to one in your own company. But I think it gives you a good overview of the elements you need to think of, so you can adapt them accordingly to your own situation.
Shane Gibson: That's cool. I love the way that it's giving the information by telling a story in a different way. One of the books I did enjoy that came out around agile, but not around data, was The Phoenix Project. And I enjoyed it because it told a good story, and as part of telling that good story, you learned something. So that idea of a city guide, that's cool; that's something I haven't seen before. So I look forward to reading it this year. Before we close out, is there anything else top of mind for you about agile and BI and data that you want to talk about?
Raphael Branger: For me, again and again, the most important part: show the value as early as possible. Even though we talked about building a factory, don't invest too much into the factory before showing some concrete value, maybe having a simple report or a quick win where you can show the value to your stakeholders. That's the most important thing. Once they know that there is value in their data, then you can start focusing on building the necessary foundation to have a really sustainable solution. But don't forget to show the value as well, and make that value visible to your stakeholders.
Shane Gibson: I couldn't agree more. So, thanks for your time. It's been an awesome talk. We've covered a lot of points, and they were all things that people should think about when they're doing AgileBI or AgileData with their team. So thank you for your time again, and we'll catch you later.
Raphael Branger: Yes, thanks for having me.
PODCAST OUTRO: Data magicians was another AgileData podcast. If you’d like to learn more on applying an Agile way of working to your data and analytics, head over to Agiledata.io.