Agile DataOps
TL;DR
Agile DataOps is where we combine the processes and technologies from DataOps with a new agile way of working, to reduce the time taken and increase the value of the data we provide to our customers
What’s in a name
Everything it seems.
I remember years ago working with a financial organisation (ok it was a bank) and the team were tasked with building out a data mart that held customer data to support a digital channel initiative.
One of the key things we had to provide was a flag for customers that were active. Even in those days I was using our “who does what” discovery approach and so we had already identified the concept of a customer and its associated rules.
But when we got to defining the rule for Active Customers we hit a problem: the Marketing, Finance and Risk teams all had different definitions of an active customer. For Marketing it was pretty much if you had an account and weren’t dead, you were an active customer. For Finance (from memory) it was if you had made a transaction in an account in the last 3 months. I can’t remember the rule from the Risk team, but I do remember it was based on the regulatory rules provided by the Reserve Bank here in New Zealand.
We had a couple of goes at facilitating a common rule for Active Customer, but quite rightly each team needed their own rule to be applied to meet their business outcomes.
So in the end we created three flags, Active Marketing Customer, Active Finance Customer and Active Risk Customer. We then suggested the digital channel team picked the flag that met their business outcome the best.
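To make the three flags concrete, here is a minimal Python sketch of how they might sit side by side. It is an illustration only, not what we actually built: the `Customer` fields and function names are hypothetical, "the last 3 months" is approximated as 90 days, and because I can’t recall the Risk rule it is passed in as a placeholder function rather than invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, Optional


@dataclass
class Customer:
    # Hypothetical fields, chosen only for this illustration.
    has_account: bool
    is_deceased: bool
    last_transaction_date: Optional[date]


def is_active_marketing_customer(customer: Customer) -> bool:
    # Marketing: has an account and is not deceased.
    return customer.has_account and not customer.is_deceased


def is_active_finance_customer(customer: Customer, as_of: date) -> bool:
    # Finance: a transaction in the last 3 months (assumed here to mean 90 days).
    if customer.last_transaction_date is None:
        return False
    age = as_of - customer.last_transaction_date
    return timedelta(0) <= age <= timedelta(days=90)


def active_customer_flags(
    customer: Customer,
    as_of: date,
    risk_rule: Callable[[Customer], bool],
) -> Dict[str, bool]:
    # The Risk rule is supplied by the caller, since its real definition
    # (based on regulatory rules) is not reproduced here.
    return {
        "active_marketing_customer": is_active_marketing_customer(customer),
        "active_finance_customer": is_active_finance_customer(customer, as_of),
        "active_risk_customer": risk_rule(customer),
    }
```

The point of the design is that each team keeps its own rule and its own flag, and a consumer such as the digital channel team simply picks the key that matches their business outcome.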
So that leads me on to the subject of DataOps
xxxxOps
We are seeing a plethora of people adopting the DevOps approach to the things they do, and with it the rise of xxxxOps. For example, SecOps, DevSecOps, MLOps and of course DataOps.
I recently saw a slide from a presentation Shaun McGirr did describing MLOps. It said:
“The process and tools that allow organizations to safely scale the use of machine learning in production environments”
My immediate reaction was: where is the people part in that definition? It’s missing.
That matters, because when we talk about DataOps we talk about it being more than just DevOps for data.
DataKitchen is one of the most vocal companies on the new world of DataOps. In their article DataOps is NOT Just DevOps for Data, they state:
One common misconception about DataOps is that it is just DevOps applied to data analytics. While a little semantically misleading, the name “DataOps” has one positive attribute. It communicates that data analytics can achieve what software development attained with DevOps
For DataOps to be effective, it must manage collaboration and innovation. To this end, DataOps introduces Agile Development into data analytics so that data teams and users work together more efficiently and effectively.
While the name “DataOps” implies that it borrows most heavily from DevOps, it is all three of these methodologies — Agile, DevOps and statistical process control — that comprise the intellectual heritage of DataOps
Jon Loyens from Data.World also has an interesting article on the subject, What is DataOps? In this article he states:
Data is a team sport. Keep those channels open. Agility and transparency are key as you and your team continue to integrate DataOps strategies into your day-to-day work
Both these lines of thinking align with my experience: introducing an agile way of working is key when moving to a DataOps paradigm. Adopting patterns and approaches from our DevOps cousins is important, but they are not the only things data and analytics teams need to adopt.
Many many many definitions
If you google DevOps you will tend to find a raft of fairly consistent definitions, something along the lines of:
DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality
But if you google definitions of DataOps you will find a raft of varying definitions including these:
DataOps is a data management method that emphasizes communication, collaboration, integration, automation and measurement of cooperation between data engineers, data scientists and other data professionals
It is widely recognised that Andy Palmer was the first to coin the term DataOps in his 2015 Tamr article From DevOps to DataOps.
OK, let’s see some other definitions:
DataOps, the set of best practices that improve coordination between data science and operations
DataOps is a new way of managing data that promotes communication between, and integration of, formerly siloed data, teams, and systems. It takes advantage of process change, organizational realignment, and technology to facilitate relationships between everyone who handles data: developers, data engineers, data scientists, analysts, and business users. DataOps closely connects the people who collect and prepare the data, those who analyze the data, and those who put the findings from those analyses to good business use
DataOps: an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics
And it wouldn’t be right to have quotes without a couple from the analysts:
DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and consumers across an organization
DataOps is an integrated approach for delivering data analytic solutions that uses automation, testing, orchestration, collaborative development, containerization, and continuous monitoring to continuously accelerate output and improve quality
PHEW! That’s confusing
When there are lots of different definitions for the same thing, just like the banking teams I mentioned earlier, each person or group will tend to find the definition that resonates with them the most, or is the one that will help them achieve their desired business outcomes.
To date, the definition I have tended to use the most is from the team at DataKitchen:
DataOps combines agile development, DevOps and statistical process controls and applies them to data analytics
This is because, like me, they believe we need more than just better Development and Operations processes to make the adoption of DataOps successful.
To me DataOps is the combination of a bunch of proven patterns and approaches which are adopted by a data and analytics team, and these include patterns and approaches from:
- DevOps
- Scrum
- Lean
- Data Science
- Data warehousing
- Statistics
- And many many more
A thing by any other name ….
I think the trap we have fallen into is the same one as my active customer example. We are trying to use one term to describe multiple things, each of which is subtly different.
So from now on I am going to make a few changes to the way I describe things.
I am going to start by referring to DataOps as DevOps for data. Just like Shaun’s slide, which describes MLOps in a way that matches the DevOps definition, this makes the definition clear and unambiguous.
DataOps is the process and tools that allow organisations to safely scale the use of data in production environments
But if we are going to implement DataOps, we should also change the way we work in other areas at the same time. And a great way to make that change is to adopt agile patterns and approaches from our agile cousins.
Agile DataOps
As much as I hate it when people coin a new buzzword, I am going to do just that (forgive me) and start talking about “Agile DataOps”. Just like Active Marketing Customer, I think adding one more word to the name provides a lot more clarity on what is being described.
Agile DataOps is where we combine the processes and technologies from DataOps with a new agile way of working, to reduce the time taken and increase the value of the data we provide to our customers
The team at DataKitchen have nailed what it takes to behave in an Agile DataOps way with their Manifesto. If you haven’t read it, I suggest you do.
I think that Data Mesh, Data Vault 2.0 and other names that seem to be appearing as a raft of new data approaches are all about combining ways of working (people processes) and xxxxOps (technology processes). Something I need to do more reading and thinking about, before I comment further.
But there is more
As an aside one of the more interesting quotes for xxxxOps I have heard is:
You build it, you release it, you maintain it. It’s yours to love and nurture not to hand off to somebody else once you have only done part of the work
Keep making data simply magical
AgileData.io provides both a Software as a Service product and a recommended AgileData Way of Working. We believe you need both to deliver data in a simply magical way.
As you would expect, we use Agile DataOps at the core of how we build the AgileData.io solution.