Agile Data Testing Techniques, a magical list of testing TLAs
Join Shane and Nigel as they discuss the many notions you should understand to start your journey on the DataOps and automated data testing path.
Read along you will
PODCAST INTRO: Welcome to the “AgileData” podcast where Shane and Nigel discuss the techniques they use to bring Agile ways of working to the data world in a simply magical way.
Shane Gibson: Welcome to the AgileData podcast. I’m Shane Gibson.
Nigel Vining: And I’m Nigel Vining.
Shane Gibson: And in today’s session, we thought we might kick off a bit of a series of podcasts around testing. So testing, for us in the data domain, is the bane of our existence. We used to joke in the old days that when you had a waterfall project, no matter how much time you planned for testing, it always ended up being one day. And that was the day before you went live, and then our users would go and test in production for us. So there’s been a lot of work done on testing in the data space, but in my view, it is still a very immature part of our domain, both in terms of tools that we can use, and repeatable techniques and patterns that we should adopt. So Nigel, I thought what we’ll do today is a whirlwind tour of all the testing terminology and techniques we’ve heard of, with a little bit of a micro description on each one. And then in future podcasts, we’ll take each one of those and break them out into their own dedicated podcast, where we talk about them in more detail. So what are your views, Nigel, on what we should do?
Nigel Vining: Sounds pretty good to me. And it’s actually topical, because this week I have been working on daily data validation checks for a customer, so it’s top of mind for me at the moment.
Shane Gibson: Excellent. So validation, not reconciliation, but we’ll get on to that. So let’s kick off with some of the Agile words that we use. So one of the key things that people do when they deliver in an Agile way now is TDD (Test Driven Development). So for me, that is where we write the test first, then we write some code, and we iterate the code until the code passes the tests.
Nigel Vining: I agree, that’s an approach. That may be actually how I do it by default. I know the outcome I want to achieve. I iterate until I get that outcome. So that’s probably the one I naturally follow.
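To make the TDD loop Shane describes concrete, here is a minimal hypothetical sketch in Python: the test for a small data transform is written first, and the transform is iterated until the test passes. The function and field names are illustrative, not from the episode.

```python
# Hypothetical TDD sketch: the test below was "written first", and the
# transform is iterated until it passes. Names are illustrative.

def dedupe_customers(rows):
    """Keep the latest record per customer_id (rows arrive sorted by load_ts)."""
    latest = {}
    for row in rows:
        latest[row["customer_id"]] = row  # later rows overwrite earlier ones
    return list(latest.values())

def test_dedupe_keeps_latest_record():
    rows = [
        {"customer_id": 1, "name": "Ann", "load_ts": "2021-03-01"},
        {"customer_id": 1, "name": "Anne", "load_ts": "2021-03-02"},
        {"customer_id": 2, "name": "Bob", "load_ts": "2021-03-01"},
    ]
    result = dedupe_customers(rows)
    assert len(result) == 2
    assert {r["name"] for r in result} == {"Anne", "Bob"}

test_dedupe_keeps_latest_record()  # passes once the implementation is right
```

The point is the order of work: the assertion about the outcome exists before the transform does, which is exactly the "iterate until I get that outcome" loop Nigel describes.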
Shane Gibson: So the second version is what we call Acceptance Test Driven Development (ATDD). And for me, that’s where we focus less on the data and more on the features. And the reason for that is, it is unusual for business users to be able to tell us what the acceptance tests related to a piece of data are, to say, on the 23rd of March at 11am, there were 1,329,251 customers. What they tend to give us is acceptance criteria, which we want to capture as part of our user stories, things like “as a user, I want to be able to solve that”. And so for acceptance test driven development, for me, that’s where we are confirming that we’ve delivered what we needed to deliver to pass those acceptance tests. The last one is Behaviour Driven Development (BDD). And for me, that is understanding the way the data behaves. So we tend to use a rules-based paradigm, a technique based out of Gherkin and Cucumber, where we would write a test that says: given some context, when something happens, then we expect an outcome. So given a load of customer data comes in, and that customer data is from the Salesforce system, then we would expect the customer ID to be in a certain format. So we’re writing natural-language rule type tests to make sure the data behaves that way. Or: given we’re loading some data and the data has regions, then we expect to see five regions, and when the behaviour of that data changes, we get alerted. So for me, that’s how I describe applying TDD, ATDD and BDD in the data world. The other terminology we use a lot is the difference between reconciliation and validation of data. So people often talk about reconciliation, but for us, reconciliation is where we compare two things, and they should equal each other. What’s your view on that one, Nigel? Is that your definition of reconciliation?
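The Given/When/Then style rule Shane describes for Salesforce customer IDs could be sketched as a small Python check. The ID format (15 alphanumeric characters) and the field names are illustrative assumptions, not a statement of the real Salesforce ID specification.

```python
import re

# Hedged sketch of a BDD-style data rule:
#   GIVEN a load of customer data from the Salesforce system
#   THEN every customer_id should match the agreed format.
# The regex and field names are illustrative assumptions.

SALESFORCE_ID = re.compile(r"^[a-zA-Z0-9]{15}$")

def then_customer_ids_match_format(rows):
    """Return the list of IDs that break the rule (empty means the behaviour held)."""
    return [r["customer_id"] for r in rows
            if not SALESFORCE_ID.match(r["customer_id"])]

# Given a load of customer data comes in...
rows = [{"customer_id": "001xx000003DGb2"}, {"customer_id": "BAD-ID"}]
# ...then we expect the IDs to be in the agreed format:
print(then_customer_ids_match_format(rows))  # → ['BAD-ID']
```

When the behaviour of the data changes, the rule starts returning offenders, which is the alert Shane is describing.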
Nigel Vining: Yeah, reconciliation is effectively a left-to-right pattern. That’s where we have something on the left, which is generally a source, and we have something on the right, which is generally a target or a report or an output. And if the numbers on the left match the numbers on the right, we say that’s reconciled.
Shane Gibson: So that gives us challenges, because does that mean we have to go and hit the data factory, the source system that’s generating those numbers or that data, for us to reconcile it?
Nigel Vining: That’s a good question. Sometimes the alternative is we build in some form of extract into our process that at a point in time, we have a record from that source system. And it says at this point in time, which might be midnight, there were ‘X’ number of customers or ‘Y’ number of dollars, or ‘Z’ number of something. So we effectively pick a point in time, and we draw a line in the sand. And we say the source system said, this was the state of play at that time. And then we can use that record to reconcile our right hand side to make sure everything’s flowed through correctly.
Shane Gibson: And we’ll talk about it more in the podcast dedicated to reconciliation. But we have other, lighter techniques, where we can count rows on both the left and the right and see if they’re equal. So we have some heavy reconciliation processes and some lighter reconciliation processes. And we may want to mix the use of those reconciliation techniques, depending on the risk of mutation of the data between the left and the right.
Nigel Vining: Generally, we start at a very high level, we just look at a summary of what’s on the left compared to a summary of what’s on the right. And if that’s generally green, we don’t always go to a more granular level, we say the top most numbers match, we’re pretty happy with that. We don’t go down to the next level.
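The high-level left/right comparison Nigel describes can be sketched as comparing a summary of each side, here a row count and a dollar total. The table contents and summary fields are illustrative assumptions.

```python
# Minimal sketch of the left-right reconciliation pattern: compare a
# top-most summary of the source (left) with the same summary of the
# target (right). Contents are illustrative.

def summarise(rows):
    return {"rows": len(rows), "dollars": sum(r["amount"] for r in rows)}

source = [{"amount": 100.0}, {"amount": 250.5}]   # left: source system
target = [{"amount": 100.0}, {"amount": 250.5}]   # right: report / output

left, right = summarise(source), summarise(target)
reconciled = left == right
print(reconciled)  # True - the top-most numbers match, so we stop here
```

Only if the summaries disagree would you drop to the next, more granular level, which mirrors the "generally green, don't go deeper" approach.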
Shane Gibson: And when we compare that to validation: within a data platform we are constantly changing data, we are changing what it means, we’re changing how it’s stored, we are filtering and subsetting. So we might start off with a million rows, and for a particular reason we end up with 100,000. And that means we can’t reconcile, because the left never matches the right, but we still need to validate it. So we talk about the techniques we can use to validate that when we’re doing something to the data, the expected result is consistent and doesn’t mutate over time.
Nigel Vining: Yeah, and that’s the exact one I’ve been working on this week actually. You’re right, we can’t directly compare left to right. But what we can do, in this case in point, it was basically tracking sales per day for an organisation. Now, the thing with sales per day is they change constantly. So actually, the validation checks became checking that the averages weren’t moving too fast. So today’s sales were in line with what yesterday’s were, what last week’s were, and last month’s. So over time we’re checking a trend, that dollars are moving where we expect, that they weren’t suddenly halving between one day and the next, or doubling between one day and the next, because we know that it should progress in a reasonably linear fashion. So we’re actually using a moving average by day over a trailing period to validate the data. So that’s slightly different, but a reasonable approximation for that.
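Nigel's trailing-average check could be sketched as below. The window size and the halving/doubling thresholds are assumptions for illustration; the real check's tolerances aren't stated in the episode.

```python
# Hedged sketch of validating daily sales against a trailing moving
# average: flag a day whose sales are less than half, or more than
# double, the average of the trailing window. Window and thresholds
# are illustrative assumptions.

def validate_daily_sales(history, today, window=7, low=0.5, high=2.0):
    trailing = history[-window:]
    avg = sum(trailing) / len(trailing)
    return low * avg <= today <= high * avg

history = [1000, 1100, 950, 1050, 1000, 980, 1020]
print(validate_daily_sales(history, 1010))  # True: in line with the trend
print(validate_daily_sales(history, 5000))  # False: sales suddenly spiked
```

This is the anomaly-detection flavour of validation Shane names next: no left-right comparison, just a check that the trend behaves as expected.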
Shane Gibson: So you’re effectively applying some anomaly detection to validate the data’s fit for purpose. Another example is when we talk about data quality metrics, or data profiling of the data, we often want to validate that data has no nulls in a table that we’re receiving, or a table that we’re creating. So we write tests to validate it. We use data quality profiling to tell us we’ve got a problem, when we should really be writing tests to tell us when that behaviour happens. So again, data quality and validation testing, for me, kind of go hand in hand, because it’s identifying that the data is not valid, it’s not fit for purpose, for what we expected it to look like. I think one of the other things that we need to talk about, in one of the podcasts, is this idea of testing techniques we can use when we lightly couple our data platform, or lightly decouple it. So we talked a while ago, in one of the podcasts, about this idea of Pac-Man. Each Pac-Man achieves a task: data comes into the Pac-Man, the Pac-Man does some stuff to the data, the data exits the Pac-Man. And what that means is we can use some techniques that are likened to sensors. So between each of the Pac-Men, we can write sensors that sniff the data as it goes past. And those sensors can have a set of rules or tests that are related to them. So imagine if we had a sensor that sniffed data and identified where that data had nulls in a record. Rather than writing that test every time we have a table, to ask “is there a null?”, and applying it, we can have a sensor that says: whenever data moves between moving parts, where they’re lightly coupled or decoupled, sniff it, and where you see nulls, raise an alert. So it’s a technique we can use to write the test once and apply it many times as part of our data pipeline or our data supply chain, rather than having to remember to apply that test. So I think it’s worth talking, in one of the podcasts, about that emerging practice, and how hard it is, and whether it’s a good idea or not.
Nigel Vining: Yeah, I agree. It actually resonates with me, that’s quite a nice one, because you’re right, we write it once. We effectively just mark it as something we want to do every time we run a process, and it just happens magically. It’s a nice pattern, and it’s a common-sense one.
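The sensor idea, write the check once, have it sniff the data between every pair of steps, might look something like this sketch. The pipeline steps, sensor behaviour and names are all hypothetical.

```python
# Illustrative sketch of the "sensor between Pac-Men" pattern: a null
# check written once, applied automatically after every pipeline step.
# Step names and data are hypothetical.

def null_sensor(rows, step_name):
    """Sniff the rows and return an alert for every record containing nulls."""
    alerts = []
    for i, row in enumerate(rows):
        null_cols = [col for col, val in row.items() if val is None]
        if null_cols:
            alerts.append(f"{step_name}: row {i} has nulls in {null_cols}")
    return alerts

def run_pipeline(rows, steps):
    """Run each step, sniffing the data as it passes between them."""
    all_alerts = []
    for step in steps:
        rows = step(rows)
        all_alerts += null_sensor(rows, step.__name__)
    return rows, all_alerts

# Two toy steps; the second introduces a null the sensor catches.
def load(rows): return rows
def enrich(rows): return [{**r, "region": None} for r in rows]

rows, alerts = run_pipeline([{"customer_id": 1}], [load, enrich])
print(alerts)  # → ["enrich: row 0 has nulls in ['region']"]
```

The key property is that no individual step had to remember to apply the test: the plumbing applies it every time data moves between moving parts.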
Shane Gibson: So one of the other challenges we have is, once that data has been made fit for purpose within a data platform, we then have a bunch of reporting or visualisation or analytical tools that consume that data and typically do something with it. And those tools are typically decoupled. They’re either doing some form of direct query, where they’re grabbing the data out of the data platform on demand, or they’re grabbing the data on a repeatable basis and storing it within the tool itself. But regardless of those techniques, that data is persisted, either in memory or on disk, within that BI tool or visualisation tool. And those tools have a lot of capability to change that data after the fact as well. But if we just look at that first thing around consumption: if I have a report that’s pulling data out of a table on the data platform, how do I know that tool is actually bringing the data back properly? How do I know it’s not got something in that query that’s restricting the number of rows? How do I know that if I have 1,000,001 customers in my final consumable table, when I go back into my Tableau dashboard, Tableau is actually seeing 1,000,001 customers? And that’s a bit of a problem for testing, isn’t it?
Nigel Vining: What also came to mind, something I’ve seen recently, is when the underlying data layers, the consume layer that the tools are plugging into, haven’t been thought out for safety of use. And what immediately springs to mind is non-additive measures ending up in your consume layer. You end up with a situation where the tool applies an average on an average, or an average of a field that can’t be averaged. It’s very easy, but straight away you’ve let the user do something wrong, because you haven’t protected them in some way. That’s what always worries me: something in the consume layer that the tool can get wrong without any real warning or consequence.
Shane Gibson: And a lot of the tools that we use in the data space are starting to adopt this idea of being able to embed or call tests within those tools. But the BI tools, not so much. I don’t know of many tools, if any really, where I can write a test so that when a dashboard has been rendered, I can validate that the numbers on the dashboard are correct, that they meet my valid expectation. So tooling-wise, that’s a pretty light area. And maybe before we do a deep dive on that podcast, we might need to go and do some research on the market to see whether there’s actually a new breed of tools dealing with that for us. Another one that I hear a lot, and I don’t write code, so I don’t really know a lot about it, is this concept of linting. What do you understand linting is?
Nigel Vining: Sure. So under the covers of agiledata.io, we use linters and formatters, effectively to keep our code clean, readable and syntactically correct. So we use a formatter, in this case Black for Python, which we run over the code whenever we commit it. All it does is check that all the indentation is correct, the spacing’s correct, the code’s laid out nicely to a standard template. This has benefits for people coming along downstream reading the code: the code is presented in a pretty format. It also identifies any formatting issues. The other one is we make use of Flake8, which is a linter. A linter is effectively a library that we run over our code, and it looks for syntactic or construct errors. So the code may be formatted correctly, but it may have errors that are unforeseen: you might be adding two things together that on the surface look like addition, but you’re actually concatenating them. So it can help find syntax and construct errors that maybe aren’t obvious and would pass tests, but possibly shouldn’t. So they help keep us safe, and keep our code readable.
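The "looks like addition but actually concatenates" bug Nigel mentions is easy to reproduce. A toy example, with illustrative field names, where values loaded as text (common when reading CSVs) silently concatenate instead of summing; linters and type checkers help surface this class of mistake before it quietly passes a test.

```python
# Toy example of the class of bug a linter / type checker helps catch:
# values loaded as text concatenate instead of summing. Field names
# are illustrative.

row = {"q1_sales": "100", "q2_sales": "250"}  # values arrived as strings

looks_like_addition = row["q1_sales"] + row["q2_sales"]
print(looks_like_addition)  # → '100250' - concatenated, not summed

actual_sum = int(row["q1_sales"]) + int(row["q2_sales"])
print(actual_sum)  # → 350
```

Note that plain Flake8 focuses on style and syntax issues; catching the type mix-up itself is more the territory of type checkers, but the point stands that static tooling finds problems that would otherwise look fine on the surface.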
Shane Gibson: So it’s not testing that the data is valid or reconciles, but testing that our code meets our standards, and that it has a chance of running and executing successfully, catching anything obvious.
Nigel Vining: Exactly, that is correct.
Shane Gibson: I wish I had that in words, that’d be great. So let’s talk, before we finish off, about some anti-patterns, where we see testing behaviour being less than desirable. So I remember a project that we both worked on before we started agiledata.io. There was a testing team of eight people that weren’t part of the delivery team; they sat in their own part of the floor. And they read the requirements that were written, which was a bit of an issue, because we were following an Agile way of working, so our requirements were specific to the tasks we were going to achieve in that iteration, and constantly evolving and changing as we learned new things. And then they would create these massive Excel spreadsheets with the list of tests they were creating, and each of those tests was written in code. So that was impressive: they were code based tests, rather than clicky, clicky GUI human testing. But that code base had to be run manually. Unfortunately for the testers, it tended to be run outside of standard working hours, while the delivery team were doing all their delivery work. And then every time something major changed, all those tests had to be completely rewritten and completely re-executed. So those eight testers were very, very busy. I’m not sure that they were really that efficient or effective. So for me, that was an anti-pattern in how we do testing: we have a separate team, they’re working off documents rather than conversations, they are out of sync, they’re always late to the party, and every change is massive and kills them. So for me, that’s probably an anti-pattern that I see time and time again.
Nigel Vining: Yeah, the Excel one. Funnily enough, it’s something else I’ve had experience with this week. I find a lot of testers naturally gravitate to Excel because it’s a known quantity, and it’s the first place they cut and paste data. And then they put “if Column A equals Column B, put a tick, otherwise put a cross and format it red”. And they tend to automatically adopt that because it’s a nice easy way: they paste the data in, they get all the red rows, and then they pass them back. But for all the reasons you expressed, it’s a very manual, non-repeatable process, and you have to keep doing it over and over. So I tend to take the approach that if they come up with all the logic to get the test data, I would then generally drop that into a repeatable, scheduled tool or process, and run it for them automatically every time, maybe when code is released, or every day to check the overnight numbers, to try and take it out of the spreadsheet world and put it into something a little bit more enterprise level.
Shane Gibson: Yeah, and automation helps. It helps reduce the cost of rerunning tests, which means we feel more comfortable running them earlier, and running them more regularly.
Nigel Vining: More often. Yep,
Shane Gibson: Again, there was a pattern I saw on one project, and it wasn’t quite an anti-pattern, because it had some of the tenets of automated testing, some of the tenets of being able to adapt to change. One of the team members was the tester, a very experienced tester. And what he’d done is he’d built a testing toolkit, effectively. We talked a while ago about the old days where you’d carry your top pocket with your sample code, and he had one of those. So what he would do is take the design documents, and he would write all the ETL code again. So the delivery team, the data engineers, would write the code, and he would write the same code, but written differently, in his tool. And then he would schedule their code to run through the scheduler overnight, and his code to run through his scheduler overnight. And then he would automate the comparison, the reconciliation, of the left and right of both of those runs. So when you walked in in the morning, you could quickly get a view of where it matched, where it reconciled, and where it didn’t. The challenge then was: who was right? Was it purely a coding error on his part, or the data engineer’s? Or, because we were running an Agile way of working, was it the fact that the engineer had discovered some issues with the data as they applied the rules, had gone and had a conversation with the team, the stakeholder and the product owner to make a call, and therefore changed the code while the documentation fell out of sync? So again, you spent a lot of time going down the rabbit hole for the ones that flagged red: why were they red? One of the good things out of that pattern was it was automated, which was great. It did save time from that point of view.
Nigel Vining: I do remember that pattern they applied, and it was a work of beauty. It was slightly misplaced, but it was a pattern. The fact that it was based on documentation, which was always becoming stale because things were evolving, was its main downfall. But credit to that person, it was a pattern.
Shane Gibson: But also there was twice as much effort, because we were effectively writing the code twice, the delivery code and the test version, even though the comparison itself was automated.
Nigel Vining: So it was a very elaborate test approach. And you’re right, most of the rabbit holes were due to requirements that had moved on between the time that the test was created and the time it was run.
Shane Gibson: I think the last one is an interesting one. I thought it was a good pattern actually. So what happened was, as the team evolved their Agile way of working, the BI type skills within the team, the data analyst type skills, started doing more and more of the data modelling. So as part of the early exploration of the data, and understanding the information product that needed to be delivered, they started doing some exploratory work on the data, and some exploratory work on what a data model might look like to support the outcome that was required. And as they got those skills, they started writing tests in Jupyter Notebooks. So they would take the data and quickly model it in Jupyter Notebooks to figure out where it fitted. And as part of that, they started to see where the data was mutating, or wasn’t valid for what they were doing, so they started writing tests for that. And when that light version got handed across to the engineers, because there was still a semi-pipelined process, the engineers would go through and write the code and harden it and all the good stuff. And then the analyst type people would come back in and effectively repoint the Jupyter Notebooks at the new data source. It wasn’t a one-for-one repoint, because the model had morphed slightly as they went through, as always happens when the expectation of the model meets the real data. But that mutation of requirements didn’t happen as often as it used to, because the requirements had been developed in line with the data. So they would repoint the Jupyter Notebooks, fix them up with a little bit of rework, and then rerun them. So again, that was a form of testing early, and then reapplying those tests. So I thought that was a good way of working for the majority of the team. I thought it was creative and added value, reduced their time, helped them adapt to change, and gave them a better quality data outcome.
Nigel Vining: Yeah, I think that’s probably a nice example of a mixture of test driven development and requirements-driven testing, sort of iteratively smashed together?
Shane Gibson: Yeah, so we would call that a combination of TDD and BDD, “Test Driven Development” and “Behaviour Driven Development”, because it combined a couple of the concepts from both of those. It wasn’t really acceptance test driven, because it wasn’t based on the user stories or the acceptance from the product owner of what good looked like, but that’s okay. So I think that was a whirlwind tour. We talked about TDD versus ATDD versus BDD, so we might deep dive on those in another podcast. We talked about the difference between reconciliation and validation, how we use a left-right technique for reconciliation, and how we use Gherkin and Cucumber rules type frameworks for validation. We talked about some of the ways we could be really innovative when lightly coupled or decoupled, through the concept of sensors, to get the ability to test for, or validate, certain things repeatedly. We talked about some of the things, like linting, that help us have better quality code. And we talked about the challenge of BI, visualisation and analytical tools being decoupled or lightly coupled to the data, and how hard that makes it for us, with those types of tools, to reconcile and validate end to end, from the data factory out to the consumer. So I think we’ll pick one of those next, and spend another half hour deep diving into it on the next podcast.
Nigel Vining: Sounds fantastic. Thank you.
Shane Gibson: Excellent. All right, well, catch everybody later.
PODCAST OUTRO: And that, data magicians, was another AgileData podcast from Nigel and Shane. If you’d like to learn more about how you can apply an Agile way of working to your data, head over to agiledata.io.