Magical plumbing for effective change dates

AgileData DataOps, Blog


In this post we discuss how to handle change data in a hands-off filedrop process.

We use the ingestion timestamp as a simple proxy for the effective date of each record, allowing us to version each day’s data.

For files with multiple change records, we scan all columns to identify and rank potential effective date columns. We then pass this information to an automated rule, ensuring it gets applied as we load the data.

This process enables us to efficiently handle change data, track data flow, and manage multiple changes in an automated way.

Nigel Vining -

Spot on, that’s exactly what we do!

Let’s talk about plumbing. Not the kind that involves pipes and wrenches, but rather the intricate network of pathways that data travels through within a data platform. Just as a plumber ensures water flows smoothly from one point to another, our task is to ensure that data flows seamlessly through various processes and transformations.

One of the key challenges in data plumbing is handling change data. Picture this: Users drop files containing multiple records for the same business concept key, such as a customer number. The conundrum lies in identifying the order of these changes. It’s like trying to track the flow of water through a complex network of pipes, based on when it entered the system.

But wait, there’s more!

The filedrop process is hands-off, meaning we don’t know the characteristics of the data in advance. We have to load and profile it before making decisions. It’s like turning on a faucet without knowing what kind of water – cold, hot, or lukewarm – will flow out.

Now, to the magic…

When dealing with change data, we pull a rabbit out of our hat and magically determine the business effective date.

We start with a simple but effective trick – the ingestion timestamp. Think of it as the moment the water hits the basin, essentially the date the record was loaded. If an updated file is dropped tomorrow, we automatically tag the records with the new ingestion timestamp. This pattern helps us version each day’s data, just like timestamping the water flow.
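As a rough sketch (not AgileData's actual implementation), tagging a dropped file with its ingestion timestamp can be as simple as stamping every row at load time. The column names here are illustrative:

```python
from datetime import date

import pandas as pd

# Hypothetical rows from a dropped file -- no effective date supplied
incoming = pd.DataFrame({
    "CustNum":    [101, 102],
    "DataPoint1": ["Gold", "Silver"],
})

# Stamp every record with the date it hit the platform --
# our simple proxy for the business effective date
incoming["IngestionDate"] = date.today().isoformat()
```

If the same file is dropped again tomorrow, the same step tags those rows with tomorrow's date, which is what gives us a version of the data per day.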

Let’s illustrate this with a simple table.

Consider the following data. We have multiple records for a customer number (CustNum) with different data points (DataPoint1, DataPoint2, etc.) coming in on different dates (IngestionDate). Let’s assume that these dates are when the data was dropped into the data platform.

Effective Date #1

Now, let’s add a change record (ChangeRecord) column that would represent the order of changes for each CustNum.

Effective Date #2

Here, the IngestionDate was used to tag each record with its position in the series of changes for each CustNum. This ChangeRecord number stands in for the “business effective date” of each change, and it’s determined automatically as files are ingested. This is the magic of handling change data in a hands-off way!
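One way to derive that ChangeRecord column is a rank per CustNum ordered by IngestionDate. A minimal pandas sketch, with made-up sample data:

```python
import pandas as pd

# Hypothetical records for two customers, ingested on different days
rows = pd.DataFrame({
    "CustNum":       [101, 101, 102, 101],
    "DataPoint1":    ["Gold", "Platinum", "Silver", "Bronze"],
    "IngestionDate": ["2023-06-01", "2023-06-02", "2023-06-01", "2023-06-03"],
})

# Number each customer's records in ingestion order:
# sort by IngestionDate, then count within each CustNum group
rows["ChangeRecord"] = (
    rows.sort_values("IngestionDate")
        .groupby("CustNum")
        .cumcount() + 1
)
```

The same pattern is expressible as a SQL window function (`ROW_NUMBER() OVER (PARTITION BY CustNum ORDER BY IngestionDate)`) if the rule runs in the database.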

But what about a file that contains multiple change records?

It’s like having a pipe with multiple outlets. We need to use one of the attributes on the row to determine when that record was created or updated.

For this, we use a smart pattern to scan all the columns and their data types, looking for the most common and well-known business effective date columns. It’s like searching for the main water valve among a cluster of knobs and dials. If we find multiple candidates, we rank them and choose the best option.
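A toy version of that scan, under assumptions of our own (the patterns, scores, and function names below are hypothetical, not AgileData's rules): profile each column to see whether its values parse as dates, then rank the survivors by how closely their names match well-known effective-date conventions.

```python
import re
from datetime import datetime

# Hypothetical name patterns, highest score = best candidate
CANDIDATE_PATTERNS = [
    (r"(effective|valid).*(date|dt)", 3),
    (r"(updated?|modified|up).*(date|dt)", 2),
    (r"(created?|cr).*(date|dt)", 1),
]

def looks_like_date(value) -> bool:
    """Cheap profiling check: does the value parse as an ISO date?"""
    try:
        datetime.fromisoformat(str(value))
        return True
    except ValueError:
        return False

def rank_effective_date_columns(rows: list) -> list:
    """Return candidate effective-date columns, best ranked first."""
    scored = []
    for col in rows[0]:
        if not all(looks_like_date(r[col]) for r in rows):
            continue  # values don't profile as dates, skip the column
        score = max((s for pattern, s in CANDIDATE_PATTERNS
                     if re.search(pattern, col, re.IGNORECASE)), default=0)
        scored.append((score, col))
    return [col for score, col in sorted(scored, reverse=True)]
```

Given columns like `cr_dt`, `up_dt`, and `ex_dt`, a scan like this would rank `up_dt` first, since an update date is the most likely to track record changes.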

Once we have the business effective date column, we automatically pass that information through to the AgileData AutoRules pattern. This creates the landing to history tile rule, ensuring it gets applied to the data as we load it. It’s like setting a thermostat to control the water temperature as it flows through the system.

Let’s take another scenario where we have multiple fields in the data that represent when the record was updated. We’ll also keep the IngestionDate, but in this case, all records are ingested on the same day.

Effective Date #3

Visually, we can see that up_dt is the only date that consistently changes in step with the data. We can also spot a naming pattern from which we can infer that cr is Create, up is Update, ex is Extract, and dt is Date.

Now, let’s add a change record (ChangeRecord) column that would represent the order of changes for each CustNum.

Effective Date #4

In this scenario, even though all records were ingested on the same day, the up_dt field was used to establish the order of changes for each CustNum. This allows for accurate tracking of changes even when ingestion dates are the same.
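The ordering step is the same as before, just keyed on the chosen effective date column instead of IngestionDate. A small sketch with invented data, where IngestionDate can no longer break ties:

```python
import pandas as pd

# Hypothetical records for one customer, all ingested the same day
rows = pd.DataFrame({
    "CustNum":       [101, 101, 101],
    "cr_dt":         ["2023-05-01"] * 3,   # creation date never changes
    "up_dt":         ["2023-05-01", "2023-05-15", "2023-06-01"],
    "IngestionDate": ["2023-06-02"] * 3,   # identical, so useless for ordering
})

# Order the changes by the selected business effective date column
rows["ChangeRecord"] = (
    rows.sort_values("up_dt").groupby("CustNum").cumcount() + 1
)
```

Because up_dt was ranked as the best effective date candidate, the change sequence stays correct even when a single file carries a whole history of changes.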

Of course, the real magic is how you get the machine to determine the best date field to use as the Business Effective Date for identifying the change records.

Business Effective Dates are powerful plumbing magic

Plumbing, whether it’s for water or data, is all about ensuring smooth and efficient flow. By leveraging the magic of business effective dates, we can navigate the intricate pipelines of change data with ease and precision. We can track the flow, version the data, and handle multiple changes in a hands-off, automated process.

The beauty of this process lies in its simplicity and adaptability. Much like a well-designed water plumbing system, it allows for efficient data flow, easy temperature control (in our case, versioning), and effective handling of multiple outlets (change records).

Next time you’re faced with a deluge of change data, remember: There’s a magical solution waiting to be applied. Just like a master plumber knows their pipes and valves, a Data Plumber knows their timestamps and effective dates.

Stay tuned for our next post, where we’ll dive deeper into the magical world of data plumbing, exploring other common challenges and their magical solutions.

Get more magical DataOps plumbing tips

Keep making data simply magical