In this instalment of the AgileData DataOps series, we’re exploring how we handle the challenges of parsing files from the wild. To ensure clean and well-structured data, each file goes through several checks and processes, much like a water treatment plant: checking whether the file has been seen before, looking for a matching schema file, queuing the file, and parsing it. If a file fails to load, we have procedures in place to retry the load or flag the error for later resolution. This rigorous processing keeps data flowing smoothly and efficiently.
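As a rough illustration of those gates, here is a minimal sketch in Python. The function names, schema-matching rule and retry count are hypothetical stand-ins, not the actual AgileData pipeline, which runs as serverless functions on Google Cloud.

```python
import csv
import hashlib
import io

def parse(content: bytes, schema: list) -> list:
    """Toy parser: read a CSV and check the header matches the expected schema."""
    reader = csv.DictReader(io.StringIO(content.decode("utf-8")))
    if reader.fieldnames != schema:
        raise ValueError("header does not match expected schema")
    return list(reader)

def process_dropped_file(name, content, seen, schemas, errors):
    """Run one file through the treatment-plant gates: dedupe, schema match, parse."""
    fingerprint = hashlib.sha256(content).hexdigest()
    if fingerprint in seen:                   # previously seen file: skip it
        return "skipped: already processed"
    schema = schemas.get(name.split(".")[0])  # match a schema file on the name prefix
    if schema is None:
        errors.append((name, "no matching schema"))
        return "quarantined: no schema"
    for _ in range(3):                        # retry the load before flagging the error
        try:
            rows = parse(content, schema)
            seen.add(fingerprint)
            return "loaded %d rows" % len(rows)
        except ValueError:
            continue
    errors.append((name, "parse failed after retries"))
    return "quarantined: parse failed"
```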
Magical plumbing for effective change dates
We discuss how to handle change data in a hands-off filedrop process. We use the ingestion timestamp as a simple proxy for the effective date of each record, allowing us to version each day’s data. For files with multiple change records, we scan all columns to identify and rank potential effective date columns. We then pass this information to an automated rule, ensuring it gets applied as we load the data. This process enables us to efficiently handle change data, track data flow, and manage multiple changes in an automated way.
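A sketch of the column-ranking idea, assuming rows arrive as Python dicts; the date formats listed are a plausible subset for illustration, not the real rule's list.

```python
from datetime import datetime

# A plausible subset of formats; the real rule would scan for many more.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%d/%m/%Y")

def _parses(value, fmt):
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def date_parse_rate(values):
    """Fraction of non-empty values that parse as a date in any known format."""
    hits, total = 0, 0
    for value in values:
        if not value:
            continue
        total += 1
        if any(_parses(value, fmt) for fmt in CANDIDATE_FORMATS):
            hits += 1
    return hits / total if total else 0.0

def rank_effective_date_columns(rows):
    """Score every column by how date-like its values are; best candidate first."""
    columns = rows[0].keys() if rows else []
    scores = {col: date_parse_rate([row[col] for row in rows]) for col in columns}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The top-ranked column (for example an `updated_at` column that parses on every row) is then the one handed to the automated rule as the effective date.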
New Google Cloud feature to Optimise BigQuery Costs
This blog explores AgileData’s use of Google Cloud, specifically its BigQuery service, for cost-effective data handling. As a bootstrapped startup, AgileData incorporates data storage and compute costs into its SaaS subscription, protecting customers from unexpected bills. We constantly seek ways to minimise costs, utilising new Google tools for cost-saving recommendations. We argue that the efficiency and value of Google Cloud make it a preferable choice over other cloud analytic database options.
Data Warehouse Technology Essentials: The Magical Components Every Data Magician Needs
The key components of a successful data warehouse technology capability include data sources, data integration, data storage, metadata, data marts, data query and reporting tools, data warehouse management, and data security.
Myth: using the cloud for your data warehouse is expensive
TL;DR: Cloud Data Platforms promise you the magic of storing your data and unlimited elastic compute for cents. Is it too good to be true? Yes AND no. You can run a cloud platform for a low, low cost, but it will take...
Observability, Tick
TL;DR: Data observability is not something new; it’s a set of features every data platform should have to get the data jobs done. Observability is crucial as you scale, and it is very on trend right now. It feels...
App Engine and Socket.IO
We wanted to be able to dynamically notify Data Magicians when a task had completed, without them having to refresh their browser screen constantly. Implementing websockets allowed us to achieve this.
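A minimal sketch of that pattern using the python-socketio library; the event name and payload here are illustrative, not the actual AgileData events.

```python
import socketio

sio = socketio.Server(cors_allowed_origins="*")
app = socketio.WSGIApp(sio)  # wrap for a WSGI host such as App Engine

@sio.event
def connect(sid, environ):
    # A Data Magician's browser has opened a websocket connection.
    print("browser connected:", sid)

def notify_task_complete(task_id: str):
    """Push the completion event to every connected browser; no refresh needed."""
    sio.emit("task_complete", {"task_id": task_id})
```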
ELT without persisted watermarks? Not a problem
We no longer need to manually track the state of a table: when it was created, when it was updated, which data pipeline last touched it. All of these data points are available via a simple call to the Cloud Logging and BigQuery APIs. Under the covers, Google Cloud is already tracking everything we need; every insert, update, delete, create, load, drop and alter is captured.
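For example, with the google-cloud-bigquery client library the created and last-modified timestamps come back with the table metadata itself; the table name below is a placeholder.

```python
from google.cloud import bigquery

client = bigquery.Client()

def table_watermarks(table_id: str) -> dict:
    """Read created/modified timestamps straight from BigQuery table metadata,
    instead of persisting our own watermark table."""
    table = client.get_table(table_id)  # e.g. "my-project.my_dataset.my_table"
    return {"created": table.created, "last_modified": table.modified}
```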
Three Agile Testing Methods – TDD, ATDD and BDD
In the world of agile, there are three common testing techniques that can be used to improve our testing practices and to assist with enabling automated testing.
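As a minimal illustration of the TDD rhythm with pytest: the test is written first and fails (red), then the smallest implementation makes it pass (green). The GST function is a made-up example, not from the post.

```python
def add_gst(amount, rate=0.15):
    """Hypothetical function under test: add GST to an amount."""
    return round(amount * (1 + rate), 2)

def test_add_gst():
    # Written before the implementation existed; drives the design.
    assert add_gst(100.0) == 115.0
```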
Using a manifest concept to run data pipelines
TL;DR: You don’t always need DAGs to orchestrate. Previously we talked about how we use an ephemeral serverless architecture based on Google Cloud Functions and Google Pub/Sub messaging to run our customer data...
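A rough sketch of the manifest idea, assuming Google Pub/Sub and a JSON manifest of ordered steps; the topic and field names are illustrative, not the AgileData schema.

```python
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()

def publish_next_step(manifest, completed_step, project, topic):
    """On completion of one step, look up the manifest and publish the next,
    so a chain of Cloud Functions runs the pipeline without a DAG scheduler."""
    steps = manifest["steps"]  # e.g. ["extract", "transform", "load"]
    idx = steps.index(completed_step)
    if idx + 1 == len(steps):
        return None            # pipeline finished
    next_step = steps[idx + 1]
    topic_path = publisher.topic_path(project, topic)
    future = publisher.publish(topic_path, data=json.dumps({"step": next_step}).encode())
    return future.result()     # message id once Pub/Sub accepts it
```

Each function subscribing to the topic only needs to know its own step; the manifest, not a central scheduler, carries the ordering.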
“Serverless” Data Processing
TL;DR: When we dreamed up AgileData and started white-boarding ideas around architecture, one of the patterns we were adamant we would leverage was Serverless. This post explains why we were adamant and what...
A Data Engineer, an Agile Coach and a Fish walk into a bar…
This is the first of a series of articles detailing how we built a platform to make data fun and remove complexity for our users
AgileData >>> Modern Data Stack
TL;DR: AgileData's mission is to reduce the complexity of managing data. A large part of modern data complexity is selecting, implementing and maintaining a raft of different technologies to provide your "Modern Data...
Agile DataOps
TL;DR: Agile DataOps is where we combine the processes and technologies from DataOps with a new agile way of working, to reduce the time taken and increase the value of the data we provide to our customers. What's in a...
Why we chose Google Cloud as the infrastructure platform for AgileData
Pick a few things that really matter, not thousands of “requirements”. When we first started developing the core of the AgileData backend for the MVP, we knew we would need a cloud database to store...
AgileData App
Explore AgileData features, updates, and tips
Consulting
Learn about consulting practices and good patterns for data-focused consultancies
DataOps
Learn from our DataOps expertise, covering essential concepts, patterns, and tools
Data and Analytics
Unlock the power of data and analytics with expert guidance
Google Cloud
Imparting knowledge on Google Cloud's capabilities and its role in data-driven workflows
Journey
Explore real-life stories of our challenges and lessons learned
Product Management
Enrich your product management skills with practical patterns
What Is
Describing data and analytics concepts, terms, and technologies to enable better understanding
Resources
Valuable resources to support your growth in the agile, and data and analytics domains
AgileData Podcast
Discussing how to combine agile, product and data patterns.
No Nonsense Agile Podcast
Discussing agile and product ways of working.
App Videos
Explore videos to better understand the AgileData App's features and capabilities.
Subscribe to our blogs
We will email you whenever we publish a new blog post. No spam, pinky promise.