5 core Data Collection Patterns

19 Dec 2024 | AgileData Product, Blog

TL;DR

At AgileData, delivering our Fractional Data Service has revealed the diverse challenges of integrating data from varied organisations, industries, and systems. To scale effectively, we’ve adopted five core data collection patterns based on our “Define it Once, Reuse it Often” (DORO) principle:
1. Push
2. Pull
3. Stream
4. Share
5. File Drop
These patterns are supported by a toolkit of tested technologies like Dataddo, Meltano, and Google services, allowing us to solve new data challenges quickly. Our approach ensures flexibility and scalability, always starting with the question: Push, Pull, Stream, Share, or File Drop?
    Shane Gibson - AgileData.io

    As we started delivering our Fractional Data Service to customers, we experienced the variability of data integration challenges you would expect when dealing with different organisations, in different industries and countries, all using different systems of capture and different technologies.

    But, as you would expect, we have identified and adopted a core set of data collection patterns that we reuse to help us scale the machines, not the humans, following the Define it Once, Reuse it Often (DORO) principle that is key to our AgileData Way of Working.

    Over the last five years we have settled on 5 core data collection patterns.

    1. Push
      data is pushed to a secure Google Cloud Storage “landing zone” from the System of Capture.
    2. Pull
      data is pulled from the System of Capture by AgileData using a Google Cloud service or a third party SaaS Data Collection service.
    3. Stream
      data is streamed to a Google Cloud Pub/Sub topic or directly into the underlying Google BigQuery instance.
    4. Share
      data is shared between partner organisations, ensuring controlled access and collaboration across parties.
    5. File Drop
      data is manually uploaded via the AgileData App, or manually dropped into a secure Google Cloud Storage bucket.

    Of course there are always patterns within patterns. For example, we may use a delta detection pattern on the AgileData side to detect changes, we may rely on the system of capture to push change data records to us, or we may be relying on only new events being streamed to us.
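    A delta detection pattern like the one mentioned above can be sketched in a few lines: hash each incoming row, compare against the hashes from the previous collection run, and classify keys as inserted, updated or deleted. This is a minimal illustration, not our actual implementation; all function names are hypothetical.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Stable hash of a row's contents, used to spot changed records."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_deltas(previous: dict, rows: list, key: str) -> dict:
    """Compare incoming rows against the hash state from the last run.

    previous maps primary key -> row hash from the prior collection run.
    Returns the new hash state plus the inserted/updated/deleted keys.
    """
    current = {str(row[key]): row_hash(row) for row in rows}
    inserted = [k for k in current if k not in previous]
    updated = [k for k in current if k in previous and current[k] != previous[k]]
    deleted = [k for k in previous if k not in current]
    return {"state": current, "inserted": inserted, "updated": updated, "deleted": deleted}
```

    The returned `state` is persisted between runs, so the next collection only has to ship the rows that actually changed.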

    But when we first look at a new Data Collection problem, we always start off by deciding which of these 5 core Data Collection patterns we are going to use, before we delve into the finer implementation details.

    We also have a toolkit of technology options that we have defined and tested that help us quickly leverage one or many of these patterns.

    For example:

    Customers can manually upload CSV or JSON files to a secure Google Cloud Storage bucket, or use the file upload screen in our AgileData App to do the same data task.
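    A File Drop into a Cloud Storage bucket can be sketched with the `google-cloud-storage` client library. This is an illustrative sketch only: the bucket name, path layout and function names are assumptions, not our production code.

```python
from datetime import datetime, timezone
from pathlib import Path


def landing_blob_path(local_file: str, source_system: str) -> str:
    """Build a dated landing-zone path so repeated drops never collide."""
    stamp = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H%M%S")
    return f"landing/{source_system}/{stamp}/{Path(local_file).name}"


def drop_file(local_file: str, bucket_name: str, source_system: str) -> str:
    """Upload a CSV/JSON file to a secure GCS landing bucket (File Drop pattern)."""
    # Lazy import so the path logic above is usable without GCP credentials.
    from google.cloud import storage  # pip install google-cloud-storage

    blob_path = landing_blob_path(local_file, source_system)
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_path).upload_from_filename(local_file)
    return f"gs://{bucket_name}/{blob_path}"
```

    Stamping the landing path with the collection time keeps every drop immutable, which makes reprocessing and auditing much simpler later on.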

    We can automagically collect data from Amazon S3, Azure Blob Storage or any other form of file-based “data lake” using the Google Cloud Storage Transfer Service.

    We can stream Google Analytics or Google Ads data directly to Google BigQuery using the native Google data collectors.

    We can stream data to Google Cloud Pub/Sub.
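    Streaming an event into Pub/Sub follows the same shape regardless of the source system: wrap the payload in an envelope with a capture timestamp, then publish it to a topic. A minimal sketch, assuming the `google-cloud-pubsub` client library; the envelope format and function names are illustrative assumptions.

```python
import json
from datetime import datetime, timezone


def make_event(payload: dict) -> bytes:
    """Serialise an event with a capture timestamp for the Stream pattern."""
    envelope = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def publish_event(project_id: str, topic_id: str, payload: dict) -> str:
    """Publish one event to a Google Cloud Pub/Sub topic."""
    # Lazy import so make_event stays usable without GCP credentials.
    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=make_event(payload))
    return future.result()  # message ID once the publish is acknowledged
```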

    We can Pull data from hundreds of SaaS data sources using Dataddo, or we can create a custom data collector using Meltano if Dataddo doesn’t already have one.
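    In Meltano terms, a custom data collector boils down to pairing a Singer tap for the System of Capture with a BigQuery loader in the project's `meltano.yml`. The fragment below is illustrative only: the project id is made up, and the specific plugin names and variants are assumptions, not a recommendation.

```yaml
# meltano.yml (illustrative fragment; plugin variants are assumptions)
version: 1
project_id: example-collector          # hypothetical project id
plugins:
  extractors:
    - name: tap-postgres               # Singer tap for the System of Capture
      variant: meltanolabs
  loaders:
    - name: target-bigquery            # lands the collected data in BigQuery
```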

    This is just a subset of the technology patterns we now use, and no doubt we will add to them the next time we onboard a new Customer with a new technology problem we need to solve.

    The key IMHO is to abstract the Data Collection patterns from the technical implementation patterns, to create a shared language.

    Our shared language always starts with:

    Do we need to Push, Pull, Stream, Share or File Drop the data to get it into AgileData?
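    That shared language can even be made literal in code. A minimal DORO-style sketch (all names hypothetical): the five core patterns become an enum, each pattern gets exactly one registered collector implementation, and every new data source starts by answering the same question.

```python
from enum import Enum
from typing import Callable


class CollectionPattern(Enum):
    """The 5 core Data Collection patterns."""
    PUSH = "push"
    PULL = "pull"
    STREAM = "stream"
    SHARE = "share"
    FILE_DROP = "file_drop"


# Define it Once, Reuse it Often: one registry of collector implementations.
COLLECTORS: dict = {}


def collector(pattern: CollectionPattern) -> Callable:
    """Register a collector implementation for one of the five core patterns."""
    def register(fn: Callable) -> Callable:
        COLLECTORS[pattern] = fn
        return fn
    return register


@collector(CollectionPattern.FILE_DROP)
def file_drop(config: dict) -> str:
    return f"dropped {config['file']} into {config['bucket']}"


def collect(pattern: CollectionPattern, config: dict) -> str:
    """Answer the first question (which pattern?) then dispatch to the collector."""
    return COLLECTORS[pattern](config)
```

    Onboarding a new source then means picking the pattern first and only afterwards wiring up the technology-specific details inside the registered collector.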

    Keep making data simply magical

    AgileData is focussed on removing the complexity of managing data in a simply magical way.

    We have multiple ways to collect your data from your systems of capture, because every organisation is slightly different.

