#AgileDataDiscover weekly wrap No.4

09 Jul 2024 | AgileData Discover, Blog

TL;DR

We are working on something new at AgileData; follow us as we build it in public. This week we review the feedback we’ve received so far and the use cases that have started to emerge.

Hope you’re enjoying following us as we build and document in public. 

Shane Gibson - AgileData.io

Let’s get into it!

#AgileDataDiscover weekly wrap No.4

Let’s review the feedback so far and the use cases that have started to emerge

After talking through the product idea with a few people and showing them the prototype, the feedback has been positive. The use cases that have come out of the conversations look something like:

Understanding a legacy data platform:

  • An internal data team or a data consultant(cy) wants to understand the current data platform, but the human effort to do so is large and time consuming.
  • This might be to support a move to a greenfields data platform, or to make a change to the current data platform.

More detail here: https://agiledata.substack.com/p/we-are-working-on-something-new-at

Data Governance

This is effectively the same pattern of documenting the data platform, but with slightly different use cases.

  • A lot of Data Governance teams have been reduced in size and the technical BA team members have been laid off, meaning the Data Governance Managers no longer have people available to manually document the data.
  • Or the Data Governance Managers have relied on the technical BA skills in the data teams; those teams have been downsized and can no longer do this work for them.

More context here: https://agiledata.substack.com/p/we-are-working-on-something-new-at

Automated Data Migration

As with all things AI, there are a myriad of startups experimenting in this space. A use case that came up a couple of times was automating Data Migrations from one data platform / technology to another data platform / technology.

Overall there seem to be a bunch of use cases that people have said will have value to them, so let’s carry on and invest some more time and effort into it. We’ll also look at the new patterns we need to move from Prototype to MVP.

 

Let’s look at the new patterns we need to move from Prototype to MVP

 

We have been able to use a lot of the patterns we have built into the AgileData Platform and App as part of the Discover prototype. To move it from Prototype to MVP and add the initial list of features we need, a number of new patterns need to be implemented under the covers, including:

  1. Logging of prompt tokens so we can keep track of how many are being used each time we send a prompt off to the LLM
  2. Logging of responses from the LLM so we don’t need to make unnecessary requests; we can just show the latest response if it is still applicable (both logging patterns are sketched after this list)
  3. Uploading multiple input files and including them in the prompt sent to the LLM. This turns out to have the most complexity, as the documentation is very light and is currently skewed towards processing multiple image and video files
  4. Lightly modelling the most extensible pattern to hold configuration data, i.e. bite-sized prompts that can be combined to produce different outcomes
  5. Parsing and storing the raw markdown responses into separate buckets so they can be surfaced separately in the app.
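
To make the first two patterns concrete, here is a minimal sketch of what prompt-token logging and response re-use could look like. The names and the in-memory cache are ours for illustration only, not the actual AgileData implementation, and the send_fn parameter stands in for whatever wrapper actually calls the LLM.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("discover.llm")

# Hypothetical in-memory cache; a real implementation would persist
# responses (for example to a table or a bucket) so they survive restarts.
_response_cache: dict[str, dict] = {}


def cached_llm_call(prompt: str, model: str, send_fn) -> dict:
    """Send a prompt via send_fn, logging token usage and re-using a
    previously stored response when the same prompt is sent again."""
    cache_key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    # Pattern 2: if we already hold a response for this exact prompt,
    # show the latest response instead of making another LLM request.
    if cache_key in _response_cache:
        log.info("cache hit for %s, skipping LLM request", cache_key[:12])
        return _response_cache[cache_key]

    response = send_fn(prompt=prompt, model=model)

    # Pattern 1: log how many prompt and completion tokens were used
    # each time we send a prompt off to the LLM.
    log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_tokens": response.get("prompt_tokens"),
        "completion_tokens": response.get("completion_tokens"),
    }))

    _response_cache[cache_key] = response
    return response
```

In practice the cache and the token log would live somewhere durable rather than in memory, so the usage numbers and the latest responses are still there after a restart.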

Most of this activity happens in our API layer as that’s where the AgileData magic happens. We add new endpoints when required or augment existing endpoints to deliver additional functionality for the web application. In this case we re-use our existing file upload pattern but send discovery files to a separate bucket where we can access them from the new interfaces. We re-use our existing LLM wrapper, but extend it slightly to provide the option to choose which model is called. We use different models depending on their strengths and weaknesses: some are better at ‘texty’ stuff, while others are better at ‘codey’ stuff.
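
As a rough sketch of what “choose which model is called” might look like, assuming the wrapper simply accepts a model name alongside the prompt (the model names and task labels below are illustrative, not the ones we actually use):

```python
from typing import Callable

# Hypothetical model routing: the model names and task labels are
# illustrative only, not the models AgileData actually calls.
TASK_MODELS = {
    "text": "general-purpose-text-model",  # stronger at 'texty' work
    "code": "code-tuned-model",            # stronger at 'codey' work
}


def choose_model(task_type: str) -> str:
    """Pick a model for the task, falling back to the text model."""
    return TASK_MODELS.get(task_type, TASK_MODELS["text"])


def run_discovery_prompt(prompt: str, task_type: str,
                         send_fn: Callable[..., dict]) -> dict:
    """Route a discovery prompt to the model best suited to the task.

    send_fn stands in for the existing LLM wrapper, which is assumed to
    take a model name alongside the prompt.
    """
    return send_fn(prompt=prompt, model=choose_model(task_type))
```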

So there are a few bits of behind-the-scenes plumbing we need to do to make the product scalable. It’s tempting to bypass some of these for the MVP, but after spending some time using the prototype we already know that without them we would end up in a world of manual-effort hurt.

Scale the machine, don’t scale the team!

Next we will talk about a little surprise we encountered with the prototype.

Oops, where did that pesky data come from?  

 

We are hyper paranoid about our customers’ data, as any data platform provider should be. We run a private data plane for our customer tenancies as part of our agile-tecture design. So we of course adopted this pattern for the inputs we used in the Discover prototype to keep them secure.

Even though the input files are secured in a private tenancy, we made sure that we only used metadata as inputs into the Prototype, not actual data records. We believe we can get the outputs that are needed without having to pass actual data to the LLM, and we know from experience the barrier that requiring data creates for a product’s adoption.
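
To be concrete about the distinction, this is the shape of input we mean by metadata. The dataset, table, columns and counts below are entirely made up for illustration, not a real customer schema:

```python
# Illustrative only: the shape of what we mean by "metadata" inputs.
table_metadata = {
    "dataset": "consume",
    "table": "customer_sales",
    "columns": [
        {"name": "customer_id", "type": "STRING"},
        {"name": "sale_date", "type": "DATE"},
        {"name": "amount", "type": "NUMERIC"},
    ],
    "row_count": 1204567,
}

# What we do NOT pass to the LLM: actual data records such as
# {"customer_id": "C-1001", "sale_date": "2024-06-30", "amount": 42.50}
```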

One of the inputs we tested was the query logs from the BI tools querying the data.

Imagine our surprise when we started seeing actual data values in the LLM’s outputs!

Lots of thoughts surged through the brain, until we realised where the LLM had got them from.

The Consume view the reports were using had a case statement. The case statement was being used to map some reference data. It was an old pattern we used before we built the ability to dynamically collect and manage reference data via Google Sheets. So of course this reference data mapping was in clear text in the query logs, and that is where the LLM was getting it from.
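
Here is a contrived example of the kind of logged SQL that caused the surprise; the view, columns and reference values are invented for illustration, not taken from the actual Consume view:

```python
# Contrived example of a logged query containing a reference data mapping.
logged_query = """
SELECT
  customer_id,
  CASE region_code
    WHEN 'AKL' THEN 'Auckland'
    WHEN 'WLG' THEN 'Wellington'
    WHEN 'CHC' THEN 'Christchurch'
  END AS region_name
FROM consume.customer_sales
"""

# Even though no data rows are passed to the LLM, the reference data
# mapping sits in clear text inside the logged query, so any prompt
# that includes the query log also includes those values.
literal_values = [line.split("THEN")[1].strip()
                  for line in logged_query.splitlines() if "THEN" in line]
print(literal_values)  # ["'Auckland'", "'Wellington'", "'Christchurch'"]
```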

For those of you following along at home – more detail on today’s experiment is here

Current AgileData Go to Market

 

I was talking to somebody the other day and they made a comment about startup founders who build the best tool in the world that nobody ever knows about. When you are building a product you have to care as much about how you will sell it as you do about building it.

I much prefer building over selling.

For AgileData we did a bunch of experiments with our Go To Market (GTM) last year to see where we would place our bet for this year.

The experiments were based on the usual GTM patterns (more info on our experiments is here):

  • Sales Led Growth (SLG)
  • Product Led Growth (PLG)
  • Partner/Channel Led Growth (CLG)
  • Ecosystem Led Growth (ELG)

Each of these GTM patterns requires some sort of investment. And all of these GTM patterns need a way to fill the top of the sales funnel, which is another set of patterns. Most companies use a combination of these GTM patterns and investments, which becomes complex. We are always focussed on simplicity over complexity.

So we decided that for 2024 we would place a bet on Partner / Channel Led Growth.

The logic and reasons behind placing this bet will require a much longer post, but one of the key reasons comes back to the vision Nigel and I have for AgileData:

“working with people they respect, who can work from anywhere in the world (we think of it as working from any beach), who get paid well for their expertise and experience, to support the lifestyle they want.”

Now that we have the context for the current AgileData GTM bet, let’s think about the GTM options for this.

We hope you’ve been enjoying following along in public as we build. 

    Keep making data simply magical

    Follow along on our discovery journey

    If you want to follow along as we build in public, check this out on a regular basis; we will add each day’s update to it so you can watch as we build and learn.