Why we chose Google Cloud as the infrastructure platform for AgileData

15 Apr 2020 | AgileData DataOps, AgileData Journey, Blog, Google Cloud

 

Pick a few things that really matter, not thousands of “requirements”

 

When we first started developing the core of the AgileData backend for the MVP, we knew we would need a cloud database to store customers’ data.

We had a lot of experience writing SQL, and we wanted the plethora of visualisation and other tools out there to integrate easily and seamlessly with the data in AgileData.

We wanted something that required minimal administration on our side and that we could scale on demand as we added more customers and more data.

We knew we would be leveraging our config/rules-based patterns and were keen for the code to execute within the database, rather than on an “application” tier or in containers.

We wanted to be able to manage costs: minimal costs up front as we developed the MVP, and then costs that scaled as we scaled our customers and revenue.

Beware the implications of “new” or “cloud washed” technologies

Nigel and I had been involved in a couple of data warehouse projects that used technologies other than traditional relational databases, such as Hadoop and NoSQL databases, and had experienced the pain of using these technologies for managing data.

We had also worked on projects where we used the cloud database technologies from AWS and Microsoft Azure as the repository for data warehouse style data. I found them underwhelming, probably because they were based on technologies that had been “cloud washed” and so were not cloud native. This resulted in a whole raft of workarounds to our standard patterns and more ongoing maintenance than we were aiming for.

Test that your assumptions are valid early, and leverage the wisdom of the crowd

I had originally planned to use Snowflake as the underlying database for AgileData, as it seemed to meet all our initial criteria and to solve some of the problems we knew would arise with the other options.

As we were starting our research spike for using Snowflake (we refer to them as McSpikeys), I was also starting to coach a new data and analytics team on adopting an agile way of working. The team were also doing research on platform options, and had shortlisted AWS and Google Cloud Platform. I would have put money on them picking AWS and Redshift, but as their evaluation continued they discovered the power of BigQuery and ended up going down the GCP/BigQuery path.

I see great value in leveraging proven patterns from others as a way of accelerating delivery, so we decided to have a look at BigQuery as a possible option for our primary cloud database, rather than Snowflake.

Cloud platforms are adding new capabilities and features faster than you can adopt them

I would like to say we did an in-depth, feature-by-feature comparison, but that wouldn’t be true. What we did do was look at how the core differences between BigQuery and Snowflake would influence the way we built AgileData, and how we would mitigate any big concerns.

We ended up picking BigQuery and the entire Google Cloud Platform as our core infrastructure platform.

A few things stand out as memorable in our journey using GCP to date:

  • We have access to a myriad of technology components within GCP, rather than just a cloud database (and we are using quite a few of them);
  • GCP components may not do everything we want, but they just work, and they integrate together in ways that surprised us and saved us a massive number of development hours;
  • We were concerned about the cost of BigQuery and the chance of massive bill shock. We have used GCP and BigQuery in a way that ensures this won’t happen (and we believe it will cost us a lot less than Snowflake as we scale);
  • While BigQuery behaves in a similar way to a traditional SQL database, you need to adapt your patterns slightly to get the full value out of it;
  • Google Data Studio is a surprisingly good product for creating dashboards, especially given it has no license costs and it’s pre-integrated with BigQuery;
  • Google Cloud’s marketing is its Achilles heel; everybody has heard of Snowflake, and it is the darling of the market, which its partners get to leverage as they go to market;
  • AWS and Microsoft have a much higher level of engagement with small startups than Google (well, in New Zealand and in our experience, anyway);
  • BigQuery is a beast in terms of scalability and performance, and we haven’t even started to scratch the surface of tuning how AgileData uses it to improve performance.

There are lots of other little nuggets we have found so far as we build AgileData out using GCP and BigQuery. There are also a few trade-off decisions we have had to make along the way.

But in summary:

If I had to make the decision again, based on my experience to date, I would pick the Google Cloud Platform / BigQuery to power AgileData.

Can’t really say better than that.