Upgrading Python: A Plumbing Adventure in the Google Stack


TL;DR

In the ever-evolving world of AgileData DataOps, it was time to upgrade the Python version that powers the AgileData Platform.

We utilise micro-services patterns throughout the AgileData Platform, along with a bunch of Google Cloud services. The upgrade could have gone smoothly, or it could have caused no end of problems.

Nigel Vining - AgileData.io

Time for an upgrade

As the AgileData journey continues, we find ourselves facing an interesting challenge – an email from Google urging us to upgrade our Python version. With the current version’s support ending next year, it’s time to embrace the latest and greatest … yeah, upgrade time!

If you’ve been following our previous posts, you know the AgileData Platform is built on the Google stack, relying heavily on Google Cloud Spanner as our backend config database and Google App Engine / Python for our API layer, Web-Sockets Service and AgileData App.

Cloud Functions serve as the glue holding everything together, providing a pay-as-you-go, scalable and cost-effective solution. We love this Functions as a Service (FaaS) pattern; it lets us run the AgileData App and APIs for a fraction of the cost of dedicated servers or containers.
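To make that pattern concrete, here’s a minimal sketch of what one of these functions looks like – the classic first-generation Pub/Sub-triggered signature, illustrative rather than our production code:

import base64

# minimal sketch of a Pub/Sub-triggered Cloud Function (1st-gen signature)
# deployed with --trigger-topic, it only runs (and only bills) when a message arrives
def bigquery_job_completed(event, context):
    # Pub/Sub delivers the message payload base64-encoded
    payload = base64.b64decode(event["data"]).decode("utf-8")
    print(f"received event: {payload}")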

However, with the Python upgrade on the horizon, it’s time to ensure that our system remains robust and reliable.

Upgrading Python: The Plumbing Essentials

The AgileData Platform’s backbone was built on Python 3.7, and it was time to transition to the newer Python 3.11 version.

The process is not without its challenges, as we must ensure that all our client libraries and dependencies are compatible with the latest Python version.

One critical aspect we knew we needed to address was a change in the client library’s handling of JSON data, which could potentially break one of our core functions.

So we decided to bring the Spanner client library up to the latest version at the same time as the Python upgrade.

# before the python upgrade
google-cloud-spanner==3.11.1

# after the python upgrade
google-cloud-spanner==3.36.0
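Once deployed, a quick sanity check confirms the runtime and the client library are what we expect (a sketch – it assumes the package exposes __version__, which recent releases do):

import sys

from google.cloud import spanner

# expect 3.11.x after the upgrade
print(sys.version)

# expect 3.36.0
print(spanner.__version__)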

The Query Spanner Function: Unveiling the Core

One of the most crucial functions in our backend is query_spanner(). It allows us to query data from Spanner using a simple SQL query and returns a JSON result payload.

This function serves as a pivotal connector, providing access to our data and ensuring smooth operations. 

query = "select table_id,topics,state from catalog where project_id = @project_id"
results = query_spanner(query)

[
    {
        "table_id": "consume.cars",
        "topics": [
            "Cars",
            "Vehicles"
        ],
        "state": "active"
    }
]

# ========================== query_spanner ========================

import json
import logging

from google.cloud import spanner
from google.cloud.spanner_v1.data_types import JsonObject

# spanner_database and project_id are module-level objects, initialised at cold start

def query_spanner(query):

    """
    Simple function that takes a sql query and executes it,
    returning the results as a list of dicts
    """

    schema = []
    output = []
    json_columns = []

    # remove line breaks (we may have formatted for readability)
    query = query.replace("\n", " ")

    try:
        with spanner_database.snapshot() as snapshot:
            results = snapshot.execute_sql(
                query,
                params={"project_id": project_id},
                param_types={"project_id": spanner.param_types.STRING},
            )

            for row in results:

                # first read - pick up the schema for the results
                if not schema:
                    for field in results.fields:
                        schema.append(field.name)

                        # create a list of json columns we need to parse
                        if "JSON" in str(field.type_):
                            json_columns.append(field.name)

                # combine the two lists (schema and result row)
                output.append(dict(zip(schema, row)))

        # load json payloads
        if json_columns:
            for rec in output:
                for json_column in json_columns:
                    json_payload = rec.get(json_column)

                    if isinstance(json_payload, JsonObject):
                        # Updated 15.07.2023
                        # Spanner library now returns json columns as a JsonObject
                        json_string = json_payload.serialize()
                        rec[json_column] = json.loads(json_string)
                    else:
                        rec[json_column] = []

    except Exception as e:
        logging.error("query_spanner failed: %s (%s)", query, e)

    return output

Adapting to the Change

Fortunately, the necessary code changes are minimal. The only modification required is to check if we are reading a JSON object from Spanner and then serialise it before passing it through json.loads() for our response payload. A smooth adjustment that keeps our plumbing intact!
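If you wanted to isolate that check in a small helper, it might look something like this (a sketch, assuming a JsonObject or a NULL are the only shapes we need to handle):

import json

from google.cloud.spanner_v1.data_types import JsonObject

def parse_json_cell(value):
    # hypothetical helper: normalise a Spanner JSON column to plain Python types
    if isinstance(value, JsonObject):
        # newer client versions wrap JSON columns in a JsonObject;
        # serialize() hands back the underlying JSON string
        return json.loads(value.serialize())
    # NULL (or anything unexpected) falls back to an empty list
    return []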

Deployment and Beyond

With the code tested and validated, it’s time to deploy the upgrades.

For App Engine, a simple update to the app.yaml file, specifying the latest Python version, does the trick. 

runtime: python311

Likewise, for Cloud Functions, we update the deployment template, ensuring the right Python version is set for each service. 

gcloud functions deploy bigquery_job_completed --runtime=python311 --source=cloud_functions --trigger-topic=bigquery_job_completed --memory=512MB --timeout=540s --entry-point=bigquery_job_completed

Our system is now upgraded and ready to take on the future with confidence!

Finally, the critical service we NEVER need to upgrade

And there’s one service that stands tall, immune to the need for upgrades: the databases in our customers’ private tenancies!

We rely on the mighty Google BigQuery, a serverless and cost-effective enterprise cloud analytics database that effortlessly scales with our customers’ data.

Put the data in, query the data out – it’s as simple as that.
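A minimal sketch of the "query the data out" side with the BigQuery client library (the table name here is made up):

from google.cloud import bigquery

client = bigquery.Client()

# run a query and stream back the rows - no servers, no tuning
rows = client.query(
    "select table_id, state from consume.catalog where state = 'active'"
).result()

for row in rows:
    print(row.table_id, row.state)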

No upgrading, no migrations, no worrying about disk space or temp space or log space or memory or release versions or any of the hundred and one things my DBA friends spend all their time worrying about. BigQuery proves to be the indispensable backbone of our plumbing infrastructure.

Why wouldn’t you use a service like that?

Flowing Ahead with Upgraded Plumbing

By adapting our code, upgrading the necessary libraries, and deploying with precision, we ensure that our customers’ data continues to flow seamlessly.

Leveraging the power of Google’s infrastructure, we maintain our momentum, confident that our data plumbing will stand the test of time.

And as we look to the future, we rest easy knowing that BigQuery will always have our back, ensuring a smooth and cost-effective journey through the data waters.

Onward and upward!


Keep making data simply magical