The Magic of Serverless
Join Shane and Nigel as they discuss what serverless actually means, and the value of leveraging serverless components within your data platform.
Read along you will
PODCAST INTRO: Welcome to the AgileData podcast, where Shane and Nigel discuss the techniques they use to bring an agile way of working to the data world in a simply magical way.
Shane Gibson: Welcome to the AgileData Podcast. I’m Shane Gibson.
Nigel: And I’m Nigel.
Shane Gibson: And today Nigel and I thought we’d have a bit of a chat, another engineering-focused chat. But this time we want to talk about this thing called serverless, which is something that we use a lot on the cloud platform that we leverage. For me, it’s a really funny word. In fact, a number of years ago, I was very disparaging of the term serverless, because there actually are servers involved, right Nigel?
Nigel: Yes, it is a misnomer. There’s a server under the covers. It’s just a question of what serverless actually means to you.
Shane Gibson: Yes. So why don’t we start off with trying to define how we would determine if something was serverless or not?
Nigel: Sure. So my definition of how we use it is: you don’t own the server, you don’t have to manage the server, you basically use that server on a second-by-second or minute-by-minute basis, and you pay for that fraction of the time. Unless something’s actually executing, 99% of the time you’re not doing any work on the server. So we call it serverless.
Shane Gibson: Yes, I like that definition. The definition of: when I ask it to do something, I get charged, and when I don’t ask it to do something, I don’t get charged. For me, that is one of the tenets of a serverless piece of technology, and one of the benefits it gives me.
Nigel: I agree. I like it for the simplicity, that you don’t have to worry about all the overhead that goes with a server in the traditional sense of the word, which is data centers, networking, patching, upgrading. Basically, all the downsides of running your own server, you don’t have any of that with serverless. You just ask it to do something, it does it, it always works, and then it stops.
Shane Gibson: So if we compare something that’s serverless versus something that’s based on a container: for a container, I don’t own the server, I don’t own the network, I don’t own the disk, effectively I don’t own the infrastructure. But in my definition it’s not serverless, because I have to turn it on and I have to pay while it’s running. Whether I’m actually using it or not, it’s sitting there charging me.
Nigel: Yes, some of the products on most cloud platforms are badged as serverless, but you are effectively paying an hourly rate, because that server is always available to run your container. It never goes to zero either. Under the covers, there’s always one of it running.
Shane Gibson: So it’s a virtual server then, right?
Nigel: Yes, that’s a good definition.
Shane Gibson: So why don’t we run through all the moving parts we use, and then try and form an idea of whether we think they’re serverless, and what the value of them is. Let’s start with the big one, BigQuery, where we store a lot of really cool data. So what do you reckon: serverless, virtualized server, or physical server?
Nigel: BigQuery is an interesting one, because there’s two things going on there. There’s the compute resource, which I’m going to say is effectively serverless, because if it’s not doing any work, you’re not being charged. But behind that you’ve got your stored data, which you are being charged for while it’s being stored, even if it’s not being used. So BigQuery is a bit of a hybrid.
Shane Gibson: And that’s if we use the charging mechanism as the test. If I use the metric of how I am billed to decide whether it’s serverless or a virtual server, BigQuery really sits in the middle.
Nigel: Yes, it’s got a split cost. You’re paying to store over the long term. Obviously, as we talked about a couple of weeks ago, if you’re not accessing your data all the time, it goes off to lower tiers of storage and it’s cheaper, but if you’re not running queries, BigQuery is not charging you anything.
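That split cost can be sketched with some simple arithmetic. The prices and volumes below are illustrative placeholders, not Google’s published rates: storage accrues whether or not you query, while compute is zero in any month you run nothing.

```python
# Sketch of BigQuery's split billing model (all rates hypothetical):
# storage is charged for data at rest, compute only for bytes scanned.
storage_price_per_gb_month = 0.02   # illustrative storage rate
query_price_per_tb_scanned = 5.00   # illustrative on-demand query rate

stored_gb = 500                     # data sitting in tables
tb_scanned_this_month = 0.2         # queries actually run

storage_cost = stored_gb * storage_price_per_gb_month        # accrues regardless
compute_cost = tb_scanned_this_month * query_price_per_tb_scanned  # zero if no queries

print(f"storage: ${storage_cost:.2f}, compute: ${compute_cost:.2f}")
```

Set `tb_scanned_this_month` to zero and the compute side disappears entirely, which is the "serverless half" of the hybrid.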
Shane Gibson: So one of the other ways to look at it is: with a virtual server, I’ve got to install and maintain the software, right? I may use some form of template or image or some magical provisioning thing that grabs it off the marketplace, but really, what’s held within that virtual container is mine. I control when it’s upgraded and when it’s not, I can control how it’s configured at any level, and I have access to the operating system nine times out of ten. Whereas if we look at BigQuery, we have none of that. We don’t control when new features are added, or when it’s patched. We don’t have any access to the underlying operating system or the infrastructure it runs on. We don’t go to a marketplace; we just tick a box, or use code in our case, to say we want access to one. So is that the second way we can look at it: one, how we’re charged, and two, what we control?
Nigel: Yes, that’s a good one. It naturally flows on; it’s got me thinking about Cloud Functions, which we make use of, and that’s exactly the case. We pay per second of compute time, but we don’t control the architecture; we just choose up front what type of environment we want to run our code on. For example, say I want to run a Python Cloud Function. All I provide is literally the code I need to run. Python is pre-installed, it’s patched, it’s good to go. We don’t control that; we’ve just asked for a Python environment. And we throw stuff at it, and we get charged per second to run it.
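To make that concrete, here is a minimal sketch of the kind of Python you hand to a Cloud Function. The function name and payload shape are illustrative, not taken from AgileData’s actual code; the point is that you supply only the handler, and the runtime, patching and scaling are managed for you.

```python
import json

# Illustrative Cloud Function-style handler: you deploy just this code,
# and you're billed only for the seconds it spends executing.
def handle_event(request_json):
    """Process one request payload and return a JSON response."""
    rows = request_json.get("rows", [])
    # ... the real transform/load work would go here ...
    return json.dumps({"status": "ok", "rows_processed": len(rows)})

# Locally you can call it like any function; in the cloud, an HTTP
# trigger or event delivers the payload instead.
print(handle_event({"rows": [1, 2, 3]}))
```

Throw two payloads at the platform and it starts two instances; throw a hundred and the behavior is the same, which is the scaling property Nigel describes next.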
Shane Gibson: And if we look under the covers of Cloud Functions, because we’ve spent quite a bit of time looking under the covers as much as we can, Cloud Functions really operates differently to BigQuery. The perception I get is that BigQuery has a whole lot of infrastructure just sitting there that everybody kind of hits, and when we want to run something, it runs it for us and allocates us some resources off that clustery thing. But when we look at Cloud Functions, we can see that they’re stopping and starting virtual servers on a regular basis, every time we want to run something. So architecturally, one’s more of a clustered shared environment, and the other one’s more of an on-demand virtual server environment. But from our point of view, we pay for what we use, and we don’t control any of the underlying infrastructure or operating system for either of those services.
Nigel: Yes. Which is actually nice, because effectively there’s nothing to worry about. We throw code at it, it runs. If we throw two bits of code at it, it starts two instances and runs them; if we throw 100, it’s still the same behavior. It’s consistent: you can run one or 100 Cloud Functions simultaneously. You don’t have to worry about it, it just works.
Shane Gibson: So let’s look at the other end of the extreme for us. We do documentation as code, so we write small bits of code that are effectively our documentation. Whenever I check it into our repo, some magic happens, and our documentation site gets rebuilt with the latest version of all that documentation and code. But that’s not a pure serverless architecture we’re leveraging there, is it?
Nigel: It’s sort of midway. I guess serverless architecture is a continuum, from the small ones that we pay per second for and have no control over, through to the containers and VMs you talked about before. So for our documentation, we have an image which contains, call it our documentation engine, which is Sphinx, the open source library. We have an image of Sphinx that’s customized to how we want to build our documentation. Effectively, when we do a build, it mounts that container, runs it for a period of time while it builds our documentation, then it shuts it down. So it is serverless in some sense of the word, but it’s different in that we’re effectively mounting our own custom image. We get to choose what’s installed on that image and what’s running, but we’re still just paying for the amount of time that the image is mounted and running, and then it’s basically shut down again. It’s a step up a tier from a Cloud Function, but it’s not a full, constantly running server. We mount it, we use it, we drop the image, and it’s done.
Shane Gibson: But my understanding is that that serverless behavior is something you’ve engineered.
Nigel: We needed a level of customization that didn’t come with a Cloud Function, because we needed to customize the OS effectively, and install some additional products that weren’t available in a Cloud Function. So we built our own virtual image that we run for a period of time and then shut down, and it happens automatically for us. It’s just another level of serverless-ish, serverless-type behavior.
Shane Gibson: So it’s not a service or a serverless feature we get from Google; what you’re doing is leveraging some of the Google services to provide a capability that behaves in a serverless pattern. You copied the pattern, but you had to build it. And if I go back to my two checks: do we pay when it’s not running? No, because you’ve defined the pattern that says, when it’s not running, turn it off. Do we control the operating system in that container itself, and are we responsible for patching and maintaining that container? The answer is yes. So we get one of the benefits of the serverless pattern, and for the other one, we wear the cost and effort of maintaining it.
Nigel: That’s exactly right. If we want to update the documentation builder, to add more features or patch to a new version, we basically have to start up that image, update it, and then re-persist it so we can use it next time.
Shane Gibson: Cool. So what else have we got in the technology stack that we can have a look at? What about the cloud repo, where we store all our code?
Nigel: I was going to say that next. On top of the cloud repo, we make use of the Cloud Build functionality, which I guess is a form of serverless product. What we do is use commit triggers on our repos, so when new code is committed into specific repositories, what we basically say is: if the master branch of, say, documentation is updated, then I want you to use Cloud Build, which is a serverless feature, to run a whole lot of steps for us, deploy some code, check it, and then shut down again. Cloud Build is a little bit special, because we can effectively perform a whole set of instructions. It might say: get the latest code that’s been committed, do this to it, do that to it, copy it here, copy it there, and then send me a message saying you’ve done that. So again, Cloud Build is charging us per second that it’s running to perform our instructions. We typically use it to pull the code out of the repo, do some functions on it, possibly copy the output into a Cloud Storage bucket, and then shut itself down again.
Shane Gibson: So do you tell Cloud Build to start and stop? Or does it just wait for you to call it, and then when it’s done, it shuts itself down?
Nigel: So it’s triggered by something happening, in this case, a commit to a repository, a user can invoke it directly off the command line, I can say, run this cloud build script, while we’re developing, it works in exactly the same way. It’s think of it as a recipe. It’s got a list of instructions, doing it when it started, cloud build, reads through the list, does all the instructions and then shuts down.
Shane Gibson: So from a serverless costing point of view, Google Cloud takes care of the serverless behavior for us.
Nigel: That is exactly right. It’s running a server in the background, but it’s not ours. We don’t maintain it. It just runs a set of instructions.
Shane Gibson: And that’s the second question: we don’t maintain that operating system, we don’t patch it. So it’s a true serverless offering from Google, rather than one where you’ve applied the serverless pattern yourself.
Shane Gibson: Cool. And the cloud repo, same thing. We just check code in, and when it receives that request, it does something with it. Whether we’re working or doing bad things from our side, as a serverless component, it doesn’t care.
Shane Gibson: So what about the code that we use to go and collect data from some of the data factories? Let’s say, when we go and hit Shopify and grab the data out of that, what are the patterns that we’re using for that one?
Nigel: We actually use Cloud Build and a container for that one. We use a container because the container has the particular software installed on it that we use to talk to those APIs. We’ve installed that library on the container, and when required, we effectively run that container, and then we shut it down again.
Shane Gibson: So we’ve applied our serverless pattern to something that’s more of a container model, because we still control the operating system, we control that container image, we control those things?
Nigel: That’s correct, because we’ve installed a number of pieces of open source software in that container that have the libraries for talking to the APIs in question. So we maintain that; we just use the serverless pattern to run that container for us, get charged per second while it’s running, and then shut it down again.
Shane Gibson: So why wouldn’t we just use the serverless offerings out of Google Cloud, instead of having to apply our own pattern for this one?
Nigel: We needed somewhere to install and run the software we wanted to use. So one option is something that’s literally a real server, running 24/7, and we’re paying for it the whole time. But we only want to execute those libraries maybe once a day.
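That trade-off is easy to see with some back-of-the-envelope arithmetic. The rates here are made up purely for illustration, not real cloud pricing: an always-on server bills around the clock, while the run-on-demand pattern bills only for one short extract per day.

```python
# Hypothetical rates, for illustration only.
vm_rate_per_hour = 0.05               # always-on virtual server
serverless_rate_per_second = 0.00002  # pay-per-second container run
run_seconds_per_day = 300             # one 5-minute extract per day

always_on_monthly = vm_rate_per_hour * 24 * 30
pay_per_run_monthly = serverless_rate_per_second * run_seconds_per_day * 30

print(f"always-on VM: ${always_on_monthly:.2f}/month")
print(f"pay-per-run:  ${pay_per_run_monthly:.2f}/month")
```

Even with generous assumptions, a job that runs minutes per day is orders of magnitude cheaper under the start/stop pattern than on a server that never sleeps.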
Shane Gibson: I think the other thing, when we did it from memory, or when you did it from memory, was that the open source software we’re using required certain permissions or behavior from the server or container it was running on. So when we tried to deploy it into the serverless services, it effectively didn’t run properly. There were things it relied on that the serverless capability didn’t allow us to do. Because we don’t own the operating system, we don’t own the underlying disk, we don’t own a whole lot of moving parts that that particular piece of software assumed you had access to and was designed to leverage, and therefore it just wouldn’t run, right?
Nigel: Actually, at the time when we started this journey, the Cloud Functions that we used for everything had a timeout of three minutes on them, which is why we got bumped up to the additional serverless options. At the time, the smallest tier of service was three minutes, the next tier was nine minutes, and then the next step up was as long as you like. Three minutes wasn’t long enough to set up the environment, run the APIs to pull data and bring it all down, which is why we actually had to deploy a container, which didn’t have a time constraint. But since then, Google has very helpfully, three times in the last 18 months, bumped up those time limits. Cloud Functions can actually run for nine minutes now, from memory, which is generally long enough to do a data extract using the APIs.
Shane Gibson: So there is still a risk, though, when you use a serverless feature, that there will be some type of access or capability that you would normally be able to leverage on a container, a physical machine or a virtual server, that the serverless offering doesn’t allow you to use. And so you have to regress back to applying some form of serverless pattern to that container-based approach.
Nigel: And those restrictions are probably likely around CPU, memory and disk, because with true serverless you’re effectively using a preset environment. It’s got two CPUs, it’s got X gig of memory, and you’ve got access to X gig of temp storage. So if your requirement is for something that’s very memory intensive, and, for example, a Cloud Function gives you four gig of memory but you’re really going to need twice that much because you’re going to do something chunky, that’s not going to work for you. Or if you need to physically store a whole lot of data on a local disk, again, that’s potentially not going to work for you. That’s where you go to your own prebuilt image, because you can say, I need a container that’s going to have access to plenty of disk or plenty of memory. And that might be a guiding choice.
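The decision Nigel describes can be sketched as a small helper: if a job fits inside the preset tier limits, use the managed serverless option; otherwise fall back to a custom container run with the same start/stop pattern. The limits below are hypothetical placeholders, not Google's actual quotas.

```python
# Hypothetical decision helper: managed serverless tiers come with preset
# memory, disk and runtime caps; jobs that exceed them need a container.
def choose_runtime(mem_gb, disk_gb, runtime_minutes):
    fits_managed_tier = (
        mem_gb <= 4            # illustrative memory cap
        and disk_gb <= 10      # illustrative temp-storage cap
        and runtime_minutes <= 9  # illustrative timeout
    )
    if fits_managed_tier:
        return "managed serverless (e.g. a Cloud Function)"
    return "custom container, run with the serverless start/stop pattern"

print(choose_runtime(1, 1, 3))   # small job: fits the managed tier
print(choose_runtime(8, 1, 3))   # memory-hungry job: needs a container
```

As the platform raises those limits over time, more jobs migrate back to the managed side of the check, which is the refactoring Shane raises later.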
Shane Gibson: So really, it’s quite confusing. You can get something out of the box that’s serverless; you can get something out of the box that behaves serverless, because you only pay for what you use and you don’t manage the operating system, but then you’re talking about numbers of CPUs and amounts of memory, which makes me think of a server. And then you can take virtual containers and make them behave using a serverless pattern, where they only run when you need them to run, so you only pay that way, and you’re effectively using some form of template to manage the operating system, so you still have the pain of managing it. It’s magical when you use it. So really, there’s a lot of things to think about when you ask: is it serverless, or is it not?
Nigel: Yes, it becomes technical after a while. And I think what makes it more challenging is there’s multiple products on a spectrum, and they all slightly overlap. The smallest offering may cover your small use cases well, while the middle offering may do all your small use cases plus the larger ones. There’s no hard and fast rule that says, this is my use case, so I automatically use this serverless function, because all of them might satisfy it. Then you’ve got to make a second choice around how fast it will be when it runs; you might choose an architecture that’s going to be quicker or more performant, with less latency, when it starts up and shuts down. Because some of the serverless options come with a small overhead: you’ve effectively made a request, it’s now got to start itself up to deliver your service and then shut itself down. It may only be a fraction of a second, or a second or two, but that may be an important consideration for you.
Shane Gibson: So for us, it shows the value of our guiding principles from an architecture point of view. By that I mean, at the beginning, when we need to do something new, we will always look for a serverless component that enables us to only pay when it runs, and not have to wear the cost and effort of maintaining an operating system, or patching, or managing a container. If we find one of those and it meets our needs for now, that’s what we use. And we know that we can always refactor that serverless pattern later on to something that we have more control over, which takes more effort, and therefore more cost on our side, to maintain. So the guiding principle of serverless first, but not serverless over everything, is really important, right?
Nigel: Agreed. We always look for the smallest serverless option, and then we just go up a level if it doesn’t quite do it, and up another level if we have to. It’s a serverless-first principle.
Shane Gibson: And then always be ready to refactor, because we know that over time, Google, in our case Google Cloud, is adding more and more serverless features, or removing some of the barriers of the serverless stuff that we were using. That means we can actually refactor the stuff where we’ve gone for more control for a reason, back to being able to leverage the value of that serverless service. I struggle with the phrase serverless service. I think it’s an alliteration.
Nigel: So the future is serverless, I think, is my takeaway.
Shane Gibson: Well, actually, I think the future is biological computing. So the real question then is, when we move from silicon to organism, is that a server or is it a host? That’ll be a really interesting distinction when that day comes. For now, we’ve talked about serverless, and we’ll probably put that one in the can.
Nigel: Thanks Shane.
PODCAST OUTRO: If you want to learn more about how you can apply Agile ways of working to your data, head over to agiledata.io.