2. Load balancers that scale from 0 to 1M+ requests per second
Our load balancer service, a built-in feature of Google Compute Engine, Google Container Engine and Google App Engine, is itself a core component of the Google Frontend, an enormous, globally distributed system for delivering users to infrastructure. It powers the vast majority of Google applications, like Maps, Apps, Gmail, and Search, and it is designed to tolerate extreme spikes in traffic, easily scaling from no traffic to millions of requests per second, in seconds. This improves the performance of our customers’ applications: every user, no matter how many show up at once, makes it through to your stack.
3. 45-second instance boot times
What happens once those users make it through the load balancer to your instances? Even with automation like our Autoscaler in place, you need to be able to scale up quickly as traffic mounts. Google Compute Engine instances boot very consistently in the range of 40-50 seconds, roughly 1/5 of the time required by competing clouds. That means you can grow your application’s hosting footprint very rapidly in response to incoming traffic, just like Google does for its own applications.
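If you want to sanity-check that number yourself, here's a minimal sketch that times how long a fresh instance takes to reach RUNNING, using the Compute Engine API via the google-api-python-client library. The project, zone, instance name, and image are placeholder values, and "RUNNING" measures the VM start rather than full OS boot, so treat it as a rough probe, not a benchmark.

```python
import time
from googleapiclient import discovery
from googleapiclient.errors import HttpError

PROJECT, ZONE, NAME = 'my-project', 'us-central1-f', 'boot-timer'  # placeholders

compute = discovery.build('compute', 'v1')  # uses Application Default Credentials

config = {
    'name': NAME,
    'machineType': 'zones/%s/machineTypes/n1-standard-1' % ZONE,
    'disks': [{
        'boot': True,
        'autoDelete': True,
        'initializeParams': {
            # Placeholder image; swap in whatever your workload uses.
            'sourceImage': 'projects/debian-cloud/global/images/family/debian-11',
        },
    }],
    'networkInterfaces': [{'network': 'global/networks/default'}],
}

start = time.time()
compute.instances().insert(project=PROJECT, zone=ZONE, body=config).execute()

# Poll until the new instance reports RUNNING (it may briefly 404 while provisioning).
while True:
    try:
        inst = compute.instances().get(project=PROJECT, zone=ZONE, instance=NAME).execute()
        if inst['status'] == 'RUNNING':
            break
    except HttpError:
        pass
    time.sleep(1)

print('Instance reached RUNNING in %.1f seconds' % (time.time() - start))
```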
4. 680,000 IOPS sustained Local SSD read rate
Each instance within Google Compute Engine, excepting micro and small shared-core instances, can mount up to 1.5 TB of Local SSD capable of 680,000 IOPS of sustained read performance. This is radically faster than competing systems, which max out at less than half of that, at nearly four times the cost, all while tying SSD size and performance to specific instance sizes, meaning that in many cases you pay for compute and memory you don’t need. This means caches, databases, NoSQL systems, file systems and more operate at crazy-fast speeds to respond to user requests quickly, and to handle more users per instance.
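To build some intuition for what a number like 680,000 IOPS means, here's a rough single-threaded probe of 4 KB random-read rate against a file on a Local SSD mount. The mount path and test file are assumptions, and a single Python thread reading through the page cache won't approach the headline figure; real measurements use a tool like fio with O_DIRECT and many parallel queues.

```python
import os
import random
import time

# Rough single-threaded 4 KB random-read probe. Assumes a large test file
# already exists on a Local SSD mounted at /mnt/disks/local-ssd (placeholder).
PATH = '/mnt/disks/local-ssd/testfile'
BLOCK = 4096
SECONDS = 10

fd = os.open(PATH, os.O_RDONLY)
blocks = os.fstat(fd).st_size // BLOCK

ops = 0
deadline = time.time() + SECONDS
while time.time() < deadline:
    offset = random.randrange(blocks) * BLOCK
    os.pread(fd, BLOCK, offset)   # one random 4 KB read
    ops += 1
os.close(fd)

# Note: this goes through the page cache with a queue depth of one; use fio
# with O_DIRECT and high parallelism to approach the device's rated IOPS.
print('~%d reads/second from one thread' % (ops / SECONDS))
```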
5. 3-second archive restores
Our “archival” data storage service, Google Cloud Storage Nearline, delivers data availability within 3 seconds, and provides high throughput for prompt restoration of data. In fact, it’s fast enough that many customers simply use it as their only storage tier. Competing systems take 4-5 hours to do the same task, and offer substantially lower throughput, not to mention a confusing, potentially extremely expensive restore fee structure. Ours is simple: 1 penny per GB/month to store, 1 penny per GB to restore. New competitive offers like Standard-IA storage cost more, add weird minimum size restrictions, and fail to deliver a global service. Close, but no cigar!
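To make that pricing concrete, here's the back-of-the-envelope arithmetic for a hypothetical archive. Only the per-GB rates come from the pricing above; the 50 TB archive and 5 TB monthly restore figures are made-up inputs for illustration.

```python
# Nearline cost sketch: $0.01/GB/month to store, $0.01/GB to restore.
STORE_PRICE_PER_GB_MONTH = 0.01
RESTORE_PRICE_PER_GB = 0.01

archive_gb = 50 * 1024      # 50 TB at rest (hypothetical)
restored_gb = 5 * 1024      # 5 TB pulled back this month (hypothetical)

monthly_cost = (archive_gb * STORE_PRICE_PER_GB_MONTH
                + restored_gb * RESTORE_PRICE_PER_GB)
print('Estimated monthly Nearline bill: $%.2f' % monthly_cost)
```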
How does Google Cloud Platform deliver on these features, and why are they so difficult for our competitors to match? Google has built some of the most amazing technology, custom-designed for our specific needs, and it fills some of the most sophisticated data centers in the world. We have powerful software-defined networks, faster storage components and better storage software, a more nimble hypervisor, and some of the finest operations engineers and distributed software developers anywhere.
You can’t fake these features, or simply buy the most expensive parts; you have to invest, heavily, in core componentry and skills development to deliver best-in-class capabilities.
The advantage doesn’t stop there; as it turns out, in most cases and for most products, Google Cloud Platform has a substantial cost advantage, typically on the order of 40% cheaper than competing clouds.
How can a cloud platform be both Better AND Cheaper?
Easy: the technology you need to create incredible services for your customers is the same technology we need to deliver incredible services to you. We consume these same resources to deliver not only the basic components of our offering, but, importantly, the managed services like App Engine, BigQuery, Cloud Bigtable, Cloud Dataflow, Cloud Pub/Sub and more. How do we build a faster, cheaper data warehouse like BigQuery? For starters: have container technology that boots quicker, SSD that serves data faster, and a load balancer designed to distribute traffic more efficiently. How do we build an efficient ETL engine like Cloud Dataflow? Well: have a long history of developing distributed processing software like MapReduce, Dremel, and Spanner, deliver it over the most powerful software-defined network in the world (Google Cloud Networking), and back it with rock-solid storage.
Similarly, our internal operational tools, the monitoring, logging, metering, billing, auditing and forensics infrastructure that allows us to deliver a scaled cloud infrastructure to hundreds of thousands of customers and billions of users, all operate dramatically more efficiently because of this foundation. Remember, the products and services you access directly are only a tiny fraction of the cloud; the real measure of a cloud is its capacity for efficient scale, and Google has built efficiently at planet scale.
So, it works out that across the board, from instance prices to warehouses, from storage tools to automation engines, we’re able to deliver a really substantial price advantage to all of our customers, even while giving you the best tools in the world to deliver for yours.
But don’t rely on me: the English language makes it easy to say “it’s cheaper!”, but math is what proves it. We’ve built several tools to help you make your own analysis of the cost comparison between different clouds, public and private, as well as against running static infrastructure in colocation facilities or your own data centers.
Total Cost of Ownership and Google Cloud Platform
The first of those tools is the GCP vs. AWS TCO Tool, a simple web UI for observing how factors that many customers don’t anticipate in their modeling can have a big impact on real TCO over time. Cost of capital, the consistent downward trend in infrastructure costs, the likelihood of system design change, and the value of per-minute versus per-hour billing can make a huge difference in what you’d expect a system to cost. Our model correctly captures these factors (as verified by the independent analyst firm ESG) and provides an easy-to-understand comparison.
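As a quick illustration of just the billing-granularity factor, consider a fleet of short-lived jobs. The job count and hourly rate below are placeholders, not published prices; the point is the rounding, not the absolute numbers.

```python
# Per-minute vs. per-hour billing for a batch of short-lived workers.
HOURLY_RATE = 0.05           # placeholder rate
runs_per_day = 200           # hypothetical short test/build jobs
minutes_per_run = 10

per_minute_bill = runs_per_day * minutes_per_run / 60.0 * HOURLY_RATE
per_hour_bill = runs_per_day * 1 * HOURLY_RATE   # each 10-minute run rounds up to a full hour

print('Billed by the minute: $%.2f/day' % per_minute_bill)
print('Billed by the hour:   $%.2f/day' % per_hour_bill)   # 6x more for identical work
```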
We’ve even included some pre-configured examples that capture patterns we see play out every day on our infrastructure, and that might look similar to the types of systems you’d design. The first, which we call a “mature app”, is designed to reflect a production system that is still under development but already in the hands of customers. It has development resources: systems that run dev and test workloads and demand bursty, short-lived infrastructure. It also has a production system with a 6:1 day-to-night diurnal utilization swing (so if your system needs to run 2 computers at night to serve traffic, in this example you’d run 12 during the day to handle peak load), which is typical of many applications, and it uses relatively conservative assumptions for the likelihood of system change, cost of capital, and expected downward price trajectory. Given these settings, even against the most efficient combination of AWS Reserved Instances, the model yields a Google Cloud Platform price advantage of 40%.
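Here's a toy model of just the diurnal piece of that example, assuming a placeholder hourly rate. It isolates how much you save simply by not paying for peak capacity around the clock; the full TCO tool layers in cost of capital, price trends, billing granularity and more on top of this.

```python
# "Mature app" diurnal pattern: 2 instances at night, 12 at peak (6:1 swing),
# with each day split evenly between the two for simplicity.
HOURLY_RATE = 0.05           # placeholder, not a published price
HOURS_PER_MONTH = 730

night_instances, day_instances = 2, 12
night_hours, day_hours = 12, 12

static_cost = day_instances * HOURLY_RATE * HOURS_PER_MONTH
scaled_cost = ((night_instances * night_hours + day_instances * day_hours)
               * HOURLY_RATE * (HOURS_PER_MONTH / 24))

print('Provisioned for peak 24x7:     $%.2f/month' % static_cost)
print('Scaled with the diurnal curve: $%.2f/month' % scaled_cost)
print('Savings from scaling alone:    %.0f%%' % (100 * (1 - scaled_cost / static_cost)))
```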
Some customers are looking to build the next Snapchat, so we’ve included an even more flexible, nimble example of a smaller system called “startup app”. In this example, the advantages of per-minute billing and the ability to tolerate huge day/night swings drive a Google Cloud Platform price advantage of nearly 60%.
We talk to many enterprise customers who argue that their systems don’t work like this: that they don’t develop software, they buy it; that they run static infrastructure to minimize operational overhead; and that they license software in a fixed way that demands they avoid this kind of variability in implementation. Surely paying up front for a fixed discount, like AWS Reserved Instances, must save customers following this usage pattern quite a bit over Cloud Platform? We’ve captured this workload as “Static enterprise app” in our TCO tool, and if you take a look, it turns out it doesn’t matter: our lower base rates, combined with automatic sustained use discounts, erase the price advantage of Reserved Instances. Even in this example, Google Cloud Platform still enjoys a 36% price advantage.
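The sustained use math is easy to reproduce. Using the incremental quartile rates published at the time (100%, 80%, 60% and 40% of the base price for each successive quarter of the month; the hourly rate below is a placeholder), a full-month instance nets out at 70% of list, with no upfront commitment.

```python
# Sketch of how sustained use discounts accrue for an instance running all month.
BASE_HOURLY = 0.05                      # placeholder rate
HOURS_PER_MONTH = 730
QUARTILE_RATES = [1.00, 0.80, 0.60, 0.40]   # incremental rate per usage quartile

quartile_hours = HOURS_PER_MONTH / 4
full_month_cost = sum(BASE_HOURLY * rate * quartile_hours for rate in QUARTILE_RATES)
effective_rate = full_month_cost / (BASE_HOURLY * HOURS_PER_MONTH)

# -> 70% of list, i.e. a 30% discount applied automatically.
print('Effective price for a full month: %.0f%% of list' % (100 * effective_rate))
```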
These views are a bit of a summary, but we know folks are eager to dive into additional detail. Linked to the TCO tool is our Google Cloud Platform Pricing Calculator, a simple UI to help you accurately estimate the price you should expect to pay for running applications on our cloud.
If you already know what you’re running somewhere else, and have a good idea of what your fully loaded costs are, try entering those same infrastructure requirements into our Pricing Calculator; I suspect you’ll come away quite surprised at what your monthly costs would be. (And if you don’t, we sure want to know about it - leave us a comment!)
So, how can you optimize?
Customers often ask me how they can best optimize their systems for cost and how they can save using Google Cloud Platform. In reality, simply implementing applications following basic best practices on Cloud Platform can deliver really substantial cost advantages over more static data center deployments, but folks often expect that there are special tricks to getting a good deal on cloud.
It turns out, most of the time, the real trick is “un-learning” the inefficient behaviors that data centers require to deliver on the needs of business. These inefficient behaviors typically go along the lines of …
Planning on growth? Buy infrastructure months in advance and pre-test it.
Planning on software changes? Radically overprovision hardware and memory “just in case”.
Planning on robust testing? Duplicate the production infrastructure for your test setup.
Planning, at all? Spend cycles in complex estimation rather than improving your product.
For most cloud customers, all of this planning simply goes away: you pay only for what you use, only when you need it, so the effort shifts to making sure your systems scale easily enough to adjust nimbly to real demand. A few examples:
Put work into queues, and autoscale processors against queue depth (see the sketch after this list) = never have a machine on that’s not doing productive work!
If your software can’t run in an autoscaler group, it’s a bug!
For internal tool systems, consider a static “holding” page which calls a deployment manager script to start the dynamic hosting system for an app when users visit; if nobody is online, it turns off!
Don’t over-provision instances for projected future demand, pick exactly the size you need now, and scale up or out as your demands grow.
Most DB systems are deeply IO bound; don’t get huge computers when what you really need is huge storage.
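Here's what the first item in that list can look like in practice: a minimal control loop that sizes a managed instance group from queue depth. The project, zone, group name, and the get_queue_depth() helper are all hypothetical stand-ins for your own queue (Cloud Pub/Sub, a task queue, etc.) and worker pool; a sketch, not a production autoscaler.

```python
import math
import time
from googleapiclient import discovery

PROJECT, ZONE, GROUP = 'my-project', 'us-central1-f', 'worker-group'  # placeholders
ITEMS_PER_WORKER = 100   # target backlog each worker should handle
MAX_WORKERS = 50

compute = discovery.build('compute', 'v1')

def get_queue_depth():
    """Hypothetical helper: return the current backlog of your work queue."""
    return 0  # replace with your queue's backlog metric

while True:
    depth = get_queue_depth()
    # One worker per ITEMS_PER_WORKER queued items; zero workers when idle.
    target = min(MAX_WORKERS, math.ceil(depth / ITEMS_PER_WORKER))
    compute.instanceGroupManagers().resize(
        project=PROJECT, zone=ZONE,
        instanceGroupManager=GROUP, size=target).execute()
    time.sleep(60)  # re-evaluate once a minute
```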
While the above might take a bit of tweaking for your software if it’s running in Google Compute Engine, it turns out that lots of Google Cloud Platform services work this way by default. App Engine, BigQuery, Cloud Bigtable, Cloud Dataflow, Cloud SQL part-time instances, and more all do their best to minimize unnecessary resource consumption automatically.
We’re excited about the advantages we’re sharing with customers, the way that they’re building amazing products on top of us, and how those products are challenging and disrupting the status quo for the better. What a great time to build! We can’t wait to see what you build next on Google Cloud Platform.
- Posted by Miles Ward, Global Head of Solutions