Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Understanding Cloud Pricing Part 5 - NoSQL Databases
Monday, August 31, 2015
We’ve had a lot of great responses and feedback (keep ‘em coming!) about our cloud pricing posts (Local SSDs, Virtual Machines, Data Warehouses), and today we’re back to talk about running NoSQL databases in the cloud. Specifically, we want to give you the information you need to estimate the cost of running NoSQL workloads on Google Cloud Platform.
NoSQL Databases
The NoSQL database market has experienced massive growth over the last few years, and NoSQL databases have been instrumental in solving many distributed data and scaling challenges, opening the door to new and innovative applications and solutions. “NoSQL” is an umbrella term that encompasses any data store that fits the notion of “not only SQL.” Many products offer a high degree of tunability around the standard relational database concepts of atomicity, consistency, isolation, and durability (see ACID for more information) and the distributed systems concepts of consistency, availability, and partition tolerance (see CAP theorem for more information). Every NoSQL database also offers something different in how data is modeled and stored, including, but not limited to, JSON document, key-value, wide-column, and blob storage.
As expected, there are several self-managed options available, such as MongoDB, Apache Cassandra, Riak, Apache CouchDB, Couchbase, and many more. Today we’re going to focus on how to estimate pricing when running MongoDB. MongoDB is a document-based, highly scalable NoSQL database that provides dynamic JSON schemas along with a powerful query language. MongoDB has a variety of use cases, such as a 360-degree view of the customer, real-time analytics, Internet of Things applications, and content management, to name a few.
However, when looking at the pricing data for MongoDB, we noticed something interesting. We had planned a separate blog post about pricing Cassandra on Google Cloud Platform. But the hardware requirements (virtual or real) are very similar and neither database requires a purchased license, so the costs are very similar too. It didn’t make sense to publish another post stating more or less the same thing with only the name of the database replaced, so we are going to include Cassandra here as well.
Cassandra, unlike MongoDB, is a wide-column store (it is sometimes also described as a key-value store). Cassandra was written at Facebook, with much of the data model inspired by Google's Bigtable white paper and the availability design inspired by Amazon's Dynamo white paper. Cassandra was designed for high availability, performance, and tunable consistency. Cassandra has no leader or master node; instead, all the nodes in a cluster form a ring, and data is replicated a configurable number of times. Availability comes from having a masterless cluster storing your data; tunable consistency comes from choosing, per query, how many replicas must respond before the cluster returns a result. Cassandra and MongoDB are two of the NoSQL databases we most often see our customers using.
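To make "tunable consistency" concrete, here is a minimal Python sketch of the usual quorum arithmetic. It assumes the common rule of thumb that a read is guaranteed to reach at least one replica holding the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The function names are illustrative and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # hypothetical: each row is stored on three nodes in the ring
    q = quorum(rf)
    print(f"RF={rf}: QUORUM needs {q} replicas")
    print("QUORUM reads + QUORUM writes overlap:", overlaps(q, q, rf))
    print("ONE read + ONE write overlap:", overlaps(1, 1, rf))
```

Lower consistency levels ask fewer replicas to respond, which reduces the work per query at the cost of possibly reading stale data.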
Starting Point
So how do you estimate pricing given multiple use cases and different possible query and traffic patterns? To get started with MongoDB, we’re going to narrow the scope a bit and estimate the costs of the resources used in existing benchmarks. Several benchmarks of MongoDB performance have been published, and we’ll focus on two of them: one published by MongoDB and another from United Software Associates. Both benchmarks reach roughly the same throughput and latency conclusions, so this is a reasonable model to build upon.
While the benchmarks from United Software Associates used a single MongoDB node for testing, the benchmarks published by MongoDB used a 3-node replica set. Replica sets are a redundant, highly available deployment of MongoDB, and they are strongly recommended, at a minimum, for all production workloads. The minimum recommended replica set consists of three nodes, each configured with matching specifications, so we’ll include that configuration in our pricing breakdown below. The on-prem reference hardware specs used in the benchmarks were as follows (MongoDB, like most databases, tends to favor more RAM and storage IOPS where possible):
| Benchmark | MongoDB | United Software Associates |
| --- | --- | --- |
| CPU | Dual 10-core Xeon 3.0 GHz | Dual 6-core Xeon 3.06 GHz |
| RAM | 128 GB | 96 GB |
| Storage | 2 x 960 GB SSD | 2 x 960 GB SSD |
| Monthly Price (single node) | $1,525.00* (estimate) | Unavailable** |
| Monthly Price (3-node replica set) | $4,575.00* (estimate) | Unavailable** |
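For readers who have not set up a replica set before, the following Python sketch builds the kind of configuration document that MongoDB's replica set initiation (`rs.initiate()` in the mongo shell) accepts for a three-node deployment like the one priced above. The set name and host names are hypothetical placeholders, and the sketch only constructs the document; it does not connect to a server.

```python
import json


def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given host:port strings."""
    if not hosts:
        raise ValueError("at least one host is required")
    return {
        "_id": name,  # replica set name
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


if __name__ == "__main__":
    # Hypothetical host names; substitute the addresses of your own nodes.
    hosts = [
        "mongo-node-1.example.internal:27017",
        "mongo-node-2.example.internal:27017",
        "mongo-node-3.example.internal:27017",
    ]
    print(json.dumps(replica_set_config("rs0", hosts), indent=2))
```

Each member is given matching hardware in the estimates here, so any of the three nodes can take over as primary without a change in capacity.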
Mapping those hardware specs onto Google Compute Engine instance and storage offerings gives the following two closely matching configurations, along with pricing:
| Instance Type | n1-highmem-16 | n1-standard-32 |
| --- | --- | --- |
| CPU | 16 Xeon vCPU | 32 Xeon vCPU |
| RAM | 104 GB | 120 GB |
| Storage | 4 x 375 GB Local SSD | 4 x 375 GB Local SSD |
| Monthly Price (single node) | $843.60 | $1,146.10 |
| Monthly Price (3-node replica set) | $2,530.76 (estimate) | $3,438.30 (estimate) |
| Monthly Price Difference | 44% | 24% |
| Annual Savings vs. On-Premise | $24,530.88 | $13,640.40 |
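The last two rows can be reproduced from the 3-node monthly prices. The short Python sketch below does that arithmetic, using only figures from the two tables; the function names are ours, and the comparison baseline is the $4,575.00 on-premise replica set estimate.

```python
ON_PREM_REPLICA_SET = 4575.00  # 3-node bare metal estimate from the first table

GCE_REPLICA_SETS = {
    "n1-highmem-16": 2530.76,
    "n1-standard-32": 3438.30,
}


def monthly_difference(on_prem, cloud):
    """Fraction by which the cloud price is lower than the on-premise price."""
    return (on_prem - cloud) / on_prem


def annual_savings(on_prem, cloud):
    """Monthly saving multiplied over twelve months."""
    return (on_prem - cloud) * 12


if __name__ == "__main__":
    for name, price in GCE_REPLICA_SETS.items():
        diff = monthly_difference(ON_PREM_REPLICA_SET, price)
        savings = annual_savings(ON_PREM_REPLICA_SET, price)
        print(f"{name}: {diff:.1%} lower per month, ${savings:,.2f} saved per year")
```

The computed differences come out to roughly 44.7% and 24.8%, so the 44% and 24% in the table appear to be rounded down.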
The cost breakdown above shows the pricing for a single node and for a 3-node replica set, which is a typical production deployment of MongoDB as stated above. We selected Local SSD for the storage layer in order to support the IOPS required for the throughput metrics achieved in the benchmark reports. As shown in this disk type comparison, Local SSD can support up to 280,000 write IOPS per instance. Local SSD is ephemeral storage, meaning that its lifecycle is tied to the virtual machine to which it is attached, which is another reason why we chose to estimate pricing for the highly available MongoDB 3-node replica set option. Finally, the prices shown above include Google Cloud Platform sustained use discounts, which add up to about a 30% discount over the course of a full month.
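To show where the roughly 30% figure comes from, here is a small Python sketch of a tiered sustained use calculation. The tier schedule is an assumption based on how sustained use discounts were documented for these instance types (each successive quarter of the month billed at 100%, 80%, 60%, and 40% of the base rate); check the current documentation before relying on it.

```python
# Assumed tier schedule: (share of the month, multiplier on the base rate).
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def effective_rate(usage_fraction):
    """Average multiplier on the base rate for an instance that runs
    for the given fraction of the month (0 < usage_fraction <= 1)."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    billed = 0.0
    for share, multiplier in TIERS:
        used = min(remaining, share)
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed / usage_fraction


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.75, 1.0):
        discount = 1 - effective_rate(fraction)
        print(f"running {fraction:.0%} of the month: about {discount:.0%} off")
```

Under this assumed schedule, an instance that runs all month is billed at an average of 70% of the base rate, which matches the 30% discount quoted above; instances that run for only part of the month earn less.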
The pricing for Cassandra is very similar to MongoDB. Both benefit from Local SSD in terms of performance, and the trade-off between more memory (n1-highmem-16) and more compute (n1-standard-32) is the type of choice that DBAs will have to make when designing a typical Cassandra cluster. Of course, this is just guidance on pricing to get you started; you won't know what's best for your application until you actually run tests yourself.
Running Your Own Tests
As with any benchmark, your mileage may vary when testing your particular workloads. Isolated tests run during benchmarks don’t always equate to real-world performance, so it is important that you run your own tests and assess read-write performance for a workload that closely matches your usage. Take a look at PerfKit and use it to profile your own proposed deployments, including mixing and matching workloads or worker counts.
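If you want a quick sanity check before setting up a full benchmarking tool, a few lines of Python can time an operation repeatedly and summarize the latency distribution. This is only a rough sketch and not a substitute for PerfKit: `placeholder_operation` is a hypothetical stand-in that you would replace with a real read or write against your own database.

```python
import statistics
import time


def measure_latencies(operation, iterations=1000):
    """Call operation() repeatedly and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return median, 95th, and 99th percentile latency in milliseconds."""
    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points
    return {"p50": statistics.median(latencies), "p95": cuts[94], "p99": cuts[98]}


def placeholder_operation():
    # Hypothetical stand-in: replace with a query that matches your workload.
    sum(range(1000))


if __name__ == "__main__":
    stats = summarize(measure_latencies(placeholder_operation))
    for name, value in stats.items():
        print(f"{name}: {value:.3f} ms")
```

Looking at tail latencies (p95, p99) rather than only the average helps show how a configuration behaves under the less favorable requests in your workload.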
Pricing NoSQL workloads can be somewhat challenging, but hopefully we’ve given you a way to get started in estimating your costs. If you’re interested in learning more about compute and storage on Google Cloud Platform, check out Google Compute Engine or take a look at the documentation. Feedback is always welcome, so if you’ve got comments or questions, don’t hesitate to let us know in the comments.
We’ve gotten a lot of great feedback about this post, and we wanted to let you know that we will also be posting about cloud pricing for Google Cloud Platform's managed NoSQL options in the near future. In forthcoming blog posts, we’ll talk about how to understand the pricing around Google Cloud Bigtable and Google Cloud Datastore and compare those to other popular managed offerings. Thanks for the questions and comments, keep ‘em coming!
- Posted by Sandeep Parikh and Peter-Mark Verwoerd, Solutions Architects
* Price was taken from a configure-to-order bare metal server at Softlayer.
** Configuration was unavailable to estimate the monthly price.