Google Cloud Platform Blog
Product updates, customer stories, and tips and tricks on Google Cloud Platform
Large Akka Cluster on Google Compute Engine
Wednesday, January 22, 2014
Our guest blog post today comes from Patrik Nordwall, Senior Software Engineer at Typesafe. His areas of expertise include Scala, Akka, and how to build reactive applications.
Akka is a toolkit developed by Typesafe and based on the Actor Model for building highly concurrent, distributed and fault-tolerant applications on the Java Virtual Machine. We have been working together with the Google Cloud Platform team to test our Akka Cluster on Google Compute Engine. The goal was to push our toolkit to its limits and gain insight into how to scale systems on this platform. The results are jaw-dropping: we reached 2400 nodes and started up a 1000-node cluster in just over four minutes. We learned in just a few minutes that Akka on Compute Engine is a great combination for elastic application deployments.
Our impression of Google Compute Engine is that everything just works, which is more than you typically expect from an IaaS. Google Compute Engine is easy to operate, and its tools and APIs are easy to understand. It features great stability, and the speed of starting new instances is outstanding, allowing us to SSH into them only 10 seconds after spawning them.
Running a 2400-Node Akka Cluster
The test was performed by first starting 1500 instances in batches of 20 every 3 minutes. Each instance hosted one JVM running a small test application (see below for details), which joined the cluster. Akka Cluster is a decentralized peer-to-peer cluster that uses a gossip protocol to spread membership changes. Joining a node involves two consecutive state changes that are spread to all members in the cluster one after the other. We measured the time it took from initiating the join until all nodes had seen the new node as a full member.
As can be seen in the above chart, it typically takes 15 to 30 seconds to add 20 nodes. Note that this is the duration until the cluster has a consistent view of the new member, without any central coordination. Nodes were added slowly—stretching the process over a total period of four hours—to also verify the cluster’s stability over time with repeated membership changes.
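The ramp-up schedule described above can be checked with a little arithmetic. This is only an illustrative sketch: the function name `ramp_schedule` is hypothetical, and it assumes the first batch starts at minute 0 and that every batch is full.

```python
import math


def ramp_schedule(total_instances, batch_size, interval_minutes):
    """Return (number of batches, start offset in minutes of the last batch)."""
    batches = math.ceil(total_instances / batch_size)
    # The first batch starts at minute 0, so the last one starts
    # (batches - 1) intervals later.
    last_start = (batches - 1) * interval_minutes
    return batches, last_start


batches, last_start = ramp_schedule(1500, 20, 3)
print(batches, last_start, round(last_start / 60, 1))  # 75 222 3.7
```

Under these assumptions the last of the 75 batches starts a little under four hours after the first, which is consistent with the total period quoted above.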
The time to join increases with the size of the cluster, but not drastically; the theoretical expectation would be logarithmic behavior. For our implementation this holds only up to 500 nodes because we gradually reduce the bias of gossip peer selection when the cluster grows beyond that size.
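To illustrate why logarithmic behavior is the theoretical expectation, here is a minimal simulation of a simplified push gossip: in every round, each node that already knows about a change tells one uniformly random peer. This is not Akka's actual protocol (which gossips full membership state and biases peer selection, as noted above); the function names and parameters are assumptions made for this sketch.

```python
import math
import random


def gossip_rounds(n, rng):
    """Rounds of push gossip until all n nodes know a rumor started by node 0."""
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for node in informed:
            # Pick a uniformly random peer other than the sender itself.
            peer = rng.randrange(n - 1)
            if peer >= node:
                peer += 1
            newly_informed.add(peer)
        informed |= newly_informed
        rounds += 1
    return rounds


def mean_rounds(n, trials=20, seed=42):
    """Average number of rounds over several seeded trials."""
    rng = random.Random(seed)
    return sum(gossip_rounds(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (100, 200, 400, 800, 1600):
        print(f"{n:5d} nodes: {mean_rounds(n):5.1f} rounds, log2(n) = {math.log2(n):4.1f}")
```

In this toy model, doubling the number of nodes should add only a roughly constant number of rounds rather than doubling the convergence time. The number of informed nodes can at most double per round, so `ceil(log2(n))` is a hard lower bound on the round count.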
During periods without cluster membership changes the network traffic of 1500 nodes amounted to around 8 kB/s aggregated average input and output on each node. The average CPU utilization across all nodes was around 10%.
1500 nodes is a fantastic result, and to be honest far beyond our expectations. Our instance quota was 1500 instances, but why not continue and add more JVMs on the running instances until it breaks? This worked up to 2400 nodes, but after that the Akka cluster finally broke down: when adding more nodes we observed long garbage collection pauses, and many nodes were marked as unreachable and did not come back. This was the limit of the Akka cluster software, and not a limit of Google Compute Engine itself. To our current knowledge this is not a hard limit, which means that we will eventually overcome it.
Starting Up 1000 Nodes
In the first test the nodes were added slowly to also verify the cluster’s stability. Bulk additions are a more typical use case when starting up a fresh cluster, so we also tested how long it took to get an Akka cluster running across 1000 Google Compute Engine instances, hosting one cluster node each.
It took 4 minutes and 15 seconds from starting the script until all 1000 Akka cluster members were reported as full members and seen by all other nodes.
That measurement also includes the time it takes to start the actual Google Compute Engine instances, which is just mind-boggling: cloud elasticity taken to its extreme. Google Compute Engine is a perfect environment for Akka Cluster and its ability to scale up and down automatically as nodes are spun up or shut down.
Testing Details
The test application used is revision f86ba8db18 in the akka/apps repository. It uses snapshot 2.3-20131025-230950 of Akka. For information on how to run Akka on Compute Engine, you can read my post on Typesafe.com.
We used the Oracle JVM, Java version 1.7.0_40 with 1538 MB heap and ParallelGC.
The Google Compute Engine instances used in this test were of type n1-standard-2 in zone europe-west1-a, with Debian Linux (debian-7-wheezy-v20130816). This instance type is priced at $0.228/h and has 7.5 GB memory and 5.5 GCEU.
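As a rough illustration of what the quoted price means at this scale, the sketch below multiplies the hourly price by the instance counts used in the two tests. It is a back-of-the-envelope figure only: the helper name `fleet_hourly_cost` is hypothetical, and it assumes all instances are running at once and ignores any billing granularity or discounts.

```python
def fleet_hourly_cost(instances, price_per_hour_usd):
    """Hourly cost in USD while all instances are running (list price only)."""
    return round(instances * price_per_hour_usd, 2)


print(fleet_hourly_cost(1500, 0.228))  # 342.0
print(fleet_hourly_cost(1000, 0.228))  # 228.0
```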
We would like to send a big thank you to Google for making it possible to conduct these tests. Overall, it made Akka a better product. Google Compute Engine is an impressive infrastructure as a service, with the stability, performance and elasticity that you need for high-end Akka systems.
-Contributed by Patrik Nordwall, Senior Software Engineer, Typesafe