Over here at the UCSB Racelab, we've complained endlessly about finding a web framework we actually could use. For a long time we thought we just wouldn't be able to find it - many were so-so or good but only after a substantial learning curve. So imagine our surprise back in April 2008 when we heard about what we thought would be just-another-web-framework provided by Google in the Python version of App Engine. But after giving it a try, we were smitten. We finally found a web framework that (1) we could actually use on non-trivial projects and (2) we could teach in nine-week classes without having students lose half the time with the idiosyncrasies of the programming language involved or the web framework itself. Furthermore, the minimalistic APIs make it simple to get work done: it did for us exactly what we needed and nothing else.
Yet as researchers and hackers-at-heart there was one thing that we really wanted to do with App Engine that we couldn't do: run it on a whole bunch of our machines and tinker with it. A similarly-minded hacker named Chris Anderson had released AppDrop, which was a modified version of the App Engine SDK that hooked up to PostgresSQL and run in Amazon EC2, but only ran over a single machine. So after much discussion, we came up with the following short list of things we wanted to do with App Engine:
So with that in mind, we created AppScale, an open-source cloud platform for Google App Engine applications. Here's how we did it:
We took the standard three-tier web deployment approach and clearly segmented each tier into a specific component in the system: an AppLoadBalancer routes users to their applications, an AppServer runs the user's App Engine app, and an AppDB handles database interactions. Each have clearly defined roles in the system and are controlled by an AppController, a daemon that runs on each machine, monitors each component, and controls the specific order in which services are started. It writes all the configuration files for each service and coordinates services between the other AppControllers in the deployment. For those interested, we detail the specifics on the original AppScale implementation in this paper.
We also wanted to embody the principle of "standing on the shoulders of giants", and as such, we employ open-source software as often as possible, where appropriate. Our AppLoadBalancer employs the nginx web server as well as the haproxy load balancer to ensure high performance. Our Memcache API implementation uses memcache under the hood, while our MapReduce API uses Apache Hadoop, which we added to give App Engine users running over AppScale the ability to run Hadoop MapReduce jobs from within their web applications.
Because we were able to keep the database support abstracted away from the other components in the system, we were able to add support for nine different data storage solutions within AppScale: HBase, Hypertable, MySQL, Cassandra, Voldemort, MongoDB, MemcacheDB, Scalaris, and SimpleDB. Many of these databases have seen interest in recent years but have been hard to measure under comparable conditions, and vary greatly. To give a few examples, they vary in the query languages they provide, their topologies (e.g., master / slave, peer-to-peer), data consistency policies, and end-user library interfaces. This has made it non-trivial for the community to objectively determine scenarios in which one database performs better or worse than another and investigate why, but under AppScale, deploying all these databases is done automatically with no interaction from the user. And because AppScale is open-source, if a developer doesn't like the particular interface we use for a database, they can improve on it and give back to the community. We've used AppScale internally to evaluate the performance of Google App Engine applications on these datastores as well as developed an App Engine app, Active Cloud DB, that exposes a RESTful API that developers can use to access these datastores from any programming language or web framework.
Finally, the most important lesson we learned was the value of incremental development. Our core development team fluctuates between two to three developers, so from the first meeting we had, we knew that our very first release couldn't support every App Engine API nor could it run nine databases seamlessly. Therefore, we started off with support for the two BigTable clones, HBase and Hypertable, as well as support for just the Datastore API, the URL Fetch API, and the Users API within App Engine. From there, we learned what datastores people actually wanted to see support for as well as what APIs people wanted to use. We were also able to add APIs within App Engine apps deployed to AppScale to be able to run virtual machines under the EC2 API, while also running computation under the MapReduce API.
But developing AppScale was certainly not a cakewalk for us. Over the course of the last two years, five major issues (some technical and some not) have arisen within the project:
All of these problems are greatly exacerbated by only having a two-to-three person core developer team, but this also makes the AppScale project particularly interesting to work on. Despite having worked on AppScale for two years, there are still tons of interesting problems to work on and we still love the Python App Engine web framework as much as we did when we first picked it up. And of course, AppScale is open-source, under the New BSD License, so feel free to download it and tinker around like we have! Check out AppScale at:
http://appscale.cs.ucsb.edu
http://code.google.com/p/appscale
Demonstrate your proficiency to design, build and manage solutions on Google Cloud Platform.