Over here at the UCSB Racelab, we've complained endlessly about how hard it was to find a web framework we could actually use. For a long time we thought we never would: many were so-so, or good only after a substantial learning curve. So imagine our surprise back in April 2008 when we heard about what we assumed would be just another web framework, provided by Google in the Python version of App Engine. After giving it a try, we were smitten. We had finally found a web framework that (1) we could actually use on non-trivial projects and (2) we could teach in nine-week classes without students losing half the time to the idiosyncrasies of the programming language or the web framework itself. Furthermore, its minimalistic APIs make it simple to get work done: it did exactly what we needed and nothing else.
Yet as researchers and hackers at heart, there was one thing we really wanted to do with App Engine that we couldn't: run it on a whole bunch of our own machines and tinker with it. A similarly minded hacker named Chris Anderson had released AppDrop, a modified version of the App Engine SDK that hooked up to PostgreSQL and ran on Amazon EC2, but it only ran on a single machine. So after much discussion, we came up with the following short list of things we wanted to do with App Engine:
So with that in mind, we created AppScale, an open-source cloud platform for Google App Engine applications. Here's how we did it:
We took the standard three-tier web deployment approach and segmented each tier into a specific component in the system: an AppLoadBalancer routes users to their applications, an AppServer runs the user's App Engine app, and an AppDB handles database interactions. Each has a clearly defined role in the system and is controlled by an AppController, a daemon that runs on each machine, monitors each component, and controls the specific order in which services are started. The AppController writes all the configuration files for each service and coordinates services with the other AppControllers in the deployment. For those interested, we detail the specifics of the original AppScale implementation in this paper.
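One way to picture the AppController's job of starting services "in a specific order" is as a topological sort over service dependencies. The sketch below is only an illustration under stated assumptions: the dependency map (AppDB before AppServer before AppLoadBalancer) and the helper names `start_order` and `start_services` are hypothetical, not AppScale's actual code or configuration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: each service lists the services that must
# already be running before it starts. The component names come from the
# text above; the exact dependencies are an assumption for illustration.
SERVICE_DEPENDENCIES = {
    "AppDB": set(),
    "AppServer": {"AppDB"},
    "AppLoadBalancer": {"AppServer"},
}


def start_order(dependencies):
    """Return the services in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def start_services(dependencies, start):
    """Call start(name) for each service, dependencies first."""
    started = []
    for name in start_order(dependencies):
        start(name)  # e.g. launch a process and wait until it is healthy
        started.append(name)
    return started


if __name__ == "__main__":
    start_services(SERVICE_DEPENDENCIES, lambda name: print("starting", name))
```

Expressing the order as data rather than hard-coding it means a new component only needs to declare what it depends on.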
We also wanted to embody the principle of "standing on the shoulders of giants", so we use open-source software wherever appropriate. Our AppLoadBalancer uses the nginx web server together with the haproxy load balancer for high performance. Our Memcache API implementation uses memcached under the hood, and our MapReduce API uses Apache Hadoop; we added the MapReduce API so that App Engine users running on AppScale can run Hadoop MapReduce jobs from within their web applications.
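To make the AppLoadBalancer's role concrete, here is a toy round-robin router over a pool of AppServers. This is purely illustrative: in AppScale the routing is done by nginx and haproxy, not Python, and whether AppScale configures haproxy's round-robin policy is an assumption. The class name `RoundRobinBalancer` and the backend addresses are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Toy router that hands requests to AppServers in rotation.

    Illustrative only: a real deployment delegates this to nginx/haproxy.
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        # itertools.cycle repeats the backend list forever, in order.
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        """Return the backend that should serve the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical AppServer addresses.
    balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080"])
    for _ in range(3):
        print(balancer.pick())
```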
Because we kept database support abstracted away from the other components in the system, we were able to add support for nine different data storage solutions within AppScale: HBase, Hypertable, MySQL, Cassandra, Voldemort, MongoDB, MemcacheDB, Scalaris, and SimpleDB. Many of these databases have attracted interest in recent years, but they vary greatly and have been hard to compare under the same conditions. To give a few examples, they differ in the query languages they provide, their topologies (e.g., master/slave, peer-to-peer), their data consistency policies, and their end-user library interfaces. This has made it non-trivial for the community to objectively determine the scenarios in which one database performs better or worse than another, and why. Under AppScale, however, all of these databases are deployed automatically with no interaction from the user. And because AppScale is open-source, a developer who doesn't like the particular interface we use for a database can improve on it and give back to the community. We've used AppScale internally to evaluate the performance of Google App Engine applications on these datastores, and we've developed an App Engine app, Active Cloud DB, that exposes a RESTful API developers can use to access these datastores from any programming language or web framework.
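The key design choice here is a single narrow interface that every database adapter implements, so the rest of the system never talks to HBase or Cassandra directly. The following is a minimal sketch of that pattern, assuming a simple put/get/delete contract; `DatastoreBackend`, `InMemoryBackend`, and `make_backend` are hypothetical names, not AppScale's actual AppDB interface.

```python
from abc import ABC, abstractmethod


class DatastoreBackend(ABC):
    """Hypothetical minimal contract that every database adapter implements."""

    @abstractmethod
    def put(self, table, key, value): ...

    @abstractmethod
    def get(self, table, key): ...

    @abstractmethod
    def delete(self, table, key): ...


class InMemoryBackend(DatastoreBackend):
    """Dict-backed stand-in for a real adapter (HBase, MySQL, and so on)."""

    def __init__(self):
        self._tables = {}

    def put(self, table, key, value):
        self._tables.setdefault(table, {})[key] = value

    def get(self, table, key):
        return self._tables.get(table, {}).get(key)

    def delete(self, table, key):
        self._tables.get(table, {}).pop(key, None)


# Registry of available adapters; a real system would list one per database.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(name):
    """Select an adapter by name, e.g. from a deployment configuration."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unsupported datastore: {name}") from None
```

Adding a tenth database then means writing one more adapter class and registering it, without touching the load balancer or the app servers.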
Finally, the most important lesson we learned was the value of incremental development. Our core development team fluctuates between two and three developers, so from our very first meeting we knew that our first release couldn't support every App Engine API, nor could it run nine databases seamlessly. We therefore started with support for the two BigTable clones, HBase and Hypertable, and for just three App Engine APIs: the Datastore API, the URL Fetch API, and the Users API. From there, we learned which datastores people actually wanted supported and which APIs they wanted to use. We were also able to add new APIs for App Engine apps deployed on AppScale: an EC2 API for running virtual machines and a MapReduce API for running computation.
But developing AppScale was certainly not a cakewalk for us. Over the course of the last two years, five major issues (some technical and some not) have arisen within the project:
All of these problems are greatly exacerbated by having only a two-to-three person core developer team, but this also makes the AppScale project particularly interesting to work on. Even after two years of working on AppScale, we still have tons of interesting problems to tackle, and we still love the Python App Engine web framework as much as we did when we first picked it up. And of course, AppScale is open-source, released under the New BSD License, so feel free to download it and tinker around like we have! Check out AppScale at:
Today, we’re releasing version 1.3.8 of the App Engine SDK. Whether you’re a Java or a Python developer, this release includes several exciting new features for improving monitoring, performance, and maintenance tasks.
This release includes a new page in the Admin Console called the Instances page. It lets you view information about all server instances currently in use by your application, which can be useful for debugging your application and for understanding its performance characteristics. No configuration is needed: just click the “Instances” link in the left-hand navigation of the Admin Console to see the average QPS, latency, and memory for each instance.
This release also has a couple of new Task Queue features. First, the maximum bucket size that you can specify during queue configuration is now 100, up from 50. Second, we’ve added a new "Run Now" button to the Task Queues section of the Admin Console that enables developers to run a task immediately. This can be very helpful for debugging your tasks in production.
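App Engine describes task queue throttling in terms of a token bucket: the queue's rate refills tokens over time, and the bucket size caps how many tasks can start back-to-back after an idle period. The sketch below is a generic token bucket to show why raising the limit from 50 to 100 allows larger bursts; it is not App Engine's implementation, and `TokenBucket` and `try_acquire` are hypothetical names. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens per second, at most `bucket_size`."""

    def __init__(self, rate, bucket_size):
        self.rate = rate
        self.bucket_size = bucket_size
        self.tokens = float(bucket_size)  # start full, as after an idle period
        self.last = 0.0                   # time of the last refill, in seconds

    def _refill(self, now):
        # Add tokens for the elapsed time, never exceeding the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, now):
        """Return True if one task may start at time `now`."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, bucket_size=100)
    burst = sum(bucket.try_acquire(0.0) for _ in range(150))
    print("tasks started in the initial burst:", burst)
```

With a bucket size of 100, up to 100 queued tasks can start at once; after that, tasks start only as fast as the rate refills the bucket.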
This release contains a new feature for Python apps: builtin handlers, which let you quickly and easily enable standard functionality in your application without adding code to your codebase. The builtins available today are remote_api, appstats, and datastore_admin (see below). For example, to use remote_api with your application, simply add the following to your app.yaml file:
builtins:
- remote_api: on
If you are already using the remote_api endpoint in your app, you can remove its entry from the handlers section of your app.yaml and use the directive above instead, which simplifies your app.yaml file.
Support for builtin handlers is not yet available for Java applications, but will be available in an upcoming release.
Note: this feature is currently available by default only for Python; see the note below for ways to use it with Java applications.
Today, we are releasing an experimental addition to the Admin Console that provides a simple UI for deleting all entities, or all entities of a given kind, in your datastore. To enable this functionality, simply enable the following builtin in your app.yaml file:
builtins:
- datastore_admin: on
Adding these lines to app.yaml enables the “Datastore Admin” page in your app’s Admin Console, where you can see all of the entity kinds you are able to delete.
Be aware that these deletes are issued by your application (you can read about how the handler works by looking at this code file in the SDK). As a result, your application will use resources, most significantly CPU, for the deletions you issue, and that usage counts toward your application’s daily resource budget.
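To see why bulk deletes consume your application's own resources, it helps to picture them as ordinary application work: enumerate the keys of a kind and delete them batch by batch. The sketch below models the datastore as a plain dict of kinds to entities; it is not the SDK's Datastore Admin handler, and `delete_kind`, `delete_all`, and the batch size of 100 are hypothetical choices for illustration.

```python
import itertools


def delete_kind(store, kind, batch_size=100):
    """Delete every entity of `kind` in batches; return the batch count.

    `store` is a hypothetical stand-in for the datastore: {kind: {key: entity}}.
    Each batch represents one round of delete work the application performs.
    """
    entities = store.get(kind, {})
    batches = 0
    while entities:
        # Materialize the next batch of keys before mutating the dict.
        batch = list(itertools.islice(entities, batch_size))
        for key in batch:
            del entities[key]
        batches += 1
    return batches


def delete_all(store, batch_size=100):
    """Delete every kind; return the total number of batches issued."""
    return sum(delete_kind(store, kind, batch_size) for kind in list(store))
```

The batch count grows with the number of entities, which mirrors the point above: large deletions mean more work, and more CPU billed against the app's daily budget.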
Datastore delete is currently available only with the Python runtime. Java applications, however, can still take advantage of this feature by creating a non-default Python application version that enables Datastore Admin in the app.yaml. Native support for Java will be included in an upcoming release.
Finally, the Python pre-compilation feature we announced in 1.3.5 is now turned on by default for all new Python application uploads made with the 1.3.8 SDK. If you wish to disable this feature, just specify the flag --no-precompilation on the appcfg.py command line when uploading your app.
This release also contains a few more small features and bug fixes. You can read about the full release in our release notes in Python and Java. As always, your feedback in the forums is appreciated (and had a significant influence on this release!).