Recently I’ve been putting some time into writing a database adapter for Django that uses Amazon’s S3 and SimpleDB services as a storage layer, whilst trying to retain as much of Django’s QuerySet functional layer as possible. The general goal is to provide a storage back-end for Django that isn’t dependent on the traditional vertically-scaling database server, but can scale horizontally in the same way as the EC2 computing cloud does. My eventual goal being the ability to deploy Django in the cloud with no external dependencies. Just throw out a Django machine image, deploy your app’s code and config, and you have a scaling solution that takes minutes rather than days or weeks.
It’s a non-trivial exercise that is both stimulating and frustrating in equal measure, and progress has been steady, if not exactly rapid. It’s worth it to me though, as the ability to roll out scaling infrastructure is dramatically hampered by the database layer.
Imagine my delight then to find that Google have launched AppEngine, their own cloud-based web application system. It’s Python without any messy machine-based libraries, uses WSGI so you can use pretty much any Python web app, and with GFS for a distributed file storage and BigTable as a data persistence layer. Google even throws in Django 0.96.1 with instructions on how to use their storage layers by doing away with Django’s own model (more on this later).
There’s a lot of whining about how Google’s solution cripples Python (which is crazy when you look at how trivial it is to refactor code to use Google’s supplied alternatives), and locks you into their solution. I suspect that this is mostly from people who have never even contemplated building an application that needs to really scale, and are therefore still thinking in terms services provided by the underlying OS. That’s a big problem for scaling, because disk, IO, threads, sockets, etc are finite resources that are hardware-bound. Abstracting access to these things is tough. Most scaling solutions these days are about providing multiple hardware instances, but unfortunately that only solves the hardware problem. Building an app that scales transparently over multiple hardware instances is a huge challenge in comparison to procuring more servers.
Google’s approach is to do away with the concept of hardware entirely. That means a change of mindset towards every request being an atomic operation. Persistence occurs (correctly) in your persistence layer and not in transient storage available to an instance of your application. Google have provided extensive Python libraries and API calls to enable applications to take advantage of this, but it seems that a fairly vocal group aren’t interested unless their applications work on AppEngine without any additional effort. Considering the paradigm shift that AppEngine represents (from machine-centric programming to distributed programming) it’s not unreasonable to expect some small effort to be required. Especially when you take into account that AppEngine is currently in a very limited trial phase.
I’m extremely optimistic that Google’s approach will work well for a number of reasons. As an application programmer I spend huge amounts of time working around hardware and platform limitations that I should be spending on core functional areas. If Google can provide a solution that means I never have to worry about specific hardware problems ever again, I doubt I’ll look back.