Over the past 6 months, we've seen 3 significant cloud computing database announcements:
- SimpleDB from Amazon
- BigTable inside Google App Engine
- Microsoft SQL Server Data Services (SSDS)
These cloud databases provide simple, non-relational but scalable data services to applications that want to live in a public cloud. While they currently exist in a rapidly expanding hype bubble, it may well be that cloud databases represent the biggest paradigm shift in DBMS since the relational revolution of the 1980s.
Anyway, I was lucky enough to get a Google App Engine account and have been playing with the data access API. I wanted to load up some data I had in Oracle tables, and after a bit of mucking around I decided to write a general-purpose loader which others might find useful.
The loader program is a Python script that generates the code
needed to load an Oracle table with the bulk loader. You can
download the Python program here:
Download GAEOraLoad.py (11.3K).
To use it, you’ll need Python 2.5 with the cx_Oracle module
installed and, of course, a Google App Engine account or at
least the GAE SDK.
The program will display a usage message if you run it without arguments. However, as a simple example, you could issue the following command to create a loader for the SCOTT.EMP table:
C:\> python gaeoraload.py emploader scott/tiger EMP
Processing table EMP
Created EmpLoader.py to load EMP
Created EmpQuery.py to query EMP
Created Emp.csv with data from EMP
Issue the following command to load table EMP
bulkload_client.py --filename=Emp.csv --kind=Emp --url=http://localhost:8080/EmpLoad
Created loadAll.bat with commands to load all tables on Windows
Created loadAll.sh with commands to load all tables on *nix
Created loaderclasses.py with python definitions for all Oracle tables
Created app.yaml for application empLoader
The command creates an app.yaml file which defines a new application, “emploader”. The application has two entry points: “EmpLoad”, used by the bulkload_client (see http://code.google.com/appengine/articles/bulkload.html) to load the data, and “EmpQry”, which as a convenience displays the contents of the table. The loadAll.bat and loadAll.sh scripts contain the commands to load the CSV file that holds the data from the EMP table. If you provide a SQL wildcard (e.g. “%”) instead of a table name, then CSV files and entry points are created for all matching tables.
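To make the CSV step concrete, here is a minimal sketch of dumping a table to CSV through a DB-API cursor. It is not the code from GAEOraLoad.py: the function name table_to_csv is hypothetical, the sketch is written for a current Python 3 rather than the Python 2.5 used above, and sqlite3 stands in for cx_Oracle purely so the example runs without an Oracle instance. With cx_Oracle you would pass a cursor from your own connection instead.

```python
import csv
import io
import sqlite3


def table_to_csv(cursor, table_name, out):
    """Write every row of table_name to out as CSV and return the row count."""
    # Table names cannot be passed as bind variables, so apply a conservative
    # check before building the SQL text (assumption: plain identifiers only).
    if not table_name.isidentifier():
        raise ValueError("unexpected table name: %r" % table_name)
    cursor.execute("SELECT * FROM " + table_name)
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Stand-in data: sqlite3 replaces Oracle here only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMP (EMPNO INTEGER, ENAME TEXT, SAL REAL)")
cur.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [(7369, "SMITH", 800.0), (7499, "ALLEN", 1600.0)],
)

buf = io.StringIO()
rows_written = table_to_csv(cur, "EMP", buf)
csv_text = buf.getvalue()
```

Writing to a file-like object rather than a fixed filename keeps the function easy to try out; passing an open file instead of the StringIO buffer would produce something like Emp.csv.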
So having generated all that, I can start the application as usual:
C:\> python \tools\google_appengine\dev_appserver.py .
INFO 2008-05-09 14:26:08,125 appcfg.py] Checking for updates to the SDK.
INFO 2008-05-09 14:26:08,845 dev_appserver_main.py] Running application emploader on port 8080: http://localhost:8080
And then I can upload my EMP data:
C:\> python bulkload_client.py --filename=Emp.csv --kind=Emp --url=http://localhost:8080/EmpLoad
INFO 2008-05-09 14:27:47,703 bulkload_client.py] Starting import; maximum 10 entities per post
INFO 2008-05-09 14:27:47,703 bulkload_client.py] Importing 10 entities in 564 bytes
INFO 2008-05-09 14:27:51,414 bulkload_client.py] Importing 4 entities in 217 bytes
INFO 2008-05-09 14:27:52,621 bulkload_client.py] Import successful
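The log above shows the 14 EMP rows going up in two posts, one of 10 entities and one of 4. The following is only an illustration of that kind of batching, not the bulkload_client's actual implementation; the helper name batches is hypothetical.

```python
def batches(rows, size=10):
    """Yield successive lists of at most `size` items from rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    # Emit whatever is left over once the input is exhausted.
    if batch:
        yield batch


# 14 stand-in rows, matching the row count of EMP.
batch_sizes = [len(b) for b in batches(range(14), size=10)]
```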
Visiting the EmpQry entry point displays the first 100 rows of the data I just loaded (there are only 14 rows in EMP, though).
This works as you would expect when you upload the application, but note that there is no security on the application, so in theory someone could upload data into your Google App Engine account. There are also a couple of limitations that I know about:
1. Numeric nulls are not supported by the bulk loader, so for now I’m setting null values to -1.
2. Unicode characters (I think) are causing the bulk loader to crash.
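The workaround in the first limitation can be sketched roughly as below. This is an illustration under assumptions rather than the loader's own code: the function name replace_numeric_nulls and the idea of passing the positions of the numeric columns explicitly are both hypothetical.

```python
NULL_SENTINEL = -1  # the stand-in value described above for numeric nulls


def replace_numeric_nulls(row, numeric_positions, sentinel=NULL_SENTINEL):
    """Return a copy of row with None replaced by sentinel in numeric columns."""
    return [
        sentinel if (value is None and i in numeric_positions) else value
        for i, value in enumerate(row)
    ]


# Stand-in row: EMPNO, ENAME, MGR (numeric, null here), plus a null text column.
row = (7839, "KING", None, None)
cleaned = replace_numeric_nulls(row, numeric_positions={0, 2})
```

The obvious drawback is that a loaded -1 can no longer be told apart from a genuine -1 in the source data, so queries against those columns need to allow for that.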
Now that I can upload Oracle data to the Google App Engine cloud I’m planning to muck about with various query idioms such as joins, aggregates and so on. Should be fun!