Let me preface this post by saying South is awesome. It greatly simplifies schema changes in Django projects. However, if you’ve ever had to do a large data migration, you have likely seen South bite the dust. It’s not really made for that. At that point you need something more robust for chewing through large amounts of data.
This is where I like to use Celery. Do your schema migrations like normal, write a new task that handles moving data from the old table to the new one, and then write another migration to run after your Celery tasks complete. Here’s the workflow in a bit more detail:
- Create a migration that adds the new column(s) or table(s).
- Create a separate migration that removes the old column(s) or table(s).
- Write a task to migrate a discrete chunk of rows.
- Run the first migration.
- Iterate over discrete chunks of rows (think id range 1-1000, 1001-2000, etc…) and launch tasks.
- Wait for tasks to complete.
- Run the second migration.
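To make the steps above concrete, here is a rough, self-contained sketch of the same workflow. It is not the South/Celery setup itself: it assumes an in-memory SQLite database in place of the Django models, a plain loop in place of Celery workers, and a hypothetical `person` table whose old `full_name` column is copied into a new `display_name` column. All of those names are made up for illustration.

```python
import sqlite3


def migrate_chunk(conn, begin, end):
    """One "task": fill the new column for rows with begin < id <= end."""
    conn.execute(
        "UPDATE person SET display_name = UPPER(full_name) "
        "WHERE id > ? AND id <= ?",
        (begin, end),
    )
    conn.commit()


def run_workflow(row_count=25, chunk=10):
    conn = sqlite3.connect(":memory:")
    # Existing table with the old column (hypothetical schema).
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.executemany(
        "INSERT INTO person (full_name) VALUES (?)",
        [("user %d" % i,) for i in range(row_count)],
    )

    # First migration: add the new column and leave the old one in place.
    conn.execute("ALTER TABLE person ADD COLUMN display_name TEXT")

    # Launch one "task" per id range; with Celery these would go to workers.
    final_id = conn.execute("SELECT MAX(id) FROM person").fetchone()[0]
    for begin in range(0, final_id, chunk):
        migrate_chunk(conn, begin, min(begin + chunk, final_id))

    # Wait, then verify: the second migration (dropping the old column)
    # should only run once no row is left without the new value.
    remaining = conn.execute(
        "SELECT COUNT(*) FROM person WHERE display_name IS NULL"
    ).fetchone()[0]
    conn.close()
    return remaining


print("rows still unmigrated:", run_workflow())
```

The point of the final count is that the destructive second migration stays gated on a check you can actually run, rather than on a guess that the workers are finished.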
Your migrations and tasks will obviously be implementation specific, but I thought I’d share the chunking of data sets that I’ve used.
Basically, I write a task that accepts `begin` and `end` arguments and filters on that ID range, and then I generate a bunch of `begin` and `end` pairs to feed it. Like so:
```python
from celery.task import task


@task
def sweep_migrate(begin, end):
    from app.models import Model
    # begin is exclusive and end is inclusive, so chunks never overlap
    for instance in Model.objects.filter(id__gt=begin, id__lte=end).iterator():
        # migrate instance
        pass


def gen_pairs(count, cut):
    """
    Generates a list of [begin, end] pairs for appropriate slicing
    over massive lists (mainly for Django QS).

    >>> gen_pairs(42, 10)
    [[0, 10], [10, 20], [20, 30], [30, 40], [40, 42]]
    """
    _pairs = range(count)[cut::cut]
    if _pairs:
        return [[x - cut, x] for x in _pairs] + [[_pairs[-1], count]]
    # count <= cut: a single chunk covers everything
    return [[0, count]]


def start_tasks(final_id):
    pairs = gen_pairs(final_id, 1000)
    for begin, end in pairs:
        sweep_migrate.apply_async(args=[begin, end])
```
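One thing worth double-checking is that the pairs and the `id__gt` / `id__lte` filter line up, so every row lands in exactly one chunk. Here is a small sketch of that check. It repeats `gen_pairs` so it runs on its own, and it assumes a plain list of ids standing in for the queryset; `ids_in` and `coverage` are hypothetical helpers, and the ids are made-up data.

```python
def gen_pairs(count, cut):
    """Same slicing as in the post: [begin, end] pairs up to count."""
    _pairs = range(count)[cut::cut]
    if _pairs:
        return [[x - cut, x] for x in _pairs] + [[_pairs[-1], count]]
    return [[0, count]]


def ids_in(begin, end, ids):
    """Mimic filter(id__gt=begin, id__lte=end) on a plain list of ids."""
    return [i for i in ids if begin < i <= end]


def coverage(final_id, cut, ids):
    """Count how many chunks each id falls into."""
    hits = dict.fromkeys(ids, 0)
    for begin, end in gen_pairs(final_id, cut):
        for i in ids_in(begin, end, ids):
            hits[i] += 1
    return hits


# Ids with gaps, as left behind by deleted rows (made-up data).
ids = [1, 2, 3, 10, 11, 20, 41, 42]
hits = coverage(max(ids), 10, ids)
print(all(n == 1 for n in hits.values()))
```

Because the ranges are exclusive at `begin` and inclusive at `end`, an id sitting on a boundary (10 or 20 here) is picked up once, not twice. It also shows why the highest id, rather than the row count, is the right thing to pass in: gaps in the id sequence just produce lighter chunks.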
To launch, I find the highest auto-increment ID in the set I want to migrate, open a shell, and do something like this:
```python
from app.task import start_tasks
start_tasks(final_id)  # final_id: the highest auto-increment ID found above
```
Go grab a cup of coffee and wait… 🙂
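If you would rather be told when the sweep is done than guess, the launch-then-wait pattern looks roughly like the sketch below. It uses the standard library's `ThreadPoolExecutor` as a stand-in for Celery workers so that it runs anywhere; `sweep_chunk` and `run_sweep` are hypothetical names, and the "migration" is just a placeholder that reports how many ids a chunk covers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def sweep_chunk(begin, end):
    """Placeholder for the real task: pretend to migrate ids in (begin, end]."""
    return end - begin  # number of ids this chunk is responsible for


def run_sweep(final_id, cut=1000, workers=4):
    """Launch one job per chunk, then block until every job has finished."""
    done = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(sweep_chunk, begin, min(begin + cut, final_id))
            for begin in range(0, final_id, cut)
        ]
        for future in as_completed(futures):
            done += future.result()  # re-raises if a chunk failed
    return done


print(run_sweep(4200))
```

With Celery itself, the equivalent would be to keep the result objects that `apply_async` returns and check them before running the second migration, which assumes a result backend is configured.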