Bryan Helmig

Co-founder of Zapier, speaker, musician and builder of things.

Let me preface this post by saying South is awesome. It greatly simplifies schema changes when working with databases in Django. However, if you’ve ever had to do a large data migration, you’ve likely seen South bite the dust. It’s not really made for that. At that point you need something more robust for chewing through large amounts of data.

This is where I like to use Celery. Set up your schema migrations as normal, write a new task that migrates the data from the old table to the new one, and then write another migration to run after your Celery tasks complete. Here’s the workflow in a little more detail:

  1. Create a migration that adds the new column(s) or table(s).
  2. Create a separate migration that removes the old column(s) or table(s).
  3. Write a task to migrate a discrete chunk of rows.
  4. Run the first migration.
  5. Iterate over discrete chunks of rows (think id range 1-1000, 1001-2000, etc…) and launch tasks.
  6. Wait for tasks to complete.
  7. Run the second migration.
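Steps 5 and 6 are the part Celery handles. To show the shape of that fan-out and wait without needing a broker, here is a local sketch that swaps Celery workers for Python’s `concurrent.futures.ThreadPoolExecutor` and uses an in-memory dict as a stand-in table. `migrate_chunk`, `run_migration` and the uppercase "migration" are hypothetical. With real Celery you would keep the `AsyncResult` objects returned by `apply_async` and check them (for example with `.ready()`) instead of calling `wait()`.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def migrate_chunk(rows, begin, end):
    """Hypothetical per-chunk job: transform rows with begin < id <= end.

    Uses the same (begin, end] bounds as the Celery task's id__gt/id__lte
    filter. Returns the migrated rows instead of writing them, so worker
    threads never share mutable state.
    """
    return {i: rows[i].upper() for i in range(begin + 1, end + 1) if i in rows}


def run_migration(rows, final_id, chunk_size=1000, workers=4):
    # Step 5: split ids 1..final_id into discrete (begin, end] ranges.
    ranges = [(b, min(b + chunk_size, final_id)) for b in range(0, final_id, chunk_size)]
    migrated = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(migrate_chunk, rows, b, e) for b, e in ranges]
        # Step 6: block until every chunk job has finished.
        wait(futures)
    for future in futures:
        migrated.update(future.result())  # re-raises if a chunk failed
    return migrated


if __name__ == "__main__":
    # Hypothetical "old table": ids 1..2500, with a gap where rows were deleted.
    old = {i: f"value-{i}" for i in range(1, 2501) if not 1200 <= i < 1300}
    new = run_migration(old, final_id=max(old))
    print(len(new))  # 2400
```

Gaps in the id sequence are harmless: a chunk that falls in a gap simply finds fewer (or zero) rows.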

Your migrations and tasks will obviously be implementation specific, but I thought I’d share the way I’ve chunked large data sets.

Basically, I write a task that accepts `begin` and `end` arguments and filters for that ID range, then generate a bunch of `begin`/`end` pairs and launch one task per pair. Like so:

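from celery import shared_task


@shared_task
def sweep_migrate(begin, end):
    """Migrate every row whose id falls in the range (begin, end]."""
    from app.models import Model  # deferred import, a common way to avoid circular imports

    # .iterator() streams rows instead of caching the whole QuerySet in memory.
    for instance in Model.objects.filter(id__gt=begin, id__lte=end).iterator():
        pass  # migrate instance here (implementation specific)


def gen_pairs(count, cut):
    """
    Generates a list of [begin, end] pairs for slicing massive lists
    (mainly Django QuerySets) into chunks of `cut`.

    >>> gen_pairs(42, 10)
    [[0, 10], [10, 20], [20, 30], [30, 40], [40, 42]]
    """
    try:
        _pairs = range(count)[cut::cut]  # multiples of cut below count
        return [[x - cut, x] for x in _pairs] + [[_pairs[-1], count]]
    except IndexError:
        # count <= cut: a single chunk covers everything.
        return [[0, count]]


def start_tasks(final_id):
    """Queue one sweep_migrate task per chunk of 1000 ids."""
    for begin, end in gen_pairs(final_id, 1000):
        sweep_migrate.apply_async(args=[begin, end])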
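Because the task filters with `id__gt=begin, id__lte=end`, each pair covers the range (begin, end], so consecutive pairs share a boundary without processing that row twice. The stdlib-only sketch below checks that property. It uses an alternative one-line `gen_pairs` of my own (not the post’s) that yields the same pairs without relying on `IndexError`; `ids_covered` is a hypothetical helper.

```python
def gen_pairs(count, cut):
    """Return [begin, end] pairs covering 0..count in steps of `cut`.

    >>> gen_pairs(42, 10)
    [[0, 10], [10, 20], [20, 30], [30, 40], [40, 42]]
    """
    # range(0, count, cut) yields each chunk's lower bound; min() clips the last chunk.
    return [[b, min(b + cut, count)] for b in range(0, count, cut)] or [[0, count]]


def ids_covered(pairs):
    """Expand pairs using the task's (begin, end] semantics: id > begin and id <= end."""
    covered = []
    for begin, end in pairs:
        covered.extend(range(begin + 1, end + 1))
    return covered


if __name__ == "__main__":
    final_id = 12345
    pairs = gen_pairs(final_id, 1000)
    # Every id from 1 to final_id appears exactly once: no gaps, no overlaps.
    assert ids_covered(pairs) == list(range(1, final_id + 1))
    print(len(pairs))  # 13
```

The `or [[0, count]]` fallback only matters when `count` is zero or negative, where it matches the original function’s `except IndexError` branch.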

To launch, I find the highest auto-increment ID in the set I want to migrate, open a shell and run something like this:

from app.task import start_tasks
start_tasks(12345678)
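Finding that highest ID is a single aggregate query. In Django it could be `Model.objects.aggregate(Max("id"))["id__max"]` (with `Max` imported from `django.db.models`). The runnable sketch below shows the same idea with the standard-library `sqlite3` module and a hypothetical `old_rows` table, so the number passed to `start_tasks` comes from the data rather than being typed in by hand.

```python
import sqlite3


def highest_id(conn, table):
    """Return the largest id in `table`, or 0 if the table is empty."""
    # `table` is interpolated directly into the SQL: only pass trusted, hard-coded names.
    row = conn.execute(f"SELECT MAX(id) FROM {table}").fetchone()
    # MAX(id) is NULL (None in Python) on an empty table.
    return row[0] or 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_rows (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
conn.executemany("INSERT INTO old_rows (payload) VALUES (?)", [(f"row {i}",) for i in range(2500)])
print(highest_id(conn, "old_rows"))  # 2500
```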

Go grab a cup of coffee and wait… 🙂


Posted March 5, 2012 @ 7:19 pm under Work.

Don’t feel like setting up Jenkins, you lazy bum? Fine. Try this on for size: use a GitHub service hook to ping a Django view which runs a bash script out of process. Sound like a bad idea? Probably, but bad is a relative thing, you see… here’s how: Gonna need the at command. Do […]


Posted December 20, 2011 @ 4:32 pm under Work.

SOAP is a bit foreign to me (JSON + good documentation seems so much easier), but I finally managed to authenticate DocuSign with a SOAP client in Python. The code below assumes you have a developer account all set up and have suds, the Python SOAP library, installed: from suds.client import Client   class DocuSign(Client): […]


Posted November 4, 2011 @ 10:36 am under Work.

If you use this, make sure you are PCI compliant, otherwise explore stripe.js… In case you haven’t heard, payment gateways, merchant accounts and all that jazz are now obsolete thanks to Stripe. Stripe offers a simple to set up payment service with an absolutely wonderful API. Instead of comparing and contrasting dozens of merchant accounts […]


Posted October 11, 2011 @ 12:14 am under Work.

As my next miniature project will be a crossword puzzle maker for teachers (note: the domain has been sold to a nice fellow who is maintaining it) that randomly generates crossword puzzles and word search puzzles, I thought I’d share the code I developed to create these puzzles on the fly. While I […]


Posted April 10, 2010 @ 4:18 pm under Boring Stuff, Work.

Heads up, BitBuffet (the recommended service in this post) is no longer around. However, with a little elbow grease you can use my new startup Zapier to sell files online. There are quite a few services out there that provide a mechanism for digital downloads, most of them are cart based or even store based […]


Posted February 15, 2010 @ 5:48 pm under Work.

Finally, after months of tweaking and building, I’ve launched Rankiac.com, a supercharged automatic Google rank checker. It’s a dandy little SEO tool that doesn’t do a whole heck of a lot, but what it does, it does well. At the moment, it (1) tracks rankings in Google, (2) watches your important links and (3) […]


Posted December 17, 2009 @ 3:55 am under Work.

I run a few websites (let’s just say over a dozen), so I generally spend a lot of my time optimizing and tweaking these sites. My first site, a free guitar lesson resource, survives solely off of Adsense. I like Adsense: it’s easy to use, it’s extremely popular, and there is no shortage of […]


Posted July 1, 2009 @ 4:27 pm under Work.

Just this weekend, I launched my take on collaborative music online. I am sure there are already sites out there that do this, but I wanted to focus on the layering concept of creating music with a multitrack editor. Anyone who’s ever recorded with a single microphone knows the process of layering subsequent tracks well. […]


Posted May 3, 2009 @ 12:36 am under Work.