Bryan Helmig

Co-founder of Zapier, speaker, musician and builder of things.

As my next miniature project will be a crossword puzzle maker (note: domain has been sold to a nice fellow who is maintaining it) for teachers that will make random generation of crossword puzzles and word search puzzles, I thought I’d share the code I developed to create these puzzles on the fly. While I was working on it, I ran across many different scripts to accomplish this, but none of them were in my most favorite of languages: Python. Besides, I’d like the code to fit snugly in my web framework of choice: Django; the popular PHP version just wouldn’t cut it. Anyways, scroll down to see the code, or read on for a little primer about the process behind it.

Released under a BSD license. Copyright Bryan Helmig 2010.

Puzzles like these:

p u m p e r n i c k e l -    p u m p e r n i c k e l v
a - - - - - - - a - - e -    a w j m p c a y a w r e s
l - s n i c k e r - - a -    l f s n i c k e r b z a x
a - a - - - - - a - - v -    a f a z k e u i a b f v k
d - f - c - - - m - - e -    d x f v c j f d m c n e x
i - f j o r d - e - - n -    i d f j o r d z e j g n z
n - r - d - - - l i p - -    n r r x d j a o l i p d j
- c o r a l - - - - i - -    i c o r a l u s t o i x w
- - n - - i - - - - s - -    m r n u e i i h o t s y w
- - - - - m i s t - t - -    m w e x s m i s t r t u j
p l a g u e - - - - o - -    p l a g u e b n h k o m s
- - - - - - - d a w n - -    f m n v j f p d a w n c q
- - - - - - - - - - - - -    m h j a e d p p r g t p j

Behind the Scenes

This program is actually very simple and creates completely random crosswords on the fly. Naturally, the more words you have, the better it will be at placing the most possible on a board. However, increasing the number of words will increase computation time. Additionally, increasing the board size will severely increase computation time. To counteract the fact that sometimes it will randomly generate a sub-par board, we will generate many different boards in an allotted time and only keep the “best” board (in this case, the board with the most words placed). So, as the board and word list gets bigger, the number of prospective boards created decreases within a fixed time.

The code first randomizes the word list and then sorts by word length. The idea here is that longer words are more difficult to place, so get them placed when the board is the most open. Next, we place the longest word on the 1, 1 coordinate of the grid as the seed. In tests, the placement of the first word at 1, 1 yielded by far the best results on average. Then we go to the next longest word and loop over its letters and each cell in the grid. When we find a match, we back it up and suggest a coordinate placement for that word. Once we’ve checked every letter against every cell, we chose the best (the word best is used very loosely here) coordinate and apply the word to the grid. Now we move on the next word and so forth. Once we’ve made it through once, we can loop over the unplaced words and looks for any lucky chances for a second placement.

This suggested coordinate system allows for a much faster fit than some methods I’ve seen that will randomly place a word to see if it works. Additionally, it requires the word cross other words which is the point of well, a crossword puzzle.

Operation

Be mindful when you create a word list to exclude words like “an” or “or” because these have a tendency to be placed inside other already placed words. This can be confusing. Simply run the code below.

You can feed the Crossword class a list of Word classes, or a list of tuples or lists with the word and clue. Either way works.

When you call the compute_crossword(seconds) method, it does all the work of computing the best crossword in however many seconds you passed. 1 second is probably enough for crossword grids of less that 20×20 and 2 seconds is fine for 25×25 and 3 seconds is good for 30×30. Additionally, if you have a massive word list, you may want to double the time alloyed. Finally, if you can’t run psycho, quadruple these times for similar quality.

The Code:

import random, re, time, string
from copy import copy as duplicate
 
# optional, speeds up by a factor of 4
import psyco
psyco.full()
 
class Crossword(object):
    def __init__(self, cols, rows, empty = '-', maxloops = 2000, available_words=[]):
        self.cols = cols
        self.rows = rows
        self.empty = empty
        self.maxloops = maxloops
        self.available_words = available_words
        self.randomize_word_list()
        self.current_word_list = []
        self.debug = 0
        self.clear_grid()
 
    def clear_grid(self): # initialize grid and fill with empty character
        self.grid = []
        for i in range(self.rows):
            ea_row = []
            for j in range(self.cols):
                ea_row.append(self.empty)
            self.grid.append(ea_row)
 
    def randomize_word_list(self): # also resets words and sorts by length
        temp_list = []
        for word in self.available_words:
            if isinstance(word, Word):
                temp_list.append(Word(word.word, word.clue))
            else:
                temp_list.append(Word(word[0], word[1]))
        random.shuffle(temp_list) # randomize word list
        temp_list.sort(key=lambda i: len(i.word), reverse=True) # sort by length
        self.available_words = temp_list
 
    def compute_crossword(self, time_permitted = 1.00, spins=2):
        time_permitted = float(time_permitted)
 
        count = 0
        copy = Crossword(self.cols, self.rows, self.empty, self.maxloops, self.available_words)
 
        start_full = float(time.time())
        while (float(time.time()) - start_full) < time_permitted or count == 0: # only run for x seconds
            self.debug += 1
            copy.current_word_list = []
            copy.clear_grid()
            copy.randomize_word_list()
            x = 0
            while x < spins: # spins; 2 seems to be plenty
                for word in copy.available_words:
                    if word not in copy.current_word_list:
                        copy.fit_and_add(word)
                x += 1
            #print copy.solution()
            #print len(copy.current_word_list), len(self.current_word_list), self.debug
            # buffer the best crossword by comparing placed words
            if len(copy.current_word_list) > len(self.current_word_list):
                self.current_word_list = copy.current_word_list
                self.grid = copy.grid
            count += 1
        return
 
    def suggest_coord(self, word):
        count = 0
        coordlist = []
        glc = -1
        for given_letter in word.word: # cycle through letters in word
            glc += 1
            rowc = 0
            for row in self.grid: # cycle through rows
                rowc += 1
                colc = 0
                for cell in row: # cycle through  letters in rows
                    colc += 1
                    if given_letter == cell: # check match letter in word to letters in row
                        try: # suggest vertical placement 
                            if rowc - glc > 0: # make sure we're not suggesting a starting point off the grid
                                if ((rowc - glc) + word.length) <= self.rows: # make sure word doesn't go off of grid
                                    coordlist.append([colc, rowc - glc, 1, colc + (rowc - glc), 0])
                        except: pass
                        try: # suggest horizontal placement 
                            if colc - glc > 0: # make sure we're not suggesting a starting point off the grid
                                if ((colc - glc) + word.length) <= self.cols: # make sure word doesn't go off of grid
                                    coordlist.append([colc - glc, rowc, 0, rowc + (colc - glc), 0])
                        except: pass
        # example: coordlist[0] = [col, row, vertical, col + row, score]
        #print word.word
        #print coordlist
        new_coordlist = self.sort_coordlist(coordlist, word)
        #print new_coordlist
        return new_coordlist
 
    def sort_coordlist(self, coordlist, word): # give each coordinate a score, then sort
        new_coordlist = []
        for coord in coordlist:
            col, row, vertical = coord[0], coord[1], coord[2]
            coord[4] = self.check_fit_score(col, row, vertical, word) # checking scores
            if coord[4]: # 0 scores are filtered
                new_coordlist.append(coord)
        random.shuffle(new_coordlist) # randomize coord list; why not?
        new_coordlist.sort(key=lambda i: i[4], reverse=True) # put the best scores first
        return new_coordlist
 
    def fit_and_add(self, word): # doesn't really check fit except for the first word; otherwise just adds if score is good
        fit = False
        count = 0
        coordlist = self.suggest_coord(word)
 
        while not fit and count < self.maxloops:
 
            if len(self.current_word_list) == 0: # this is the first word: the seed
                # top left seed of longest word yields best results (maybe override)
                vertical, col, row = random.randrange(0, 2), 1, 1
                ''' 
                # optional center seed method, slower and less keyword placement
                if vertical:
                    col = int(round((self.cols + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                else:
                    col = int(round((self.cols + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0))
                # completely random seed method
                col = random.randrange(1, self.cols + 1)
                row = random.randrange(1, self.rows + 1)
                '''
 
                if self.check_fit_score(col, row, vertical, word): 
                    fit = True
                    self.set_word(col, row, vertical, word, force=True)
            else: # a subsquent words have scores calculated
                try: 
                    col, row, vertical = coordlist[count][0], coordlist[count][1], coordlist[count][2]
                except IndexError: return # no more cordinates, stop trying to fit
 
                if coordlist[count][4]: # already filtered these out, but double check
                    fit = True 
                    self.set_word(col, row, vertical, word, force=True)
 
            count += 1
        return
 
    def check_fit_score(self, col, row, vertical, word):
        '''
        And return score (0 signifies no fit). 1 means a fit, 2+ means a cross.
 
        The more crosses the better.
        '''
        if col < 1 or row < 1:
            return 0
 
        count, score = 1, 1 # give score a standard value of 1, will override with 0 if collisions detected
        for letter in word.word:            
            try:
                active_cell = self.get_cell(col, row)
            except IndexError:
                return 0
 
            if active_cell == self.empty or active_cell == letter:
                pass
            else:
                return 0
 
            if active_cell == letter:
                score += 1
 
            if vertical:
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col+1, row): # check right cell
                        return 0
 
                    if not self.check_if_cell_clear(col-1, row): # check left cell
                        return 0
 
                if count == 1: # check top cell only on first letter
                    if not self.check_if_cell_clear(col, row-1):
                        return 0
 
                if count == len(word.word): # check bottom cell only on last letter
                    if not self.check_if_cell_clear(col, row+1): 
                        return 0
            else: # else horizontal
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col, row-1): # check top cell
                        return 0
 
                    if not self.check_if_cell_clear(col, row+1): # check bottom cell
                        return 0
 
                if count == 1: # check left cell only on first letter
                    if not self.check_if_cell_clear(col-1, row):
                        return 0
 
                if count == len(word.word): # check right cell only on last letter
                    if not self.check_if_cell_clear(col+1, row):
                        return 0
 
 
            if vertical: # progress to next letter and position
                row += 1
            else: # else horizontal
                col += 1
 
            count += 1
 
        return score
 
    def set_word(self, col, row, vertical, word, force=False): # also adds word to word list
        if force:
            word.col = col
            word.row = row
            word.vertical = vertical
            self.current_word_list.append(word)
 
            for letter in word.word:
                self.set_cell(col, row, letter)
                if vertical:
                    row += 1
                else:
                    col += 1
        return
 
    def set_cell(self, col, row, value):
        self.grid[row-1][col-1] = value
 
    def get_cell(self, col, row):
        return self.grid[row-1][col-1]
 
    def check_if_cell_clear(self, col, row):
        try:
            cell = self.get_cell(col, row)
            if cell == self.empty: 
                return True
        except IndexError:
            pass
        return False
 
    def solution(self): # return solution grid
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                outStr += '%s ' % c
            outStr += '\n'
        return outStr
 
    def word_find(self): # return solution grid
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                if c == self.empty:
                    outStr += '%s ' % string.lowercase[random.randint(0,len(string.lowercase)-1)]
                else:
                    outStr += '%s ' % c
            outStr += '\n'
        return outStr
 
    def order_number_words(self): # orders words and applies numbering system to them
        self.current_word_list.sort(key=lambda i: (i.col + i.row))
        count, icount = 1, 1
        for word in self.current_word_list:
            word.number = count
            if icount < len(self.current_word_list):
                if word.col == self.current_word_list[icount].col and word.row == self.current_word_list[icount].row:
                    pass
                else:
                    count += 1
            icount += 1
 
    def display(self, order=True): # return (and order/number wordlist) the grid minus the words adding the numbers
        outStr = ""
        if order:
            self.order_number_words()
 
        copy = self
 
        for word in self.current_word_list:
            copy.set_cell(word.col, word.row, word.number)
 
        for r in range(copy.rows):
            for c in copy.grid[r]:
                outStr += '%s ' % c
            outStr += '\n'
 
        outStr = re.sub(r'[a-z]', ' ', outStr)
        return outStr
 
    def word_bank(self): 
        outStr = ''
        temp_list = duplicate(self.current_word_list)
        random.shuffle(temp_list) # randomize word list
        for word in temp_list:
            outStr += '%s\n' % word.word
        return outStr
 
    def legend(self): # must order first
        outStr = ''
        for word in self.current_word_list:
            outStr += '%d. (%d,%d) %s: %s\n' % (word.number, word.col, word.row, word.down_across(), word.clue )
        return outStr
 
class Word(object):
    def __init__(self, word=None, clue=None):
        self.word = re.sub(r'\s', '', word.lower())
        self.clue = clue
        self.length = len(self.word)
        # the below are set when placed on board
        self.row = None
        self.col = None
        self.vertical = None
        self.number = None
 
    def down_across(self): # return down or across
        if self.vertical: 
            return 'down'
        else: 
            return 'across'
 
    def __repr__(self):
        return self.word
 
### end class, start execution
 
#start_full = float(time.time())
 
word_list = ['saffron', 'The dried, orange yellow plant used to as dye and as a cooking spice.'], \
    ['pumpernickel', 'Dark, sour bread made from coarse ground rye.'], \
    ['leaven', 'An agent, such as yeast, that cause batter or dough to rise..'], \
    ['coda', 'Musical conclusion of a movement or composition.'], \
    ['paladin', 'A heroic champion or paragon of chivalry.'], \
    ['syncopation', 'Shifting the emphasis of a beat to the normally weak beat.'], \
    ['albatross', 'A large bird of the ocean having a hooked beek and long, narrow wings.'], \
    ['harp', 'Musical instrument with 46 or more open strings played by plucking.'], \
    ['piston', 'A solid cylinder or disk that fits snugly in a larger cylinder and moves under pressure as in an engine.'], \
    ['caramel', 'A smooth chery candy made from suger, butter, cream or milk with flavoring.'], \
    ['coral', 'A rock-like deposit of organism skeletons that make up reefs.'], \
    ['dawn', 'The time of each morning at which daylight begins.'], \
    ['pitch', 'A resin derived from the sap of various pine trees.'], \
    ['fjord', 'A long, narrow, deep inlet of the sea between steep slopes.'], \
    ['lip', 'Either of two fleshy folds surrounding the mouth.'], \
    ['lime', 'The egg-shaped citrus fruit having a green coloring and acidic juice.'], \
    ['mist', 'A mass of fine water droplets in the air near or in contact with the ground.'], \
    ['plague', 'A widespread affliction or calamity.'], \
    ['yarn', 'A strand of twisted threads or a long elaborate narrative.'], \
    ['snicker', 'A snide, slightly stifled laugh.']
 
a = Crossword(13, 13, '-', 5000, word_list)
a.compute_crossword(2)
print a.word_bank()
print a.solution()
print a.word_find()
print a.display()
print a.legend()
print len(a.current_word_list), 'out of', len(word_list)
print a.debug
#end_full = float(time.time())
#print end_full - start_full

Sample output:

You should be able to see the associated methods lining up with the output. A side note: you must run the display() method before the legend() method can be ran.

mist
lime
snicker
paladin
caramel
leaven
pumpernickel
coral
fjord
plague
piston
lip
dawn
saffron
coda

p u m p e r n i c k e l -
a - - - - - - - a - - e -
l - s n i c k e r - - a -
a - a - - - - - a - - v -
d - f - c - - - m - - e -
i - f j o r d - e - - n -
n - r - d - - - l i p - -
- c o r a l - - - - i - -
- - n - - i - - - - s - -
- - - - - m i s t - t - -
p l a g u e - - - - o - -
- - - - - - - d a w n - -
- - - - - - - - - - - - -

p u m p e r n i c k e l v
a w j m p c a y a w r e s
l f s n i c k e r b z a x
a f a z k e u i a b f v k
d x f v c j f d m c n e x
i d f j o r d z e j g n z
n r r x d j a o l i p d j
i c o r a l u s t o i x w
m r n u e i i h o t s y w
m w e x s m i s t r t u j
p l a g u e b n h k o m s
f m n v j f p d a w n c q
m h j a e d p p r g t p j

1               4     8 -
  - - - - - - -   - -   -
  - 2             - -   -
  -   - - - - -   - -   -
  -   - 6 - - -   - -   -
  - 3         -   - -   -
  -   -   - - - 10   12 - -
- 5       9 - - - -   - -
- -   - -   - - - -   - -
- - - - - 11       -   - -
7           - - - -   - -
- - - - - - - 13       - -
- - - - - - - - - - - - -

1. (1,1) across: Dark, sour bread made from coarse ground rye.
1. (1,1) down: A heroic champion or paragon of chivalry.
2. (3,3) across: A snide, slightly stifled laugh.
2. (3,3) down: The dried, orange yellow plant used to as dye and as a cooking spice.
3. (3,6) across: A long, narrow, deep inlet of the sea between steep slopes.
4. (9,1) down: A smooth chery candy made from suger, butter, cream or milk with flavoring.
5. (2,8) across: A rock-like deposit of organism skeletons that make up reefs.
6. (5,5) down: Musical conclusion of a movement or composition.
7. (1,11) across: A widespread affliction or calamity.
8. (12,1) down: An agent, such as yeast, that cause batter or dough to rise..
9. (6,8) down: The egg-shaped citrus fruit having a green coloring and acidic juice.
10. (9,7) across: Either of two fleshy folds surrounding the mouth.
11. (6,10) across: A mass of fine water droplets in the air near or in contact with the ground.
12. (11,7) down: A solid cylinder or disk that fits snugly in a larger cylinder and moves under pressure as in an engine.
13. (8,12) across: The time of each morning at which daylight begins.

15 out of 20
811

Posted April 10, 2010 @ 4:18 pm under Boring Stuff, Work.