Python Crossword Puzzle Generator
My next miniature project is a puzzle maker for teachers that randomly generates crossword and word search puzzles, so I thought I’d share the code I developed to create these puzzles on the fly. While I was working on it, I ran across many different scripts that do this, but none of them were in my favorite language: Python. Besides, I’d like the code to fit snugly in my web framework of choice, Django; the popular PHP version just wouldn’t cut it. Anyway, scroll down to see the code, or read on for a little primer on the process behind it.
Puzzles like these:
p u m p e r n i c k e l -     p u m p e r n i c k e l v
a - - - - - - - a - - e -     a w j m p c a y a w r e s
l - s n i c k e r - - a -     l f s n i c k e r b z a x
a - a - - - - - a - - v -     a f a z k e u i a b f v k
d - f - c - - - m - - e -     d x f v c j f d m c n e x
i - f j o r d - e - - n -     i d f j o r d z e j g n z
n - r - d - - - l i p - -     n r r x d j a o l i p d j
- c o r a l - - - - i - -     i c o r a l u s t o i x w
- - n - - i - - - - s - -     m r n u e i i h o t s y w
- - - - - m i s t - t - -     m w e x s m i s t r t u j
p l a g u e - - - - o - -     p l a g u e b n h k o m s
- - - - - - - d a w n - -     f m n v j f p d a w n c q
- - - - - - - - - - - - -     m h j a e d p p r g t p j
Behind the Scenes
This program is actually very simple and creates completely random crosswords on the fly. Naturally, the more words you give it, the better its chances of fitting a lot of them on a board. However, increasing the number of words increases computation time, and increasing the board size increases it severely. Because any single random attempt may produce a sub-par board, we generate many different boards in an allotted time and keep only the “best” one (in this case, the board with the most words placed). So, as the board and word list get bigger, fewer prospective boards can be created within a fixed time.
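To make the "many boards, keep the best" idea concrete, here is a minimal sketch of a time-boxed search. It is not the class from this post: best_of_random_attempts, make_attempt and score are names made up for the illustration, and the "board" is just a random subset of words standing in for a real placement attempt.

```python
import random
import time

def best_of_random_attempts(make_attempt, score, seconds=1.0):
    """Generate candidates until the time budget runs out; return the best one."""
    best, best_score, attempts = None, None, 0
    deadline = time.time() + seconds
    # Always make at least one attempt, even with a zero budget.
    while attempts == 0 or time.time() < deadline:
        candidate = make_attempt()
        candidate_score = score(candidate)
        if best_score is None or candidate_score > best_score:
            best, best_score = candidate, candidate_score
        attempts += 1
    return best, attempts

# Toy stand-in for "build a random board": keep a random subset of the words.
words = ["coral", "dawn", "fjord", "lip", "mist"]
board, tries = best_of_random_attempts(
    lambda: random.sample(words, random.randint(1, len(words))),
    score=len,          # "best" here means "most words placed"
    seconds=0.05,
)
print(len(board), "words kept after", tries, "attempts")
```

The number of attempts that fit in the budget drops as each attempt gets more expensive, which is the trade-off described above.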
The code first randomizes the word list and then sorts it by word length. The idea here is that longer words are more difficult to place, so we get them placed while the board is the most open. Next, we place the longest word at the 1, 1 coordinate of the grid as the seed. In tests, placing the first word at 1, 1 yielded by far the best results on average. Then we go to the next longest word and loop over its letters and each cell in the grid. When a letter matches a cell, we back up by that letter's position in the word and suggest the resulting starting coordinate as a placement for that word. Once we’ve checked every letter against every cell, we choose the best (the word best is used very loosely here) coordinate and apply the word to the grid. Now we move on to the next word and so forth. Once we’ve made it through once, we can loop over the unplaced words and look for any lucky chances for a second placement.
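The shuffle-then-sort step is worth a quick illustration. Here is a minimal sketch (order_words is a name invented for this example) that relies on Python's sort being stable: the longest words always come first, but words of equal length keep whatever random order the shuffle gave them, so each attempt tries them in a different order.

```python
import random

def order_words(words):
    """Shuffle, then sort longest-first; ties keep their shuffled order."""
    shuffled = list(words)                 # leave the caller's list untouched
    random.shuffle(shuffled)               # random order among equal lengths
    shuffled.sort(key=len, reverse=True)   # stable sort: longest words first
    return shuffled

sample = ["coda", "harp", "lime", "mist", "pumpernickel", "lip", "fjord"]
ordered = order_words(sample)
print(ordered)
```

The four-letter words land in a different order from run to run, while pumpernickel always leads.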
This suggested-coordinate system allows for a much faster fit than some methods I’ve seen that place a word at random to see if it works. Additionally, it requires each word to cross other words, which is the point of, well, a crossword puzzle.
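Here is a stripped-down sketch of how suggested coordinates can be derived from letter matches. The function name suggest_starts and the tuple layout are assumptions for this example, coordinates are 1-based as in the rest of the post, and the bounds test lets a word end on the last row or column, which is a choice made for the sketch. The suggestions are only candidates: a scoring pass still has to throw out the ones that collide with letters already on the board.

```python
def suggest_starts(grid, word):
    """Return (col, row, vertical) starts where a letter of word meets the same letter on the grid."""
    rows, cols = len(grid), len(grid[0])
    suggestions = []
    for i, letter in enumerate(word):
        for r, line in enumerate(grid, start=1):
            for c, cell in enumerate(line, start=1):
                if cell != letter:
                    continue
                # Back up i cells so that word[i] lands on (c, r).
                top = r - i
                if top >= 1 and top + len(word) - 1 <= rows:
                    suggestions.append((c, top, True))    # vertical candidate
                left = c - i
                if left >= 1 and left + len(word) - 1 <= cols:
                    suggestions.append((left, r, False))  # horizontal candidate
    return suggestions

grid = [
    "-----",
    "-----",
    "-cat-",
    "-----",
    "-----",
]
starts = suggest_starts(grid, "tap")
print(starts)
```

For this board the candidates include (2, 3, False), which would lay "tap" right on top of "cat"; that is exactly the kind of suggestion the fit score is there to reject.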
Operation
Be mindful when you create a word list to exclude words like “an” or “or”, because these tend to get placed inside other, already placed words, which can be confusing. After that, simply run the code below.
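If you would rather not weed out the short words by hand, a small filter does the job. This is only a sketch: drop_short_words and the min_length cutoff of 3 are my own assumptions, so pick whatever threshold suits your list.

```python
def drop_short_words(entries, min_length=3):
    """Keep only (word, clue) pairs whose word has at least min_length letters."""
    kept = []
    for word, clue in entries:
        letters = word.replace(" ", "")   # ignore spaces when counting letters
        if len(letters) >= min_length:
            kept.append((word, clue))
    return kept

entries = [
    ("an", "(clue)"),
    ("or", "(clue)"),
    ("lip", "Either of two fleshy folds surrounding the mouth."),
    ("coral", "A rock-like deposit of organism skeletons that make up reefs."),
]
filtered = drop_short_words(entries)
print([word for word, clue in filtered])
```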
You can feed the Crossword class a list of Word objects, or a list of tuples or lists that each hold the word and its clue. Either way works.
When you call the compute_crossword(seconds) method, it does all the work of computing the best crossword it can in however many seconds you passed. 1 second is probably enough for crossword grids of less than 20×20, 2 seconds is fine for 25×25, and 3 seconds is good for 30×30. Additionally, if you have a massive word list, you may want to double the time allotted. Finally, if you can’t run psyco, quadruple these times for similar quality.
The Code:
import random, re, time, string
from copy import copy as duplicate

# optional, speeds up by a factor of 4
try:
    import psyco
    psyco.full()
except ImportError:
    pass # psyco is not installed; everything still works, just slower

class Crossword(object):
    def __init__(self, cols, rows, empty = '-', maxloops = 2000, available_words=[]):
        self.cols = cols
        self.rows = rows
        self.empty = empty
        self.maxloops = maxloops
        self.available_words = available_words
        self.randomize_word_list()
        self.current_word_list = []
        self.debug = 0
        self.clear_grid()

    def clear_grid(self): # initialize grid and fill with empty character
        self.grid = []
        for i in range(self.rows):
            ea_row = []
            for j in range(self.cols):
                ea_row.append(self.empty)
            self.grid.append(ea_row)

    def randomize_word_list(self): # also resets words and sorts by length
        temp_list = []
        for word in self.available_words:
            if isinstance(word, Word):
                temp_list.append(Word(word.word, word.clue))
            else:
                temp_list.append(Word(word[0], word[1]))
        random.shuffle(temp_list) # randomize word list
        temp_list.sort(key=lambda i: len(i.word), reverse=True) # sort by length
        self.available_words = temp_list

    def compute_crossword(self, time_permitted = 1.00, spins=2):
        time_permitted = float(time_permitted)

        count = 0
        copy = Crossword(self.cols, self.rows, self.empty, self.maxloops, self.available_words)

        start_full = float(time.time())
        while (float(time.time()) - start_full) < time_permitted or count == 0: # only run for x seconds
            self.debug += 1
            copy.current_word_list = []
            copy.clear_grid()
            copy.randomize_word_list()
            x = 0
            while x < spins: # spins; 2 seems to be plenty
                for word in copy.available_words:
                    if word not in copy.current_word_list:
                        copy.fit_and_add(word)
                x += 1
            #print(copy.solution())
            #print(len(copy.current_word_list), len(self.current_word_list), self.debug)
            # buffer the best crossword by comparing placed words
            if len(copy.current_word_list) > len(self.current_word_list):
                self.current_word_list = copy.current_word_list
                self.grid = copy.grid
            count += 1
        return

    def suggest_coord(self, word):
        count = 0
        coordlist = []
        glc = -1
        for given_letter in word.word: # cycle through letters in word
            glc += 1
            rowc = 0
            for row in self.grid: # cycle through rows
                rowc += 1
                colc = 0
                for cell in row: # cycle through letters in rows
                    colc += 1
                    if given_letter == cell: # check match letter in word to letters in row
                        try: # suggest vertical placement
                            if rowc - glc > 0: # make sure we're not suggesting a starting point off the grid
                                if ((rowc - glc) + word.length) <= self.rows: # make sure word doesn't go off of grid
                                    coordlist.append([colc, rowc - glc, 1, colc + (rowc - glc), 0])
                        except: pass
                        try: # suggest horizontal placement
                            if colc - glc > 0: # make sure we're not suggesting a starting point off the grid
                                if ((colc - glc) + word.length) <= self.cols: # make sure word doesn't go off of grid
                                    coordlist.append([colc - glc, rowc, 0, rowc + (colc - glc), 0])
                        except: pass
        # example: coordlist[0] = [col, row, vertical, col + row, score]
        #print(word.word)
        #print(coordlist)
        new_coordlist = self.sort_coordlist(coordlist, word)
        #print(new_coordlist)
        return new_coordlist

    def sort_coordlist(self, coordlist, word): # give each coordinate a score, then sort
        new_coordlist = []
        for coord in coordlist:
            col, row, vertical = coord[0], coord[1], coord[2]
            coord[4] = self.check_fit_score(col, row, vertical, word) # checking scores
            if coord[4]: # 0 scores are filtered
                new_coordlist.append(coord)
        random.shuffle(new_coordlist) # randomize coord list; why not?
        new_coordlist.sort(key=lambda i: i[4], reverse=True) # put the best scores first
        return new_coordlist

    def fit_and_add(self, word): # doesn't really check fit except for the first word; otherwise just adds if score is good
        fit = False
        count = 0
        coordlist = self.suggest_coord(word)

        while not fit and count < self.maxloops:

            if len(self.current_word_list) == 0: # this is the first word: the seed
                # top left seed of longest word yields best results (maybe override)
                vertical, col, row = random.randrange(0, 2), 1, 1
                '''
                # optional center seed method, slower and less keyword placement
                if vertical:
                    col = int(round((self.cols + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                else:
                    col = int(round((self.cols + 1)/2, 0)) - int(round((word.length + 1)/2, 0))
                    row = int(round((self.rows + 1)/2, 0))
                # completely random seed method
                col = random.randrange(1, self.cols + 1)
                row = random.randrange(1, self.rows + 1)
                '''

                if self.check_fit_score(col, row, vertical, word):
                    fit = True
                    self.set_word(col, row, vertical, word, force=True)
            else: # subsequent words have scores calculated
                try:
                    col, row, vertical = coordlist[count][0], coordlist[count][1], coordlist[count][2]
                except IndexError: return # no more coordinates, stop trying to fit

                if coordlist[count][4]: # already filtered these out, but double check
                    fit = True
                    self.set_word(col, row, vertical, word, force=True)

            count += 1
        return

    def check_fit_score(self, col, row, vertical, word):
        '''
        And return score (0 signifies no fit). 1 means a fit, 2+ means a cross.

        The more crosses the better.
        '''
        if col < 1 or row < 1:
            return 0

        count, score = 1, 1 # give score a standard value of 1, will override with 0 if collisions detected
        for letter in word.word:
            try:
                active_cell = self.get_cell(col, row)
            except IndexError:
                return 0

            if active_cell == self.empty or active_cell == letter:
                pass
            else:
                return 0

            if active_cell == letter:
                score += 1

            if vertical:
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col+1, row): # check right cell
                        return 0

                    if not self.check_if_cell_clear(col-1, row): # check left cell
                        return 0

                if count == 1: # check top cell only on first letter
                    if not self.check_if_cell_clear(col, row-1):
                        return 0

                if count == len(word.word): # check bottom cell only on last letter
                    if not self.check_if_cell_clear(col, row+1):
                        return 0
            else: # else horizontal
                # check surroundings
                if active_cell != letter: # don't check surroundings if cross point
                    if not self.check_if_cell_clear(col, row-1): # check top cell
                        return 0

                    if not self.check_if_cell_clear(col, row+1): # check bottom cell
                        return 0

                if count == 1: # check left cell only on first letter
                    if not self.check_if_cell_clear(col-1, row):
                        return 0

                if count == len(word.word): # check right cell only on last letter
                    if not self.check_if_cell_clear(col+1, row):
                        return 0

            if vertical: # progress to next letter and position
                row += 1
            else: # else horizontal
                col += 1

            count += 1

        return score

    def set_word(self, col, row, vertical, word, force=False): # also adds word to word list
        if force:
            word.col = col
            word.row = row
            word.vertical = vertical
            self.current_word_list.append(word)

            for letter in word.word:
                self.set_cell(col, row, letter)
                if vertical:
                    row += 1
                else:
                    col += 1
        return

    def set_cell(self, col, row, value):
        self.grid[row-1][col-1] = value

    def get_cell(self, col, row):
        return self.grid[row-1][col-1]

    def check_if_cell_clear(self, col, row):
        try:
            cell = self.get_cell(col, row)
            if cell == self.empty:
                return True
        except IndexError:
            pass
        return False

    def solution(self): # return solution grid
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                outStr += '%s ' % c
            outStr += '\n'
        return outStr

    def word_find(self): # return solution grid with the empty cells filled by random letters
        outStr = ""
        for r in range(self.rows):
            for c in self.grid[r]:
                if c == self.empty:
                    outStr += '%s ' % random.choice(string.ascii_lowercase)
                else:
                    outStr += '%s ' % c
            outStr += '\n'
        return outStr

    def order_number_words(self): # orders words and applies numbering system to them
        self.current_word_list.sort(key=lambda i: (i.col + i.row))
        count, icount = 1, 1
        for word in self.current_word_list:
            word.number = count
            if icount < len(self.current_word_list):
                if word.col == self.current_word_list[icount].col and word.row == self.current_word_list[icount].row:
                    pass
                else:
                    count += 1
            icount += 1

    def display(self, order=True): # return (and order/number wordlist) the grid minus the words adding the numbers
        outStr = ""
        if order:
            self.order_number_words()

        # number a copy of the grid so the solution grid keeps its letters
        numbered = [list(row) for row in self.grid]
        for word in self.current_word_list:
            numbered[word.row-1][word.col-1] = word.number

        for row in numbered:
            for c in row:
                outStr += '%s ' % c
            outStr += '\n'

        outStr = re.sub(r'[a-z]', ' ', outStr)
        return outStr

    def word_bank(self):
        outStr = ''
        temp_list = duplicate(self.current_word_list)
        random.shuffle(temp_list) # randomize word list
        for word in temp_list:
            outStr += '%s\n' % word.word
        return outStr

    def legend(self): # must order first
        outStr = ''
        for word in self.current_word_list:
            outStr += '%d. (%d,%d) %s: %s\n' % (word.number, word.col, word.row, word.down_across(), word.clue )
        return outStr

class Word(object):
    def __init__(self, word=None, clue=None):
        self.word = re.sub(r'\s', '', word.lower())
        self.clue = clue
        self.length = len(self.word)
        # the below are set when placed on board
        self.row = None
        self.col = None
        self.vertical = None
        self.number = None

    def down_across(self): # return down or across
        if self.vertical:
            return 'down'
        else:
            return 'across'

    def __repr__(self):
        return self.word

### end class, start execution

#start_full = float(time.time())

word_list = [
    ['saffron', 'The dried, orange yellow plant used as dye and as a cooking spice.'],
    ['pumpernickel', 'Dark, sour bread made from coarse ground rye.'],
    ['leaven', 'An agent, such as yeast, that causes batter or dough to rise.'],
    ['coda', 'Musical conclusion of a movement or composition.'],
    ['paladin', 'A heroic champion or paragon of chivalry.'],
    ['syncopation', 'Shifting the emphasis of a beat to the normally weak beat.'],
    ['albatross', 'A large bird of the ocean having a hooked beak and long, narrow wings.'],
    ['harp', 'Musical instrument with 46 or more open strings played by plucking.'],
    ['piston', 'A solid cylinder or disk that fits snugly in a larger cylinder and moves under pressure as in an engine.'],
    ['caramel', 'A smooth chewy candy made from sugar, butter, cream or milk with flavoring.'],
    ['coral', 'A rock-like deposit of organism skeletons that make up reefs.'],
    ['dawn', 'The time of each morning at which daylight begins.'],
    ['pitch', 'A resin derived from the sap of various pine trees.'],
    ['fjord', 'A long, narrow, deep inlet of the sea between steep slopes.'],
    ['lip', 'Either of two fleshy folds surrounding the mouth.'],
    ['lime', 'The egg-shaped citrus fruit having a green coloring and acidic juice.'],
    ['mist', 'A mass of fine water droplets in the air near or in contact with the ground.'],
    ['plague', 'A widespread affliction or calamity.'],
    ['yarn', 'A strand of twisted threads or a long elaborate narrative.'],
    ['snicker', 'A snide, slightly stifled laugh.'],
]

a = Crossword(13, 13, '-', 5000, word_list)
a.compute_crossword(2)
print(a.word_bank())
print(a.solution())
print(a.word_find())
print(a.display())
print(a.legend())
print(len(a.current_word_list), 'out of', len(word_list))
print(a.debug)
#end_full = float(time.time())
#print(end_full - start_full)
Sample output:
You should be able to see the associated methods lining up with the output. A side note: you must run the display() method before the legend() method can be run, because display() is what orders and numbers the words.
mist
lime
snicker
paladin
caramel
leaven
pumpernickel
coral
fjord
plague
piston
lip
dawn
saffron
coda

p u m p e r n i c k e l -
a - - - - - - - a - - e -
l - s n i c k e r - - a -
a - a - - - - - a - - v -
d - f - c - - - m - - e -
i - f j o r d - e - - n -
n - r - d - - - l i p - -
- c o r a l - - - - i - -
- - n - - i - - - - s - -
- - - - - m i s t - t - -
p l a g u e - - - - o - -
- - - - - - - d a w n - -
- - - - - - - - - - - - -

p u m p e r n i c k e l v
a w j m p c a y a w r e s
l f s n i c k e r b z a x
a f a z k e u i a b f v k
d x f v c j f d m c n e x
i d f j o r d z e j g n z
n r r x d j a o l i p d j
i c o r a l u s t o i x w
m r n u e i i h o t s y w
m w e x s m i s t r t u j
p l a g u e b n h k o m s
f m n v j f p d a w n c q
m h j a e d p p r g t p j

1               4     8 -
  - - - - - - -   - -   -
  - 2             - -   -
  -   - - - - -   - -   -
  -   - 6 - - -   - -   -
  - 3         -   - -   -
  -   -   - - - 10   12 - -
- 5       9 - - - -   - -
- -   - -   - - - -   - -
- - - - - 11       -   - -
7           - - - -   - -
- - - - - - - 13       - -
- - - - - - - - - - - - -

1. (1,1) across: Dark, sour bread made from coarse ground rye.
1. (1,1) down: A heroic champion or paragon of chivalry.
2. (3,3) across: A snide, slightly stifled laugh.
2. (3,3) down: The dried, orange yellow plant used as dye and as a cooking spice.
3. (3,6) across: A long, narrow, deep inlet of the sea between steep slopes.
4. (9,1) down: A smooth chewy candy made from sugar, butter, cream or milk with flavoring.
5. (2,8) across: A rock-like deposit of organism skeletons that make up reefs.
6. (5,5) down: Musical conclusion of a movement or composition.
7. (1,11) across: A widespread affliction or calamity.
8. (12,1) down: An agent, such as yeast, that causes batter or dough to rise.
9. (6,8) down: The egg-shaped citrus fruit having a green coloring and acidic juice.
10. (9,7) across: Either of two fleshy folds surrounding the mouth.
11. (6,10) across: A mass of fine water droplets in the air near or in contact with the ground.
12. (11,7) down: A solid cylinder or disk that fits snugly in a larger cylinder and moves under pressure as in an engine.
13. (8,12) across: The time of each morning at which daylight begins.

15 out of 20
811
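The numbering in that legend, where an across word and a down word that start in the same cell share one number, can be sketched on its own. This is a simplified illustration rather than the method from the class: number_words and the (word, col, row) tuples are assumptions for the example, and I break ties on row and column so that words sharing a starting cell are guaranteed to sit next to each other after sorting.

```python
def number_words(placed):
    """placed: list of (word, col, row). Returns a list of (number, word)."""
    # Order roughly top-left to bottom-right; the extra keys keep words
    # that share a starting cell adjacent.
    ordered = sorted(placed, key=lambda w: (w[1] + w[2], w[2], w[1]))
    numbered, number, previous = [], 0, None
    for word, col, row in ordered:
        if (col, row) != previous:   # a new starting cell gets a new number
            number += 1
            previous = (col, row)
        numbered.append((number, word))
    return numbered

placed = [
    ("saffron", 3, 3),
    ("pumpernickel", 1, 1),
    ("snicker", 3, 3),
    ("paladin", 1, 1),
    ("caramel", 9, 1),
]
numbered = number_words(placed)
for number, word in numbered:
    print(number, word)
```

Until some step like this has run, the words have no numbers to print, which is why the legend has to come after the numbering.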
Bryan is a jazz and blues guitarist, small-time designer, Python hacker, entrepreneur, and lover of fine whiskeys. He's the man behind such sites as