As a result, I get occasional questions about how I wrote the word ladder game. Usually, questions about how I got the game/solver to run so fast.
This is a rough explanation of how the game is designed...
1. Finding a list of words: First I looked online, and downloaded some free word lists (I think I used http://www.puzzlers.org or http://wordlist.sourceforge.net/). While I thought this would be the simplest part of the problem, it actually turned out to be the hardest. I ran into a problem of finding a good word list: one that had common words, and a lot of them.
Obscure words, archaic words, typos, or abbreviations make no sense in a word ladder. Eventually I was forced to combine a few smaller list that had more common words. If the solver has flaws, it is mostly because I've never found a good balance between the quality and quantity of words. It's a fuzzy problem with a lot of grey area... and I've never wanted to manually edit a list of 100k's of words.
2. The idea of a word ladder solver is simple, the trick is getting it to be fast. If thought isn't put into the basic algorithm, it will probably be incredibly slow. Certainly, downloading several megabytes of dictionary data will be very slow. Searching through this will be even slower.
The two key insights here are: 90% of the words in a dictionary probably will not occur in any word ladder, and that most all the time searching can be shortened by *precomputing* as much as possible in advance.
I wrote a small perl script that broke the huge word lists into groups by word length (3 letter words, 4 letter words, etc) then linked each word to all possible word ladder kin. The flat file looked like this:
3688 abaca 2 1 2 abaci 2 0 2 aback 3 0 1 50 abase 3 4 5 11 abash 2 3 166 abate 2 3 38 abend 1 77 abets 1 12 abide 2 9 130 abode 3 8 10 89 above 1 9 abuse 2 3 82 abuts 1 7 abysm 1 14 ...
This is a pretty simple format (which probably could be optimized a lot). The first line is the number of words in the file. For the next lines, the word is the first field, followed by the number of kin, and then row numbers of other words that are only different by one letter (line numbers). Also for a bit more speed: I zipped the word lists (split by word size) and only download them as needed. This reduced the dictionary size probably by 99% and also almost completely solved the problem.
3. Then the solver is trivial... given two words, I start from the first and build (or crawl) a tree of all possible word ladders starting from that node. This branches out until all possibilities are found , or until a branch contains the end word (a solution). In this case, I just bubble back up the tree. On thing here that might be tricky is how the tree is built...
note: a recursive function would be much simpler
but would create a tree with flagrant left-bias.
resulting ladders would not be as short as possible
(words can only be used once)
level is a horizontal slice of tree:
level 0 |
level 1 /\ \
level 2 /\ | \
4. Now that the solver is built, it's possible to write the game... Note that there is usually more than one possible solution for a puzzle, so this needs to be checked dynamically.
For more information: download the source code (GPL applies)
compile and package with:
javac *.java
jar cvf WordLadders.jar *.class *.au wordlist/
It would be easy to take this game and adapt it to different kinds of "connect the words" type games. For instance, I had thought about making a conceptual version of the game (where words are linked by synonyms of possible meanings). For example: shut, close, near.