Initially I thought this week’s Classic puzzle might be too tough to solve without some serious insights, but the more I thought about it, the more it seemed like I might be able to break it down to manageable pieces.
The challenge is a kind of reversal of Spelling Bee. Instead of just finding all of the words that can be formed (with certain constraints) from seven letters, we have to find the seven letters that would score give the highest score. The word list is provided.
The basic plan is simple, just not easy! We generate all the possible seven-letter combinations, go through each of them seven times (once for every possible central letter), then go over every word in the word list, and if it can be made out of those letters, then we note the score for that work. Then the winning combination is the one with the highest total score.
The problem with this approach, of course, is that it will take far too long. There are 26C7 combinations of seven different letters, that’s 657 800, and each combination needs to be considered with all seven different letters as the centre letter. That takes us to 4·6 million different honeycombs to calculate the score for. The word list contains 172 820 words, so that gives us 795 766 972 000, or about 800 billion, comparisons to make.
I don’t know how long that would take to run, on my laptop, but I’m pretty sure I don’t want to try. If my laptop could manage one million comparisons per second, which seems unlikely, then it would take about nine days to complete.
Fortunately there are lots of simplifications we can make. Looking at the list of words:
- Only words with at least four letters can score points, so we can ignore anything shorter than that.
- We can’t have an ‘s’ in the honeycomb, so we can’t have an ‘s’ in any of the scoring words.
- If a word has more than seven distinct letters, it won’t match any possible honeycomb, since they only have seven letters. Those words can be ignored too.
If we filter out those words that are too short, contain an ‘s’, or have too many distinct letters, we shrink the word list to just 44 585 words, about a quarter of its original size.
Next we can try to shrink the number of honeycombs we have to test. Theoretically there are 658 k possibilities, but actually we don’t care about most of them. Notice that there are only 45 k words in our filtered word list; even if every one of those words had a unique honeycomb, there would still only be 45 k honeycombs.
So instead of generating all the theoretically possible sets of seven letters, we can take only the sets that occur in real words. We also need to add in all of the shorter sets of letters that aren’t subsets of the seven-letter sets we find. That gives us about 9 k sets of letters that we need to find scores for.
The last thing I could see that seemed worth doing was to pre-score the different letter combinations. We don’t really care about individual words, just the letter sets that they correspond to. So instead of considering ‘inner’, ‘rein’, ‘renin’, and ‘rennin’ separately, we can lump them all together under their letter set ‘e’, ‘i’, ‘n’, ‘r’, with a combined score of 17. That takes us from looking at 45 k words to looking at only 22 k letter sets. Interestingly, more than half of the letter sets contain only one word.
I wrote the code in Python, and it’s in github. On my 18-month old laptop, the code runs in about three minutes. Much better than the nine or so days it might have taken without simplification! If anyone has suggestions on changes that would make the code cleaner or more Pythonic, I’d love to hear them, as I’m still just easing myself into the language.
And if you’re super-curious, I found that the best central letter was ‘r’, with the six other letters being ‘a, ‘e’, ‘g’, ‘i’, ‘n’, and ‘t’. The total score for that honeycomb is 3 898.