Exploring the Right Way to Play Wordle

DataRes at UCLA
Jun 17, 2022

Authors: Kevin Hamakawa (Project Lead), Arjun Pawar, Ena Zou, Olivia Wesiger, Amelie Ionescu

Time for Wordle

At exactly midnight, the New York Times Wordle resets and thousands take to their phones to try to guess a carefully selected five-letter word. This phenomenon has taken over the nation, from friends texting each other across time zones to the Daily Bruin #wordle Slack channel getting 500 messages a week. But what exactly is Wordle, and how does one play?

You start by entering any five-letter word. Once the word is entered, the game gives you hints: a letter that is in the word but in the wrong place gets a yellow background, and a letter in the right place turns green. A letter not in the word stays gray. With six tries per Wordle, over ten thousand five-letter words to choose from, and a few hints along the way, there is a wide variety of tactics people can use to optimize their Wordle score (i.e., solve the word in the fewest tries). In this article, we explore which sequences of guesses produce the statistically best outcomes in a game of Wordle.
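The hint rules above can be captured in a short scoring function. This is our own minimal sketch, not code from the team's repository; it assumes lowercase five-letter words and encodes a row of tiles as a string of "G" (green), "Y" (yellow), and "." (gray), a hypothetical convention chosen for readability. It follows Wordle's handling of repeated letters: greens are assigned first, and each remaining copy of a letter in the answer can turn at most one guessed letter yellow.

```python
from collections import Counter


def wordle_feedback(guess: str, answer: str) -> str:
    """Score `guess` against `answer` as a row of tiles.

    'G' = green (right letter, right spot), 'Y' = yellow (in the word,
    wrong spot), '.' = gray (not in the word, or no copies left to match).
    """
    guess, answer = guess.lower(), answer.lower()
    # Pass 1: greens.
    result = ["G" if g == a else "." for g, a in zip(guess, answer)]
    # Answer letters not already claimed by a green; each can produce one yellow.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    # Pass 2: yellows, left to right, consuming unmatched copies.
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


print(wordle_feedback("soare", "crane"))  # ..GYG
print(wordle_feedback("speed", "abide"))  # ..Y.Y (only one E can be yellow)
```

The two-pass structure matters for words with repeated letters: checking yellows before greens would let a letter be "used up" by a yellow tile that should have gone to a green one.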

Off to a Good Start!

With the introduction out of the way, let’s get into the important stuff: the words! Possibly the most defining moment in any Wordle game comes before the game even starts: choosing your first word. Scroll through Twitter threads and group chats around the world and you’ll find plenty of rumors about what the “best” Wordle starter is. There are your standard “adieu” believers, enthusiastic “crane” users, and even avid “ouija” players. With all that said, let’s see what the data has to say.

To decide what the best Wordle starter is, we first had to define what makes a word “good” in Wordle. As mentioned above, every time a word is entered, the game returns some number of “correct” squares, shown in green, “correct letter but wrong position” squares, shown in yellow, and “incorrect letter” squares, shown in gray. As a first pass, we can judge how “good” a word is by counting its non-gray squares.

For example, we can say “xylyl” (who even knew this was a word) is a bad word here:

While we can say “corks” is a good word here:

In these examples, “xylyl” has 0 greens, 0 yellows, and 5 grays, while “corks” has 1 green, 2 yellows, and only 2 grays. Taking this to the next level, we can compare every possible five-letter guess against every possible Wordle solution, giving each guess an average number of green, yellow, and gray squares across all possible solutions*.

* Because the New York Times excludes curse words and other obscure five-letter words from being the Wordle of the day, the list of allowed guesses is different from (and larger than) the list of possible solutions.
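One way to compute these averages is sketched below, under our own assumptions rather than as the team's implementation: score a guess against every word in a solutions list and average the tile counts. The function names and toy word lists are hypothetical illustrations, and the snippet carries its own compact copy of the Wordle feedback rules (greens first, then yellows limited by the remaining copies of each letter) so it runs on its own.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Tile row for `guess` vs `answer`: 'G' green, 'Y' yellow, '.' gray."""
    result = ["G" if g == a else "." for g, a in zip(guess, answer)]
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def average_tiles(guess: str, solutions: list[str]) -> tuple[float, float, float]:
    """Average number of (green, yellow, gray) tiles `guess` earns over `solutions`."""
    greens = yellows = 0
    for answer in solutions:
        row = feedback(guess, answer)
        greens += row.count("G")
        yellows += row.count("Y")
    n = len(solutions)
    grays = len(guess) * n - greens - yellows
    return greens / n, yellows / n, grays / n


def rank_by_colored_tiles(guesses: list[str], solutions: list[str], top: int = 10) -> list[str]:
    """Rank guesses by average green + yellow tiles (no weighting)."""
    def colored(guess: str) -> float:
        avg_green, avg_yellow, _ = average_tiles(guess, solutions)
        return avg_green + avg_yellow
    return sorted(guesses, key=colored, reverse=True)[:top]


toy_solutions = ["crane", "erode"]  # toy list, not the real solution list
print(average_tiles("soare", toy_solutions))  # (1.5, 1.5, 2.0)
print(rank_by_colored_tiles(["xylyl", "crane"], toy_solutions))  # ['crane', 'xylyl']
```

In a real run, `guesses` would be the allowed-guess list and `solutions` the solution list; with over ten thousand guesses that means tens of millions of comparisons, so a pure-Python run takes a while.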

If we graph the words with the highest average number of total squares (green + yellow), we can arrive at a somewhat straightforward conclusion as to what the “best” Wordle starter may be.

Unfortunately, this ranking system has its flaws. Words like “roate”, “orate”, and “oater” are tied for the highest average number of colored squares, which makes sense since they all have the same letter composition, but “roate” seems to be the better word of the three because it has the highest average number of green squares. To account for this, we created our own score for each word, weighting green squares more heavily than yellow squares, which gives a more balanced metric for comparing words against one another.
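A weighted score of this kind can be sketched as a linear combination of average greens and yellows. The article does not state the exact weights, so the 2:1 green-to-yellow ratio below is purely an assumption for illustration, and `word_a`/`word_b` with their averages are made-up placeholders, not real results.

```python
def weighted_score(avg_green: float, avg_yellow: float,
                   green_weight: float = 2.0, yellow_weight: float = 1.0) -> float:
    """Score a guess so that a green tile counts more than a yellow one.

    The default weights are placeholders (assumption): the only requirement
    taken from the text is that greens weigh more than yellows.
    """
    return green_weight * avg_green + yellow_weight * avg_yellow


def rank_by_score(averages: dict[str, tuple[float, float]], top: int = 10) -> list[str]:
    """Rank words by weighted score; `averages` maps word -> (avg greens, avg yellows)."""
    return sorted(averages, key=lambda w: weighted_score(*averages[w]), reverse=True)[:top]


# Two hypothetical words with the same total (2.5 colored tiles on average);
# the weighted score breaks the tie in favor of the one with more greens.
example = {"word_a": (0.5, 2.0), "word_b": (1.0, 1.5)}
print(rank_by_score(example))  # ['word_b', 'word_a']
```

Any weights with `green_weight > yellow_weight` break ties between anagrams in the same direction; the exact ratio only changes how a green-heavy word trades off against a word with more total colored tiles.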

Graphing the best 10 words with our new scoring system, we get:

With this graph, we get a much cleaner answer as to what the “best” Wordle starters are: “soare” takes our prize for the best five-letter word to start Wordle with, followed by “roate” and “stare”.

What’s Next?

To go a step further and find the best word to enter after “soare”, we ranked the top 10 words that share no letters with “soare”, so that no letter gets tested twice.
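Excluding words that reuse any letter of the first guess is a simple set check. A minimal sketch, assuming lowercase words; the helper names are hypothetical and the pool is a small hand-picked illustration rather than the real guess list. In a full analysis, the surviving words would then be sorted by the weighted score.

```python
def shares_no_letters(word: str, previous: str) -> bool:
    """True if `word` repeats none of the letters already tried in `previous`."""
    return set(word).isdisjoint(previous)


def second_word_candidates(guesses: list[str], first: str = "soare") -> list[str]:
    """Keep only guesses that test entirely new letters after `first`."""
    return [w for w in guesses if shares_no_letters(w, first)]


pool = ["clint", "stare", "unlit", "until", "roate", "cunit", "pudgy"]  # illustrative pool
print(second_word_candidates(pool))  # ['clint', 'unlit', 'until', 'cunit', 'pudgy']
```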

Looking at this ranking, there are several routes a player can take, depending on their preference. If you want to go purely by score, “clint” is the obvious best choice, scoring well above the other words. However, if you would rather have tested every vowel by the end of your second guess and spend guess 3 onward on consonants, words such as “unlit”, “until”, and “cunit” look more useful, since they cover the U and I that “soare” misses. Overall, this gives readers a good initial path, and ultimately it comes down to whichever route you want to take from here.

Common Letter Positions

Among the many words that are Wordle solutions, we also wanted to find the most common consonants at each position. After all, opening every game with “soare” followed by “unlit” already tests all five vowels, which reduces the need to check for them later. To do so, we created frequency charts for each position from 1 through 5 to see whether any patterns emerge. We found that the letter S has an unusually high frequency at the beginning of a word (almost 350 instances in the solutions database). This mirrors the fact that S is also the most common first letter of words in the English dictionary!
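Per-position frequency charts like these come down to one letter counter per position. A minimal sketch, assuming lowercase five-letter words and treating only A, E, I, O, U as vowels (so Y counts as a consonant); the function names and toy word list are hypothetical stand-ins for the real solution list.

```python
from collections import Counter

VOWELS = set("aeiou")  # Y is counted as a consonant


def positional_counts(words: list[str]) -> list[Counter]:
    """One Counter per position (index 0-4) of how often each letter appears there."""
    counts = [Counter() for _ in range(5)]
    for word in words:
        for i, ch in enumerate(word):
            counts[i][ch] += 1
    return counts


def top_consonants(words: list[str], k: int = 5) -> list[list[tuple[str, int]]]:
    """For each position, the k most frequent consonants with their counts."""
    return [
        [(ch, n) for ch, n in counter.most_common() if ch not in VOWELS][:k]
        for counter in positional_counts(words)
    ]


toy_words = ["sassy", "stony", "sully", "crane"]  # toy list, not the real solutions
tops = top_consonants(toy_words, k=1)
print(tops[0], tops[4])  # [('s', 3)] [('y', 3)]
```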

In position #2, R and L were the most frequent, and R kept its popularity in position #3 as well. In position #4, no letter stood out distinctly from the rest, but N and S were popular.

Finally, for the last letter of a word, Y stood out far above the rest on our frequency chart, followed by T.

Some noteworthy observations: the letters L and R appeared among the top 5 most frequent letters for positions 2, 3, 4, and 5. So when in doubt, watch out for your L’s and R’s, since you might just find them somewhere in the middle of your word! In general, words beginning with S and ending with Y might be a good guess for maximizing green tiles. Of course, by this logic the ideal guess would be “SRRNY”, but unfortunately that is not a valid English word, so Wordle will not accept it. However, we can always look for real words similar to SRRNY and use them as inspiration for fine-tuning our guesses!

Another way to present our analysis is through the heat map below. It shows, for each letter, the probability of that letter appearing at each position, given that the word contains it. For example, if S has a value of 0.7 at position 1, then 70% of all words that contain S have it at the beginning.
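The heat map's values can be computed by counting, for each letter, how many words contain it and how many have it at each position. This is a sketch under our own assumptions (lowercase a-z words of length five, a toy word list, hypothetical function names), not the team's code. Because a word like "sassy" contains S at several positions, a letter's row can sum to more than 1.

```python
from collections import Counter
from string import ascii_lowercase


def position_probabilities(words: list[str]) -> dict[str, list[float]]:
    """For each letter, P(letter at position i | word contains the letter), i = 0..4."""
    containing = Counter()  # number of words that contain the letter at all
    at_position = {ch: [0] * 5 for ch in ascii_lowercase}  # words with letter at position i
    for word in words:
        for ch in set(word):
            containing[ch] += 1
        for i, ch in enumerate(word):
            at_position[ch][i] += 1
    return {
        ch: [n / containing[ch] for n in at_position[ch]]
        for ch in ascii_lowercase
        if containing[ch]  # skip letters that never appear
    }


def likely_position(letter: str, probs: dict[str, list[float]]) -> int:
    """1-based position where `letter` is most likely to sit, given it is in the word."""
    row = probs[letter]
    return row.index(max(row)) + 1


probs = position_probabilities(["quiet", "quote", "squad"])  # toy list
print(likely_position("q", probs))  # 1
```

The `likely_position` helper reads off the darkest cell in a letter's row, which is how a yellow Q, J, or Y tile can be turned into a placement hint.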

The most striking observation here is that the boxes for Y, Q, and J are the darkest, meaning these letters are heavily concentrated in one position within the words that contain them: Y favors position #5, Q position #1, and J position #1. This is a useful tip for players, because if we know one of these letters is in the solution (that is, if it shows up as a yellow tile), we have a good idea of where to place it in the next guess to move toward more green tiles!

Similarly, some letters have a very low probability at certain positions, so we can almost ignore those placements when making guesses. Examples include U and V in position #5, Y in position #4, and Q in positions #4 and #5. Interestingly, S (one of our “popular” letters) drops sharply in probability at position #2, meaning it rarely appears there.

So feel free to keep our Wordle heat map handy as a quick reference the next time you do your daily Wordle.

D-d-duplicates!

Another aspect of Wordle that players may want to use is the possibility of duplicate letters. In Wordle, repeated letters are treated as separate characters, each with its own tile. We’ve all been there, guessing “brode”, “crode”, “frode”, and even “qrode” before realizing that the word was “erode” all along. Duplicate letters are a factor that many players seem to forget, so we’ve got you covered.

Based on our exploration of possible Wordle solutions, the letters “E,” “O,” and “Z” are the most likely to be duplicated. We calculated a duplicate proportion for each letter from “A” to “Z”: the number of solutions containing more than one of that letter, divided by the total number of solutions that contain that letter at all. The graph below displays these proportions, with “J,” “Q,” and “X” excluded because no solutions contain duplicates of these letters.
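The duplicate proportion described above translates directly into code. A minimal sketch with a hypothetical function name and a toy word list; in the real analysis `words` would be the solution list. Letters that never appear are skipped, and letters that appear but are never repeated (like J, Q, and X in the real data) would simply get a proportion of 0.0.

```python
from collections import Counter


def duplicate_proportions(words: list[str]) -> dict[str, float]:
    """For each letter: (# words with 2+ copies) / (# words containing it at all)."""
    containing = Counter()
    duplicated = Counter()
    for word in words:
        for ch, n in Counter(word).items():
            containing[ch] += 1
            if n > 1:
                duplicated[ch] += 1
    # Only letters that actually appear are included, so there is no division by zero.
    return {ch: duplicated[ch] / containing[ch] for ch in sorted(containing)}


toy_words = ["erode", "crane", "xylyl"]  # toy list, not the real solutions
props = duplicate_proportions(toy_words)
print(props["e"], props["x"], props["l"])  # 0.5 0.0 1.0
```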

Knowing how often each letter is duplicated can make a real difference in your daily Wordle guesses. If the daily word contains a “J,” “Q,” or “X,” it is safe to say there is little to no chance that the letter appears twice. On the other hand, if the word contains an “E,” “O,” or “Z,” there is a significant chance that the letter is duplicated, so it may be worth guessing a word with two of that letter to improve your odds of solving that day’s Wordle.

Tying it All Together!

So what is the ideal way to play? It depends on what your starting word gives you and what playing style you’re going for. Based on our findings, our daily Wordle routine will go like this:

  1. Input the word “soare”
  2. Choose your second and third words wisely:
     (1) If “soare” gives a lot of yellows/greens and you want a shot at solving in 3 guesses: enter “unlit”, then guess from the hints you are given.
     (2) If “soare” does not give many matches and you are aiming to solve in 4 guesses: enter “clint”, then “pudgy”, then guess from the hints you are given.
  3. When a guess reveals letters such as “q”, “j”, or “y”, keep in mind that they are most likely to be in specific positions (refer back to the heat map).
  4. Never forget that duplicates are a possibility, especially with “e”, “o”, and “z”.
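“Guess from the hints you are given” can be made concrete: after each guess, keep only the solutions that would have produced exactly the tile row you saw. This is a common solver technique, sketched here under our own assumptions rather than taken from the team's code. Tile rows use a hypothetical "G"/"Y"/"." encoding, the solution list is a toy example, and the snippet carries a compact copy of the Wordle feedback rules so it runs on its own.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Tile row for `guess` vs `answer`: 'G' green, 'Y' yellow, '.' gray."""
    result = ["G" if g == a else "." for g, a in zip(guess, answer)]
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def remaining_candidates(solutions: list[str], history: list[tuple[str, str]]) -> list[str]:
    """Solutions consistent with every (guess, observed tile row) pair so far."""
    return [
        word for word in solutions
        if all(feedback(guess, word) == row for guess, row in history)
    ]


toy_solutions = ["crane", "erode", "unlit", "pudgy", "stare"]  # toy list
# Suppose "soare" came back as gray, yellow, gray, yellow, green.
print(remaining_candidates(toy_solutions, [("soare", ".Y.YG")]))  # ['erode']
```

Each new (guess, tile row) pair shrinks the list further, and because the feedback rules handle repeated letters, candidates like “erode” with a doubled E are kept or dropped correctly.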

Of course, there’s no one right way to play Wordle, but if you ever want a statistical advantage over your friends, don’t forget about this article!

Github: https://github.com/datares/S22-Team-WORDS
