Wordle is an interesting 5 letter word puzzle game. To learn how to play it just watch the first video link below.
To cheat thru it with regex, first, get the allowed_words file. Then grep thru it.
First, get the allowed_words file.
Youtuber 3b1b has created a python script analyzing the best words to start with.
In his first iteration the best-started word for the game is “crane”, but his second video updates that to another word, “salet”.
First video: https://www.youtube.com/watch?v=v68zYyaEmEA
Second video: https://www.youtube.com/watch?v=fRed0Xmc2Wg
Prepare
Anyhow, download his latest wordle git to get the allowed words file
git clone https://github.com/3b1b/videos cd videos/_2022/wordle/data cat allowed_words.txt
Or just download the allowed_words file like this
wget https://raw.githubusercontent.com/3b1b/videos/master/_2022/wordle/data/allowed_words.txt # or curl -o allowed_words.txt https://raw.githubusercontent.com/3b1b/videos/master/_2022/wordle/data/allowed_words.txt
Note: there is a file called possible_words that has all of the English 5 letter words. When wordle was created, they didn’t use all of the words. Fun fact, the creator of the game had his girlfriend or wife select which words.
Example
So let us get started. I can show you how to do everything you need to learn with one step of the game / one example.
Let us try the word “crane”. And we end up getting:
Green, Grey, Grey, Yellow, Green
We know that green means it’s the right letter and in the right spot.
Yellow means that a letter exists in the word but it is in the wrong spot.
Grey means that letter doesn’t exist in the word.
So let us take care of the three cases: green, yellow, and grey.
case 1. We then get green on “c”, and “e”. So we know there are a “c” and “e”, and they are in the right spot.
case 2. We get yellow on “n”. So we know there is an “n”, but it is not in the 4th slot.
case 3. We get greys on “r” and “a”. So we know there is no “r” or “a”.
Note you will see that there are many ways to achieve the same result with grep.
Sidenote: grep is not the only alternative for this. Awk can be used too. Sed as well. For simplicity, we will use standard grep.
case 1:
We know that a “c” and an “e” are in the first and last spot respectively.
Dot character means it can be any character. We also know that those 3 dots, can’t be certain letters, and we will take care of that with case 2 and case 3, and when we pipe the greps together, we will get our desired end result.
cat allowed_words.txt | grep "c...e"
case 2:
We know that we have an “n”, but it cant be in the fourth slot.
cat allowed_words.txt | grep -v "...n."
or
cat allowed_words.txt | grep "...[^n]."
But we are not done, that only takes care of an “n” not being the 4th slot. However, now we must make sure there is an “n” in there. To do that we pipe another grep
cat allowed_words.txt | grep -v "...[n]." | grep n
or
cat allowed_words.txt | grep "...[^n]." | grep n
Now its looking for an “n”, that is not in the 4th slot.
case 3:
We know that we can’t have an “r” or an “a” anywhere. So we can take care of that with this grep
cat allowed_words.txt | grep -v "a" | grep -v "r"
or combine them like this (note the order of a or r doesn’t matter)
cat allowed_words.txt | grep -v "[ar]"
combining the cases:
To combine the cases we can pipe them together:
cat allowed_words.txt | grep "c...e" | grep "...[^n]." | grep n | grep -v "[ar]"
The first and second grep can actually be meshed together, giving an end result like this:
cat allowed_words.txt | grep "c..[^n]e" | grep n | grep -v "[ar]"
Here we can see the output of both commands is the same:
> cat allowed_words.txt | grep "c..[^n]e" | grep n | grep -v "[ar]"
cense conge conte
> cat allowed_words.txt | grep "c...e" | grep "...[^n]." | grep n | grep -v "[ar]"
cense conge conte
So next we know we need to try cense, conge or conte.
How do we apply multiple steps?
From above’s set of given steps and their results, I would construct the following:
> cat WORDLE_ALLOWED.txt | grep '[^e][^et][^h]i[^e]' | grep e | grep t | grep h | grep -v "[wavolmc]"
The 1st grep takes care of the yellows (its the right letter but its in the wrong spot). Also, some spots may have multiple letters that cannot be there: The 2nd spot, as an example, can’t be an “e” and can’t be “t”.
Also the 1st grep takes care of the green, with the i.
The 2nd, 3rd and 4th grep repeats the letters from the 1st grep. Specifically, the ones which we don’t know where they lay. We could include the letter we know, “i”, but it wouldn’t change the answer. We know that there should be an “e”, “t” and “h” somewhere. Note using “grep [eth]” is wrong as it will just match an “e” or “t” or “h”, but we need it to match an “e” and a “t” and an “h”.
The 3rd grep takes care of the letters that can’t be there, all of the grey letters. Note that we didn’t include the grey e. This is because WEAVE had 2 es, and the first e was yellow. So that means there is only 1 e in the thing. Some words might have 2 es or more. In that case, more would be yellow or green
grep ‘[eth]’ is incorrect. Should be grep e|grep t|grep h
You are correct, I made that discovery as well and forgot to update my article.