Artificial intelligence (AI for short) seems to be everywhere and to be the solution to every problem. Every now and then, I still do software development and I wanted to try my skills at some AI development. This comes in three ways:
I chose the third path, starting from nothing but a programming language and some theoretical knowledge, plus a video from Amy Plant, a French YouTuber who basically got me intrigued and started on this thing. I heard of her via the French technology podcast called Le code a changé, but I digress again; it just shows how things are connected. What frustrated me a bit with her video is the lack of explanation of how a snake's "brain" is made. More on this later.
So here we go, let me tame snakes with bare-hands AI techniques for your entertainment...
The problem at hand is easy: create an AI that plays the "snake" game. Guide a snake that moves on a grid, going up, down, left or right, trying to get to the square where there is food. When the snake gets food, it grows by one unit. If the snake's head hits a wall or any part of its body, the game is over. If you don't know the snake game, check its Wikipedia page.
So the plan is simple:
Part 1 is easy, so I won't write too much about it here. Parts 2 and 3, this is where the problem is. Spoiler alert, part 4 is delayed...
With a basic understanding of AI, it is clear that you need a "neural network" with inputs and outputs and some magic in the middle. Let us go backwards here. A neural network is a set of neurons organised in layers, with the output of each layer being the input of the next, until the final output. Each neuron gets multiple inputs and uses them to calculate its output. More on this later.
The network must decide in which direction the snake goes next, and the easiest way to do that is a final layer with four neurons, one per direction: calculate the values of the network and select the output neuron with the highest value. Voilà. This was one of the first "AI epiphanies" for me with this project. The network has zero concept of direction; there could be five, three or twenty-six different possible outputs, the only difference would be the number of output neurons.
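As a minimal sketch of that last step (the function and names here are my own, not taken from any library), picking the move from the four output values boils down to an argmax:

DIRECTIONS = ["up", "down", "left", "right"]

def pick_direction(outputs):
    # outputs: the four values produced by the output layer, one per direction
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return DIRECTIONS[best_index]

# Example: the third neuron has the highest value, so the snake goes left.
print(pick_direction([0.2, -1.3, 0.9, 0.1]))  # -> "left"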
Now, let us have a look at the neurons themselves. Basic AI knowledge tells us that a neuron is a function with multiple inputs, some parameters and one output. But what kind of function? Not having great inspiration here, I went for something very simple:
In pseudo code:
output = 0
for i in range(len(inputs)):
    output = output + weights[i] * inputs[i] + biases[i]
return output
So going backwards from our output layer, we must now create the so-called "hidden" layers. Those layers are called "hidden" because they are not interacting with the outside world and are hidden between the input layer and the output layer.
For lack of better ideas, I decided to use dense layers, meaning that each neuron is connected to all the outputs of the previous layer. If the last hidden layer has 8 neurons, then the four neurons of the output layer each have eight inputs, and therefore eight weights and eight biases. Calculating the output of one such neuron means summing the product of input and weight plus a bias (input * weight + bias) eight times. This also means that the output layer of four neurons has a total of 64 parameters: 32 weights and 32 biases.
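To make the wiring concrete, here is a small sketch of a dense layer's forward pass, following the neuron formula above. Plain Python, no library; a network is simply a list of layers, and each layer a list of (weights, biases) pairs:

def neuron_output(inputs, weights, biases):
    # one weight and one bias per input, as in the pseudo code above
    return sum(w * x + b for w, x, b in zip(weights, inputs, biases))

def layer_output(inputs, layer):
    # dense layer: every neuron sees all outputs of the previous layer
    return [neuron_output(inputs, weights, biases) for weights, biases in layer]

def forward(network, inputs):
    # feed the values through each layer in turn, until the output layer
    for layer in network:
        inputs = layer_output(inputs, layer)
    return inputs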
Neural networks can be made of any number of layers and each layer can have any size. So here comes the next question when creating it: how many neurons are required to play snake and how should they be organised? One large layer? Multiple smaller layers? Many large layers?
What do programmers do when they don't know? They make it adjustable with parameters. We will experiment with this later.
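As a sketch of what that can look like, the whole shape becomes a single list of layer sizes; the helper name and the example sizes below are just illustrations of mine:

import random

def make_network(layer_sizes):
    # layer_sizes, e.g. [10, 12, 8, 4]: first entry is the number of inputs,
    # last entry is the four output neurons, everything in between is hidden.
    network = []
    for n_inputs, n_neurons in zip(layer_sizes, layer_sizes[1:]):
        layer = []
        for _ in range(n_neurons):
            weights = [random.uniform(-1, 1) for _ in range(n_inputs)]
            biases = [random.uniform(-1, 1) for _ in range(n_inputs)]
            layer.append((weights, biases))
        network.append(layer)
    return network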
Now, we have a network of neurons, connected all the way to the output. But how to connect them to the game situation? The network must know what the game situation is to make its decision, right? This is where the situation became complicated for me. What should the inputs be?
Picking the right input parameters is probably the most important decision, with the shape and size of the network second. At this stage, the input values I use are the following:
The direction of movement seems important because a snake shall never change to a directly opposite direction (from up to down, down to up, left to right or right to left), as this would lead to immediate death.
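As an illustration only, feeding the network could look like the sketch below. The game-state fields are hypothetical names of mine, covering the inputs discussed in this post (direction of movement, head position, food position, distance to danger in each direction):

def build_inputs(game):
    # Hypothetical sketch: flatten the game situation into a list of numbers.
    return [
        *game.direction,          # current direction, e.g. one-hot (up, down, left, right)
        *game.head_position,      # head (x, y) on the grid
        *game.food_position,      # food (x, y) on the grid
        *game.danger_distances,   # distance to wall or body, one value per direction
    ]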
And with this, step 2 is done. Our neural network gets the inputs from the game's current situation, evaluates all the neurons and then provides a result, telling us in which direction the snake must go next. Put this into the game engine and calculate if the snake moves, grows, hits a wall or bites its tail, roughly as sketched below. Easy... off to step 3!
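For completeness, the game-engine side of one move might look roughly like this. A sketch only; the game structure and field names are mine, not actual code from the project:

STEPS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(game, direction):
    dx, dy = STEPS[direction]
    head_x, head_y = game.body[0]
    new_head = (head_x + dx, head_y + dy)

    # game over if the head leaves the grid or hits the body
    hit_wall = not (0 <= new_head[0] < game.width and 0 <= new_head[1] < game.height)
    if hit_wall or new_head in game.body:
        game.alive = False
        return

    game.body.insert(0, new_head)   # move the head forward
    if new_head == game.food:
        game.place_new_food()       # food eaten: keep the tail, so the snake grows by one
    else:
        game.body.pop()             # no food: drop the tail, length unchanged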
One option for training is to rely on luck. Yes, really, just plain luck. After reading around a bit, I decided to go for a technique called "genetic algorithm". Just like neural networks, genetic algorithms are inspired by biology. The idea is pretty simple:
On the basis of what we've done before, this is easy. Each individual is a neural network with purely random weights and biases. No idea what this will result in, but statistically speaking, at least a few of them should show some interesting capabilities, like avoiding walls (because they know the distance to danger in each direction...).
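In code, this is only one line per individual (a sketch, reusing the hypothetical make_network from above):

def make_population(size, layer_sizes):
    # random weights and biases everywhere: pure luck, as promised
    return [make_network(layer_sizes) for _ in range(size)]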
This part is super, super easy... Initiate a new game of snake, feed it to the network using the input values, get the next move from the output and repeat until the snake dies. The final snake's length gives a score.
One little trick: it is necessary to put a limit on the number of moves each individual can make, otherwise agents moving in an endless loop will result in exactly that, an endless loop. They will never get food and also never hit a wall. If such an agent emerges, its score must be forced to zero, even if it hits food by luck.
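Putting the pieces together, evaluating one individual could look like this sketch, built on the hypothetical helpers from above; the value of the move cap is an arbitrary example:

MAX_MOVES_WITHOUT_FOOD = 200   # arbitrary cap to kill endless loops

def evaluate(network, game):
    moves_since_food = 0
    while game.alive:
        length_before = len(game.body)
        outputs = forward(network, build_inputs(game))
        step(game, pick_direction(outputs))
        moves_since_food = 0 if len(game.body) > length_before else moves_since_food + 1
        if moves_since_food > MAX_MOVES_WITHOUT_FOOD:
            return 0               # stuck in a loop: score forced to zero
    return len(game.body)          # the snake's final length is the score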
One question here is the number of individuals to use. As it is all based on luck and the hope that at least some have interesting capabilities, the larger the population, the better. But how large is large enough? 100? 1000? 10000? Here again, I set this as a parameter, typical programmer reaction ;-)
With a grid of 20x20, the quality of one snake depends on where the food is placed randomly. If it is far away and ahead, it is easier to navigate there than if it is placed directly behind the head. To make sure that the evaluation is not too biased by good or bad luck in food placement, I decided to give each snake multiple runs and to evaluate its score as the average of all runs. This is costly in terms of runtime, but the hope is that it helps select fitter individuals.
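In code, this is simply an average over several games (a sketch; new_game stands for a hypothetical factory that creates a fresh game with random food placement):

def fitness(network, runs, new_game):
    # average out good or bad luck in food placement over several runs
    scores = [evaluate(network, new_game()) for _ in range(runs)]
    return sum(scores) / len(scores)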
For the generation of the next population, I decided to keep a proportion of the best individuals and send them to the next generation directly. This guarantees that their "genes" are not fully lost, in case the crossings are not generating better individuals.
I then cross the best individual with the second best to generate two new individuals, then the second with the third, and so on. The crossing itself is made layer by layer, neuron by neuron, input by input. For each input, if a random number between 0 and 1 is bigger than 0.5, the new weight and bias go into one of the two children; if not, they go into the second child.
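In code, that per-input crossing looks roughly like this sketch, using the same nested list structure as the earlier sketches:

import random

def crossover(parent_a, parent_b):
    # For each input of each neuron, one child inherits the (weight, bias)
    # of parent A and the other child that of parent B, with 50/50 probability.
    child_1, child_2 = [], []
    for layer_a, layer_b in zip(parent_a, parent_b):
        new_layer_1, new_layer_2 = [], []
        for (wa, ba), (wb, bb) in zip(layer_a, layer_b):
            w1, b1, w2, b2 = [], [], [], []
            for i in range(len(wa)):
                if random.random() > 0.5:
                    w1.append(wa[i]); b1.append(ba[i])
                    w2.append(wb[i]); b2.append(bb[i])
                else:
                    w1.append(wb[i]); b1.append(bb[i])
                    w2.append(wa[i]); b2.append(ba[i])
            new_layer_1.append((w1, b1))
            new_layer_2.append((w2, b2))
        child_1.append(new_layer_1)
        child_2.append(new_layer_2)
    return child_1, child_2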
Is this the best way to create the next generation? This is one open question at this time. Maybe it would be better not to do it on a 50/50 basis, or to cross full neurons instead of doing it input by input. I have no idea which method would be best and I will experiment with this later.
Genetic algorithms normally also include a chance of mutation, randomly changing a parameter. I did not include this yet, but I do complete the population with new, fully random individuals at each round, thus bringing new genetic material. But here again, writing about it makes me think: maybe the new random individuals are so bad compared to their "evolved" companions that their chance of mixing in their genes is too low, and the algorithm just breeds without enough innovation...
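Put together, one generation step currently works roughly like the sketch below; the elite and newcomer counts are parameters, and the helpers are the hypothetical ones sketched earlier:

def next_generation(ranked, population_size, n_elite, n_random, layer_sizes):
    # ranked: individuals sorted from best to worst fitness
    new_population = list(ranked[:n_elite])               # elitism: keep the best as-is
    i = 0
    while len(new_population) < population_size - n_random:
        # best with second, second with third, ... (assumes enough ranked parents)
        child_1, child_2 = crossover(ranked[i], ranked[i + 1])
        new_population.extend([child_1, child_2])
        i += 1
    while len(new_population) < population_size:
        new_population.append(make_network(layer_sizes))   # fresh random "genetic material"
    return new_population[:population_size]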
Many parameters are still open and the results are not definitive; there could be more posts on this, but for now:
Point 3 is pretty frustrating. I still see a lot of cases where the snake just tries to go in the opposite direction when the food is placed straight behind it. It feels like the network is not able to learn how to take a U-turn in two steps. I played with various parameters: the number of neurons in the hidden layer, the number of hidden layers, various input parameters, number of individuals, number of generations...
The best snakes I get can eat between 6 and 10 times. Interestingly, or sadly, the best ones are trained using only the position of food as input parameters. My intuition is that this makes the food "attractive": the snake simply moves towards it, and with a 20x20 grid and a bit of luck in the random placement of food, it is mostly luck that it does not bite itself. Such snakes don't hit the walls, because they are drawn towards the food, which by nature is always within the grid.
Snakes trained with the head's position and distance to danger are not performing as well, and at this time I have just no idea why. Many hypotheses are open:
With so many questions open, it is now time for some instrumentation, to know how each of the parameters and hypotheses influences the results. So time to do what modern programmers do... head to Google and Stack Overflow for ideas and inspiration.
Step 4 is not forgotten, it is just postponed...
I hope you liked this post and if you want to discuss it further or have ideas, don't hesitate to get in touch on LinkedIn and let us talk snakes!