Snakes and bare hands AI

Artificial intelligence (AI for short) seems to be everywhere and to be the solution to every problem. Every now and then, I still do software development and I wanted to try my skills at some AI development. This comes in three ways:

Integrate AI-based products in some software, just like asking ChatGPT from a program instead of from its human user interface. I had done this before and you can read more about this here.
Build software using a software library offering AI features, which is like building a house from bricks, windows, doors and appliances provided by somebody else, bringing them together to solve your problem but not creating them.
Start from scratch and build everything yourself, from "neurons" your "neural networks" to the code running them, learning, evolving and trying to solve the problem at hand.

I chose the third path, starting from nothing but a programming language and some theoretical knowledge, plus a video from Amy Plant, a french YouTuber which basically got me intrigued and started on this thing. I've heard of her via the french technology podcast called Le code a changé, but I digress again, it is just showing how things are connected. What frustrated me a bit with her video is the lack of explanation how a snake's "brain" is made. More on this later.

So here we go, let me tame snakes with bare hands AI techniques for your entertainment...

AI Snake

The problem at hand is easy: create an AI that play the "snake" game. Guide a snake that moves on a grid, going up, down, left or right, trying to get to the square where there is food. When the snake gets food, it grows by one unit. If the snake's head hits a wall or any part of its body, the game is over. If you don't know the snake game, check its wikipedia page.

So the plan is simple:

Create the game itself, being manually playable, to ensure it works fine
Create an AI able to play it
Train the AI until it becomes unbeatable
Have a party

Part 1 is easy, so I won't write too much about it here. Parts 2 and 3, this is where the problem is. Spoiler alert, part 4 is delayed...

Step 2 - Create an AI to play snake

With a basic understanding level of AI, it is clear that you need a "neural network" with inputs and outputs and some magic in the middle. Let us go backwards here. A neural network a set of neurons organised in layers, with the output of each layer being the input of the next, until the output. Each neuron gets multiple inputs and use them to calculate its output. More on this later.

The output layer

The network shall decide in which direction the snake must go next and the easiest way to do that is by a final layer with four neurons, one per direction, calculate the values of the network and select the output neuron with the highest value. Voilà. This was one of the first "AI epiphanies" for me with this project. The network has zero concept of direction, there could be five, three or twenty six different possible outputs, the only difference would be the number of output neurons.

The neurons

Now, let us have a look at the neurons themselves. Basic AI knowledge tells that a neuron is a function having multiple inputs, some parameters and one output. But what for a function? Not having great inspiration here, I went for something very simple:

Inputs: the outputs of all neurons in the previous layers, which is a numerical value
Parameters: a weight and a bias for each input
Output: the sum of each input multiplied by its weight plus its bias

In pseudo code:

output = 0; for each input with index i: output = output + weight[i] * inputs[i] + biases[i] return output

The network

So going backwards from our output layer, we must now create the so-called "hidden" layers. Those layers are called "hidden" because they are not interacting with the outside world and are hidden between the input layer and the output layer.

By lack of better ideas, I decided to use dense layers, meaning that each neuron is connected to all the outputs of the previous layer. If the last hidden layer has 8 neurons, then the four neurons of the output layer each have eight inputs, and therefore eight weights and eight biases. Calculating the output of one such neuron requires the to summing the sum of a product (input * weight) and a bias (input * weight + bias) eight times. This also means that the output layer of four neurons has a total of eight parameters: four weights and four biases.

Neural networks can be made of any number of layers and each layers can have any size. So here comes the next question when creating it: how many neurons are required to play snake and how should they be organised? One large layer? Multiple smaller layers? Many large layers?

What do programmers do when they don't know? They make it adjustable with parameters. We will experiment with this later.

The inputs

Now, we have a network of neurons, connected all the way to the output. But how to connect them to the game situation? The network must know what the game situation is to make its decision, right? This is where the situation became complicated for me. What should the inputs be?

All of the grid? Just throw all of it into the network without further "explanation" and let the network deal with it? With a grid size of 20 x 20, that would mean a vector of 400 values, each being a wall, food, the snake's head, the snake's body or nothing. This seems a bit brutal and expensive, but a potential try for later.
It seems pretty obvious that the position of the food must be part of it. I decided to put the distance to the food in the X and Y axis. Writing about it now, I'm not sure if this was the best decision, maybe just having the position of the food would be better. One more time, writing is good for thinking...
The snake's head's position should also be helpful to decide in which direction to go next.
Finally, the snake's body's position could be helpful but how to code it as a number? I decided to use four values instead, namely the distance until the next obstacle (wall or body part) in each direction.

Picking the right input parameters is probably the most important decision, with the shape and size of the network second. At this stage, the input values I use are the following:

Distance to the next obstacle towards up, down, left and right (4 values)
Vertical and horizontal distance to food (2 values)
Snake's head's position vertically and horizontally (2 values)
Snake's direction of movement, vertically and horizontally (2 values)

The direction of movement seems important because a snake shall never change to a directly opposite direction (from up to down, down to up, left to right or right to left), as this would lead to immediate death.

Step 2 - Done!

And with this, step 2 is done. Our neural network gets the inputs from the game's current situation, calculates all neurons and then provides a result, telling us in which direction the snake must go next. Put this into the game engine and calculate if the snake moves, grows, hits a wall or bites its tail. Easy... off to step 3!

Step 3 - Train the AI

One option for training is to rely on luck. Yes, really, just plain luck. After reading around a bit, I decided to go for a technique called "genetic algorithm". Just like for neural network, genetic algorithms are inspired by biology. The idea is pretty simple:

Create a large number of individuals tasked with solving the same problem, each having a random "DNA"
Evaluate how good they are
Keep the ones with the best results and create more by "crossing" the DNA's of the fittest ones

Creating a random population

On the basis of what we've done before, this is easy. Each individual is a neural network with purely random weights and biases. No idea what this will result in but statistically speaking, at least a few of them must have a few interesting capabilities, like avoiding hitting walls (because they know the distance to danger in each direction...)

Evaluate how good each individual is

This part is super, super easy... Initiate a new game of snake, feed it to the network using the input values, get the next move from the output and repeat until the snake dies. The final snake's length gives a score.

One little trick: it is necessary to put a limitation to the number of moves each individual can do, otherwise agents moving in an endless loop will result in exactly that, and endless loop. They will never get food and also never hit a wall. If such an agent emerges, its score must forced to zero, also if it by luck hits food.

One question here is the number of individuals to use. As it is all based on luck and hope that at least some have interesting capabilities, the largest the population the better. But how large is large enough? 100? 1000? 10000? Here again, I set this as a parameter, typical programmer reaction ;-)

With a grid of 20x20, the quality of one snake depends on where the food is placed randomly. If it is far away and ahead, it is easier to navigate there than if it is placed directly behind the head. To make sure that the evaluation is not too biased by good or bad luck in food placement, I decided to give each snake multiple runs and to evaluate its score as the average of all runs. This is costly in terms of runtime but the hope behind is that it helps selecting fitter individuals.

Create the next population

For the generation of the next population, I decided to keep a proportion of the best individuals and send them to the next generation directly. This guarantees that their "genes" are not fully lost, in case the crossings are not generating better individuals.

I then cross the best individual with the second best, to generate two new individuals, then the second with the third, and so on. The crossing itself is made layer by layer, neuron by neuron, input by input. For each input, if a random number between 0 and 1 is bigger than 0.5, the new weight and bias goes in one of the two children and if not, it goes into the second child.

Is this the best way to create the next generation? This is one open question at this time. May be it would be better to not do it on a 50 / 50 basis but it could also be better to do it with full neurons instead of doing it input by input. I have no idea which method would be best and I will experiment with this later.

Genetic algorithms normally also include a change of mutation, randomly changing a parameter. I did not include this yet, but I also complete the population with new fully random individuals at each round, thus bringing new genetic material. But here again, writing about it makes me think again, may be the new random individuals are so bad compared to their "evolved" companions that their chance of mixing their genes are too low and that the algorithm just breeds without enough innovation...

The results... so far

Many parameters are still open and the results are not definitive, there could be more posts on this but for now:

The trained networks are pretty good at finding the food
They are mostly ok at avoiding walls
They are almost unable to avoid the snake's own body.

Point 3 is pretty frustrating. I still see a lot of cases where the snake just tries to go in the opposite direction when the food is placed straight behind it. It feels like the network is not able to learn how to take a U-turn in two steps. I played with various parameters: the number of neurons in the hidden layer, the number of hidden layers, various input parameters, number of individuals, number of generations...

The best snakes I get can eat between 6 and 10 times. Interestingly, or sadly, the best ones are trained using only the position of food as input parameters. My intuition on this is that it makes food "attractive" and it then goes towards it and with a grid of 20x20 and a bit of luck in random placement of food, it is just luck that it does not bite itself. Such snakes don't hit the walls, because they are drawn towards the food, which by nature is always within the grid.

Snakes trained with the head's position and distance to danger are not performing as well, and at this time I have just no idea why. Many hypothesis are open:

Maybe the networks are too small (not enough neurons) or too shallow (not enough layers)
Maybe the evolution had no time to fully kick-in: not enough individuals or not enough generations
Maybe the cross-over technique is not at the right level (neuron vs. input) or not using the best possible probability level (20 / 80 vs. 50 / 50)
Maybe not having mutations in the cross-over process limits innovation too much
Maybe the input parameters are not the right ones to solve the problem

With so many questions open, it is now time for some instrumentation, to know how each of the parameters and hypothesis influences the results. So time to do what modern programmers do... head to google and stack overflow, for ideas and inspiration.

Step 4 is not forgotten, it is just postponed...

I hope you liked this post and if you want to discuss it further or have ideas, don't hesitate to get in touch on Linkedin and let us talk snakes!

Snakes and bare hands AI - Part 1?