Snakes and bare hands AI - Part deux

This post is the second in what is now a series; you can read the first one, about the inception of the "snake with bare hands AI" project and the first results, here. Long story short, I decided to create an AI to play the old snake game without using any AI framework. Last time, I ended up with a snake doing a great job at avoiding walls, but not really able to avoid its own tail.

One week later, there is progress, but also a few lessons learned. So take a seat and let me tell you more about my adventures in snake land... No snakes were harmed in the making of this.

[DALL·E image: a person training snakes with bare hands, in a high-tech facility with digital screens]

First things first, I have to admit that one of the lessons is about the choice of programming language and environment. Over my life, I've been fluent in many programming languages (C, C++, Ada, Pascal, PHP, Objective-C, Swift, ...) and at the moment, my languages of choice are JavaScript and TypeScript. Because this was supposed to be "just a simple one-weekend project", I went for vanilla JavaScript, leaving even the advantages of TypeScript aside. Who needs strong typing and the extra safety for such a little thing, right? Nope... Picking JavaScript over TypeScript was a quick shortcut and, as always, it bit back and cost me time. For now, the project is still in JavaScript, because I'm considering switching to Python, which could happen next - but my Python is really, really basic, so I'm not sure it's a good idea. If I do switch languages, I'll still do it without using any powerful AI framework, as one of the goals is to learn more about AI itself.

The progress came from different changes that I'll address in this post:

  • Using better inputs
  • The snake's initial length
  • Increasing the size of the population for the genetic algorithm

I also made multiple other changes along the way, but they had little to no impact:

  • Activation functions
  • Number of neurons in the network, both per layer and number of layers
  • Normalisation

I need to run more experiments on those three factors, so it is too early to write much about them - that will be for part three...

Finally, there is one problem that I did not understand well at the beginning and which remains open: evaluating how good a given snake is, which is not as easy as it seems.

Better inputs to the neural network

The factor that produced the largest improvement was adjusting the inputs. Until now, the input values I had selected were the following (a small code sketch follows the list):

  • Distance to the next obstacle going up, down, left and right (4 values)
  • Vertical and horizontal distance to the food (2 values)
  • The snake's head position, vertically and horizontally (2 values)
  • The snake's direction of movement, vertically and horizontally (2 values)
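
Here is roughly how that input vector gets built - a simplified sketch, not my actual code; the names and data structures are illustrative:

  // Walk from the head in a given direction and count free cells
  // until a wall or a body segment is hit.
  function distanceToObstacle(snake, grid, dx, dy) {
    let x = snake.head.x + dx;
    let y = snake.head.y + dy;
    let distance = 0;
    while (x >= 0 && x < grid.width && y >= 0 && y < grid.height &&
           !snake.body.some((s) => s.x === x && s.y === y)) {
      distance += 1;
      x += dx;
      y += dy;
    }
    return distance;
  }

  // The ten input values described above.
  function buildInputs(snake, food, grid) {
    return [
      distanceToObstacle(snake, grid, 0, -1), // up
      distanceToObstacle(snake, grid, 0, 1),  // down
      distanceToObstacle(snake, grid, -1, 0), // left
      distanceToObstacle(snake, grid, 1, 0),  // right
      food.y - snake.head.y,                  // vertical distance to food
      food.x - snake.head.x,                  // horizontal distance to food
      snake.head.y,                           // head position, vertical
      snake.head.x,                           // head position, horizontal
      snake.dy,                               // direction of movement, vertical
      snake.dx,                               // direction of movement, horizontal
    ];
  }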

At some stage, I realised that my snakes were pretty good at avoiding the walls but kept biting their own tails. After checking, I noted that the distance-to-the-next-obstacle inputs were almost useless: setting them to zero at all times did not really make a difference. Surprise, surprise... with so many possible parameters, in the neural network and everywhere around it, it is hard to know what counts and what does not.

What helped me understand was watching the snake sometimes move along its own tail, getting closer to the food, and then bite its tail once it got close enough. It felt like the food was more attractive than the tail was repulsive. Multiplying the distance to danger by the distance to food and feeding that in as an input really, really improved the results.
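
In code, that extra input is a one-liner on top of the sketch above (again illustrative; whether it is a single value or one per direction is an implementation choice):

  const inputs = buildInputs(snake, food, grid);
  const dangerAhead = distanceToObstacle(snake, grid, snake.dx, snake.dy);
  const foodDistance = Math.abs(food.x - snake.head.x) + Math.abs(food.y - snake.head.y);
  // The product couples the two signals instead of leaving the network
  // to weigh them separately - which, as described below, it cannot
  // combine multiplicatively on its own.
  inputs.push(dangerAhead * foodDistance);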

One interesting question here - and I don't have a definitive answer yet - is why the neural network cannot learn this by itself. If the distance to food multiplied by the distance to danger is a good input, then a random neural network in which this product happens to emerge would have a massive advantage and should spread through the generations. But every neuron's output is just the sum of its inputs multiplied by their respective weights: an input never gets multiplied by another input, however you configure the weights. Even with more than one hidden layer, it basically cannot happen. Would another type of neuron help? One more thing to experiment with.
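
To see why, here is what a single neuron in such a dense network computes (sketch):

  // A dense neuron: weighted sum of the inputs plus a bias, passed
  // through an activation function. Inputs are only ever multiplied
  // by constant weights, never by each other, so a product of two
  // inputs cannot appear - no matter how the weights are configured.
  function neuronOutput(inputs, weights, bias, activation) {
    let sum = bias;
    for (let i = 0; i < inputs.length; i++) {
      sum += inputs[i] * weights[i];
    }
    return activation(sum);
  }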

The snake's initial length

Another factor that helped and accelerated the learning was starting with longer snakes. The previous version started with just the head. With such a short snake, a collision with the body - and therefore the very possibility of learning that this is a bad thing - cannot happen: unless the snake makes a 180° turn (i.e. going down while it was going up), a body length of at least three (plus the head) is required for a collision.

Creating the initial snake with three body segments plus the head led to a much better chance of growing longer over time and therefore getting better. I'm not saying that starting with shorter snakes is a no-go, but it makes good initial learning less likely and therefore requires more training.
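
The change itself is trivial - something like this (sketch, with illustrative names):

  // Start with a head plus three body segments instead of just a head,
  // so that self-collision - and learning to avoid it - is possible
  // from the very first moves. This snake starts moving to the right.
  function createInitialSnake(x, y) {
    return {
      head: { x: x, y: y },
      body: [
        { x: x - 1, y: y },
        { x: x - 2, y: y },
        { x: x - 3, y: y },
      ],
      dx: 1,
      dy: 0,
    };
  }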

Increasing the population for genetic learning

Remember that the learning loop is based on a genetic algorithm: a lot of neural networks are generated, all of them play the game, and then the better ones are selected and randomly crossed with each other to generate the next generation. I also keep the best ones unchanged in the next generation, in case the crossings do not produce better ones. The initial networks are fully random.
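
In a simplified sketch, the loop looks like this - randomNetwork, playGame and crossover stand in for the real implementations, and the elite fraction is an arbitrary choice for illustration:

  function evolve(populationSize, generations) {
    let population = Array.from({ length: populationSize }, () => randomNetwork());
    for (let g = 0; g < generations; g++) {
      // Every network plays the game and gets a score.
      const scored = population.map((net) => ({ net, score: playGame(net) }));
      scored.sort((a, b) => b.score - a.score);
      // Keep the best ones as they are, in case the crossings fail...
      const eliteCount = Math.floor(populationSize / 10);
      const next = scored.slice(0, eliteCount).map((s) => s.net);
      // ...and fill the rest by crossing randomly picked good parents.
      const parents = scored.slice(0, Math.floor(populationSize / 2));
      while (next.length < populationSize) {
        const a = parents[Math.floor(Math.random() * parents.length)].net;
        const b = parents[Math.floor(Math.random() * parents.length)].net;
        next.push(crossover(a, b));
      }
      population = next;
    }
    return population;
  }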

Such an algorithm only works with a large enough population evolving over a large enough number of generations. But how large is large enough, and how many generations are enough?

I first started with 100 individuals over 100 generations. To monitor the progress, I checked the score of the best individual. I quickly noted that the score tends to increase rapidly in the beginning and then plateau. However, depending on the number of individuals, there is also an initial plateau before the first improvements happen. In some cases, I've even seen two consecutive runs with the same parameters lead to very different results, with some capabilities being developed in some runs and not in others. This basically means that if the population is too small, interesting initial capabilities may not be present at all.

From my experience so far, for this problem, I get better results with a population of 1000 snakes over 250 generations. Obviously, this raises the question of computing power. Ideally, one could take five gazillion snakes and run bazillions of generations, but this is not possible: more snakes require more memory and more execution time, and more generations also take more time.

Interestingly, increasing the number of snakes beyond ca. 5000 does not help much. The snakes reach maturity faster, but it does not push their capabilities further, and so far the same applies to the capabilities they learn. If the population is large enough, the learning curve always looks the same: a small plateau, then learning, then a long plateau. Is the second plateau forever, or could there be a second significant improvement much later down the generations? Here again, no definitive answer, but I don't think so, as there are things such a network simply cannot learn.

I also experimented with how the crossings happen, which had some impact, but not as much as the population size.
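
For reference, a crossing here is just a mix of two parents' weights. A uniform crossover, for example - one possible variant, assuming a network's weights are stored as one flat array - looks like this:

  // Each weight is taken from either parent with equal probability.
  function crossover(weightsA, weightsB) {
    return weightsA.map((w, i) => (Math.random() < 0.5 ? w : weightsB[i]));
  }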

Evaluating good snakes is not that easy

The selection step of the genetic algorithm is based on a score. The best snakes are selected and the less good ones are eliminated. The best snake is the one with the longest body at the end of its game. Simple and easy.

Yes, simple and easy, but not that great and above all not fair. It often happened that, at the end of training, the best snake had a length of ca. 25 but scored only 7 in a simulation run straight after training. Since the position of the food is random in every game, each snake is trained in the same world but with different food positions.

Even worse, not only is each snake trained on different random food, but the food in the post-training run is also different from the food during training. A snake could get "lucky" with the food placement during training and really unlucky when playing afterwards.

Doing multiple runs after the training showed that the same snake could get scores varying from 7 to 35, almost a factor of five between the worst and the best score. As a consequence, a snake developing capabilities matching a given random draw of food positions will perform really well, but only once.

To mitigate this problem, I made multiple runs at each generation, to give an advantage to snakes with generic capabilities. Obviously, this came at the price of more runs, more computing power and more time.
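
Sketched in code, assuming the scores are simply averaged (how to combine them - average, minimum, median - is itself a choice to experiment with):

  // Score a network over several games and average the final lengths,
  // so a single lucky food placement does not dominate the result.
  function evaluate(net, runs) {
    let total = 0;
    for (let i = 0; i < runs; i++) {
      total += playGame(net); // playGame returns the final snake length
    }
    return total / runs;
  }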

Who said this is a simple problem?

This game, which seems so obvious to a human, is definitely not that easy to solve with AI, at least not without using complex, pre-made frameworks that I could apply without understanding them.

The possible next steps are the following:

  • Changing the programming language, either to TypeScript, for at least some type safety, or possibly to Python, which I'm not so familiar with - but that could be an interesting learning opportunity
  • Trying other kinds of neurons and neural networks, possibly beyond dense linear networks
  • Trying again with other inputs, maybe at a "lower" level, like feeding the whole world into the network - the most brutal and raw approach, which has not worked so far

Vincent Lambercy

Related posts

Snakes and bare hands AI - Part 1
A bit of fun from ChatGPT and DALL-E