One million pathfinding calls

When we’re making video games, we often need a navigation system that relies most of the time on a pathfinding algorithm called A* (A star). It could be to move a character from a point A to a point B or to check if a path exists between two points when you are making procedural generation.

At the beginning of my first year at the Enjmin, we had a project whose the mission was to create a small game in C++ using the Windows command prompt.

The beautiful Windows command prompt.

I teamed up with a friend who was working on procedural generation and genetic algorithms (you can find his portfolio here) and we decided to make a puzzle game called The Wetness paying tribute to the The Witness. The objective of the game was simple. You have to reach the purple square by taking all green squares without passing twice at the same place.

The Wetness

The particularity of this game is that all puzzles are procedurally generated at runtime. A machine learning and a genetic algorithm are working together to create interesting puzzles and dynamically adapt the difficulty in order to keep the player in the flow. Once a puzzle is generated we have to evaluate it and check that it is solvable. That’s the main challenge we faced. Checking if there’s at least one possible path that respects the rules was actually complicated because the algorithmic complexity was growing up very quickly and making the player wait 1 minute between each puzzle wasn’t an option. We had to reduce the algorithmic complexity and optimize our code. The profiler was clear, the CPU spend all his time in the pathfinding.

The goal of this article is not to explain how A* works but to present the optimization we used to make our pathfinding faster within the time we had. Our first naive implementation stored the nodes coordinates and priorities with int. That was our first mistake. What’s the point of using a 4 bytes variable to store the coordinates of a small square grid (10×10 maximum) ? Let’s define our class node :

First of all, all template are prefixed by T and the code will be in the namespace nav to avoid name conflicts because Node or SquareGrid are very generic names. To make things more simple, the type of the coordinates and the type of the node priority will be passed through template parameters to be able to customize our pathfinding to fit with our project requirements. In our case, a node can be linked to its 4 direct neighbors : North, East, South and West. To save a bit of memory, these links are stored in one byte thanks to the flags.

The using statement is necessary to simplify the heavy use of templates. Note that the coordinates are stored in an union between two distinct coordinates, x and y or only one number that represents the two coordinates.

And what about this line ?

If the coordinate type is char, it means that x and y will be respectively encoded in a char and if we want to hold 2 chars in one variable we need at least a short. Two shorts will fit into an int and two ints will fit into a long long (since C++11). The following example shows the situation :

To deduce the type of xy, a simple template specialization of the structure TTieTypeHelper is used :

The two next functors will be useful to use our template with STL containers. We have our node template so let’s set up the graph.

The template TSquareGrid represents an oriented graph. It stores its size and a 2D array of nodes flattened into a 1D array. It’s more efficient for cache because if we store our nodes in a vector of vectors of nodes, the nodes will be separated in many different areas of the memory and it will result into a bunch of cache misses. We can use now our graph and node template with a new template : The pathfinding.

This template introduces a new template parameter, the graph, to be able to change the input graph type. During our tests, we have implemented a Jump Point Search (JPS) pathfinding and we have compared the results with the classic implementation of A*. JPS was about 1.5 times slower because in JPS, the GetNeighbors method is slower but it eliminates much more nodes in one time. However our grid was small and narrow that’s why it didn’t fit very well with JPS.

The result ?

Thanks to our templates, using the type char for the coordinates and the type short for the priority makes the pathfinding almost 2 times faster than using int type. It results in fewer memory allocations, faster copies, faster computations, and a better use of the cache lines and caches. Storing the neighbors in one single byte helped a lot too (instead of holding 4 ints or a vector of bool like I can see sometimes). A lot of work has been done to avoid copies or useless temporary objects by using references and the move semantic. With all of these optimizations, the game is running fine. We are able to generate 10×10 grids in less than 3 seconds even there are sometimes more than 1 million calls to the pathfinding method for only one puzzle.

Ironically at the end of the project, the CPU spent the major part of his time into the STL containers (maps, vectors and the priority queue) although they were properly used. The next optimization step would have been to define custom containers and use a custom memory allocator such as a simple stack allocator.

You can find the complete source code here on my GitHub in the repository Articles.

Thanks for reading !

Leave a Reply

Your email address will not be published. Required fields are marked *

Copyright © 2018 Vincent STEHLY--CALISTO. All Rights Reserved.

Up ↑