As is the case in game programming, my first performance roadblock that is really hurting the Alpha is the pathfinding. It uses A* and it is slow once you get to 40+ square length paths. By slow I mean you can see it takes on the order of 100s of ms for just a few units.

That's bad.

So I went ahead and added Jump Point Search, JPS, but interestingly it made my pathfinding EVEN SLOWER! The issue is that JPS is only superior to super vanilla blind implementation of A* without even, I guess, a heap to store the open list?

Jump Point Pain

Okay, JPS isn't just awesome by blindly implementing the algorithm as pointed out by Harabor (here's the white paper). But it does add value, if we can figure out what is slowing everything down then we can get what it promises: less nodes to check in the open list which is an O(logn) insert.

First, why is A* faster than A* + vanilla JPS? The issue comes down to how the open list is processed. A* with a simple heuristic ( c^2 = a^2 + b^2 ) gives us a very accurate way of determining the next path to analyze without over-estimation. Additionally, the modified A* in Cultura "prefers" to check paths of longer length between two equal cost (known + estimated cost) paths. This means it will traverse down something that is likely to hit the goal node first and end the search. In practice, this means racing down the correct path almost immediately and finding the goal node. So the costs are a bunch of O(logn) inserts and O(1) pops to get the head.

This is quick but the inserts eventually slow down the pathfinding to a very mediocre speed by even a 40-length path.

Thus begins our quest to optimize the shit out of pathfinding.

Cost Estimation Heuristic

The first and simplest optimization was to improve the cost estimation. Originally it was using the "squared" straight line distance to avoid performing a square root. This avoids the computationally expensive square root in the hopes that it doesn't damage performance.

Unfortunately, that just wasn't true. The more accurate straight line distance versus the square straight line distance was pretty much a different in runtime complexity on path lengths over a 10-step length. This was due to the poor "next node" selection in the A* open list. Doing the full calculation allowed A* to correctly choose the next node without trouble and thus be nearly linear in all typical use cases in Cultura.

These days you can use really awesome square root calculators that make it painless. Here are two things: the ole quick square root popularized by Carmack and the much newer asm_sqrt instruction. My box is a quad-core 3.6 Ghz i8 processor for reference.

Just so you know, here's fast inverse square root:

float CJumpPointSearch::Q_rsqrt( float number )
{
 long i;
 float x2, y;
 const float threehalfs = 1.5F;
 
 x2 = number * 0.5F;
 y  = number;
 i  = * ( long * ) &y;                       // evil floating point bit level hacking
 i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
 y  = * ( float * ) &i;
 y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
 
 return y;
}

Method	Time
Quick Square Root	~25 000 000/s
asm_sqrt	~30 000 000/s

I can do thirty million sqrts a second. Or I can choose the wrong node eighty times and spend countless milliseconds hitting the A* open list. HMM!

JPS: Straight Line Jumps

A straight line is pretty simple. The JPS algorithm basically says keep going forward unless there are obstructions. Then you stop. Seems simple right? Well actually there's a lot of checks per square you need to do before you "jump" to the next square. Is the next square an obstruction or within the map? Is the next square the goal node? Are there any "forced neighbours"? Okay, NOW you jump. The number of checks is very significant.

That's not the worst part though. An unstated problem in JPS is that the jumping doesn't stop.

It doesn't look so bad here. We checked seven squares and then determined that this has no jump point successors. Therefore nothing gets added to the A* open list. Fantastic right? Except that in non-vanilla A* with a reasonable heuristic, I would have never checked all but one of those nodes. So what would have been the cost? Nearly nothing. But, with vanilla JPS, I've run through a "jump" check for all seven nodes. And it gets worse with map size. Cultura has very large maps but even something like a 90x90 test map, I'm nearly running 60-square checks for every straight line jump.

These checks can't be skiped. In order to preserve correctness we must do every square. However, you may not need to perform a full jump.

JPS: Diagonal Line Jumps

In JPS, a diagonal line jump is the most frequent kind of jump you want to do because very rarely do you have a straight path to your goal node. However, the JPS algorithm does an extra step to ensure correctness/optimality for diagonal jumps. Every square you jump forward you must also check the straight line jumps that come out of it.

These straight line jumps have all the performance issues as a normal straight line jump because they're the same thing. And notice they can't be skipped otherwise you might miss an optimal path. So for every square a diagonal jump can go we add two straight line jumps. The number of checks grows very rapidly. For a map such as 90x90, each straight line jump can be 50-60 iterations of the jump function each one consisting of 3-5 checks. Now imagine a reasonable map in Cultura, like a 500x500 map or a very large one like 2500x2500, a straight line jump can be 10000s of checks. So a diagonal line ends up being hundreds of thousands to millions of checks versus just a few in A* with a heap sorted open list. JPS becomes insanely slower.

Jump Point Optimizations

What do?

Order of Operation

Let's consider a few situations:

We could have a goal node accessible through a straight line path. We could have one accessible through a diagonal path. And we could have one accessible by neither.

The first step in JPS is to "prune neighbours" before deciding to figure out the jump point successors. That means we're trying to figure out, from the current node that we just pulled from the A* open list, which directions we should attempt to jump. We can immediately identify that if we're in a straight line path (the x or y value is the same as the current node) then we should see if a straight line jump works. Therefore we should prefer to check the straight line first. Otherwise, let's do the diagonal first because the straight line can't possibly hit the goal node; only a diagonal can. Obstructions can change this but the algorithm already accounts for this.

In practice this means using a deque to store the "pruned neighbours". We push front for the jumps we want to check first, push back to those we want to check second. We are preferring to look at jumps that have a chance of hitting the goal node first.

Of course this is only useful with an early exit. If we find a path that hits the goal node we are using the properties of A* and JPS to exit early from the jumping. We have found a path to the goal. Exit now and let the A* handle whether that is optimal or not. If not? It won't be analyzed again. If it is, we found a path and we can return it immediately. Due to the properties of A*, any other goal path can only at best be of equal length.

Delayed Diagonal Checks

A diagonal path needs to make a straight line check every step of the way. But, if the diagonal path can just simply take you to the goal node (because there were no obstructions), the question is: who gives a shit? The answer is nobody. You got an optimal path, you have an optimal path. Use it!

Rather than make a straight line check every step of the way, as stated in the original vanilla JPS algorithm, just delay those checks only when the diagonal path fails to yield a path.

Overrunning the Goal and A*

When you're using a heap for the A* open list, an insert is O(logn) and popping is O(1). Is that expensive? No! But it is more expensive if you didn't have to do it. JPS is meant to reduce the number of open list checks but that's not useful if the "jump" function takes so ****ing long that the heap inserts and pops were cheaper. What's the answer? Leverage both.

Whenever a jump overruns the goal node, just exit out and return that as a jump point successor. Let the A* algorithm figure out whether it should bother to keep going. The cost is cheap. Imagine a 2500x2500 map. You could do a million checks before realizing this jump is not useful. But A* would have known that far earlier, skipping so many checks that it was much much cheaper to have just simply inserted this into the open list. It's a heap, big deal. It'd have to grow to enormous sizes in order to be a problem.

Have a Path, Use the Path

Whenever we find that we have a path with a goal node, we immediately exit out of finding further jump point successors. The properties of A* will guarantee that we won't use this path if it is not optimal. But we can't tell if it is optimal without going to A*'s algorithm. This avoids the need in JPS to do pointless diagonal or straight line jumps, which again, have a significant cost to them. Inserting a potential goal node into A*'s open list and immediately checking its validity saves significant computation time (a single logn insert versus potentially 1000s of jump checks on various squares).

What's Next

Right now we're running at around 120 solutions for ~60 length paths in a 90x90 map per second. For the pathological case of thousands of units operating at the same time attempting to do pathfinding we would want something around 500-600 solutions per second. We can pursue a few other options: precomputation, caching calculating paths, using greedy whenever possible, reducing function calls and other super-optimization of code (right now the code is very pretty and laid out in OO manner).

I'll be taking a two week break from coding this so that is where I leave the pathfinding state in right now. This is a working real world example of using A* and JPS. I'll likely post up the code (as useful as that may be for people making games with completely different data structures, passability rules and so on), when I'm at the efficiency I'd like to be at for the pathfinder.

Cultura Dev Journal

Sunday, August 24, 2014

My A-Star is Too Slow, Now With Jump Point Search