One little detail of this method that I felt was worth commenting on was the creation of this graph. The "lock nodes" need to have some basic properties. There should be one on the path between the root and any leaf. It could be the leaf. (In theory it could be the root, but that would suck because there would be no parallelism in the collisions.) We would also like to use nodes that are as far down the tree as possible without making the graph too dense. The edges in the graph connect nodes that are sufficiently close that it isn't safe to do collisions in both at the same time. So we want nodes that have a size on the order of the search distance for collision pairs. Identifying these nodes isn't that hard. It can be done with a nice recursive decent. Once we have them we would have to build the graph which could be done easily in code by doing a recursive decent for each of the lock nodes. Such an approach is easy to describe and would be O(n log n) with a coefficient that isn't huge so it wouldn't be the limiting factor in the simulation. However, there is a better way.
This better way, as the title of this post alludes to, is to do recursion over two arguments. In this way, we can identify the lock nodes and build the graph all in a single recursive call. However, that recursive call now recurses on two arguments and basically runs through pairs of nodes. The advantage is that it prevents it from doing work on the top potion of the tree multiple times and should provide something close to that absolute minimum work needed to identify all of the pairings needed for the graph.
This isn't the first time I have used this type of method. In fact, it is the fourth time and the third time in this particular simulation code. I first used the method for my implementation of a symplectic tree code that I have worked on with Hal Levison. In that code the interactions occur only between nodes on the same level of the tree. As such, the two argument recursion makes great sense. I had never seen it used before and still haven't, so for all I know, it hasn't been widely discovered as an approach for numerical work.
My second use was in the collision finding in my main simulation code when a tree is used for finding the collisions. More recently I updated gravity to use it as well. In the case of gravity, switching to this approach dropped the number of distance calculations and improved code performance by a factor of 2-3 times. Now it looks like I'll be using it a fourth time to find pairs of lock nodes that should have edges between them in the gravity/collision tree.
As a general method, what this approach does is to efficiently run through and identify pairs of nodes in a tree when there is a criteria that can eliminate subtrees from consideration. In my work, the criteria is a distance issue, but it could be other things. This could also be generalized to recursion over three or more arguments for finding triples, etc. What makes it efficient is that the recursion doesn't pass over the top level nodes doing the check for inclusion multiple times. Instead, it goes through them once for each argument that it is recursing on and then goes on down to other pairs where the check is more significant.
(Interested readers might wonder how one could make a single recursive call like this operate in parallel. If anyone requests a description of how we have found to do that in a way that leads to generally good load balancing, I'd be happy to post it at that time.)