Playing with centrality measures
Toyed a bit with the Graph parts of BOOST C++. Very useful tool, I'll need to learn more of it. So far, I've revisited an older post from this blog. A brief recap: in some grid of 'workshops'/'research centers' that are making their way up a tech tree by discovery and/or knowledge transfer from their neighbors, workshops that are closer to the center of the grid/farther away from obstacles to knowledge transfer are more likely to reach the top of the tree first. For some general graph, not just a grid, what measure of centrality would capture this behaviour?
Since this was mostly about keeping things simple so as to make a few first steps with BOOST, I tried some simple metrics. One is "betweenness centrality": for a given node k, this is the ratio of the number of shortest paths, between all pairs of nodes, that pass through node k, over the number of shortest paths between all pairs of nodes. The other metric I cooked up was, let's call it shortest path potential, given a node k, finding the shortest path from it to all nodes in the graph, associating to each node a number equal to the chance of knowledge transfer to the power of shortest path length, then summing all these numbers. Results of these two metrics on a 'cross grid' graph (a 10x10 grid of nodes, but some connections are removed forming the cross-hair pattern) are below, together with the original plot of 'chance for first discovery'.
Of course, the similarity is rather coincidental, if it exists at all- betweenness centrality doesn't match 'chance of first discovery'. Discovering a tech tree is its own problem, and cannot be represented by either of my metrics. For one, the depth and breadth of the tech tree to explore is important to chance of first discovery. Deep and wide tech trees favor nodes with many neighbors. Shallow trees don't; in particular, if the tech tree contains only one technology, finding it first means finding it yourself and neighbors are irrelevant. Meanwhile, the shape of the tech tree is irrelevant to shortest path potential.
Still, there's some similarity between shortest path potential and the chance of first discovery. The intuition behind suggesting shortest path potential was that, if a piece of tech were discovered at some node j, it's most likely to get to a node k via the shortest path between them than by any other -one- path between them.
That assumption simplifies things a fair bit, and that's a second piece of difference between shortest path potential and chance of first discovery. It's consequence is that nodes that are at the same distance from a node k are all equally important to that node's efforts in exploring the tech tree. It's easy to see that in general this is false.
Consider a very simple case- four workshops in a line, A, B, C, D. A is connected to B (so they can transfer knowledge from each other), B is connected to A and C, C is connected to B and D and finally D is connected to C. A and C are both at distance 1 from B, but intuition suggests, and a fairly tedious computation will verify, that 'C' is a more important neighbor for B to have.
The reason is that B can steal from A only what A has previously discovered. On the other hand, B may steal from C what C has discovered, AND what C has previously stolen from D.
Verifying this means listing the various possible histories that result in a piece of tech first arriving at B through A, and a similar calculation for the chance of that tech first arriving through C. Just looking at the tree is enough to show that for every history that has a tech first arriving at B through A, there's a similar, and equally likely, history of that tech first arriving at B after passing through C. But there are some other histories that tilt the balance in C's favor.
Getting the numbers themselves however is fairly tedious, even for this simple case. So it would be nice if shortest path potential were good enough to predict chance of first discovery. Some more data is needed. Time to get back to BOOST, get some random graphs and try the tech tree exploration on them. Soon.
Since this was mostly about keeping things simple so as to make a few first steps with BOOST, I tried some simple metrics. One is "betweenness centrality": for a given node k, this is the ratio of the number of shortest paths, between all pairs of nodes, that pass through node k, over the number of shortest paths between all pairs of nodes. The other metric I cooked up was, let's call it shortest path potential, given a node k, finding the shortest path from it to all nodes in the graph, associating to each node a number equal to the chance of knowledge transfer to the power of shortest path length, then summing all these numbers. Results of these two metrics on a 'cross grid' graph (a 10x10 grid of nodes, but some connections are removed forming the cross-hair pattern) are below, together with the original plot of 'chance for first discovery'.
Betweenness centrality (left), shortest path potential (middle) and the original chance of first discovery |
Of course, the similarity is rather coincidental, if it exists at all- betweenness centrality doesn't match 'chance of first discovery'. Discovering a tech tree is its own problem, and cannot be represented by either of my metrics. For one, the depth and breadth of the tech tree to explore is important to chance of first discovery. Deep and wide tech trees favor nodes with many neighbors. Shallow trees don't; in particular, if the tech tree contains only one technology, finding it first means finding it yourself and neighbors are irrelevant. Meanwhile, the shape of the tech tree is irrelevant to shortest path potential.
Still, there's some similarity between shortest path potential and the chance of first discovery. The intuition behind suggesting shortest path potential was that, if a piece of tech were discovered at some node j, it's most likely to get to a node k via the shortest path between them than by any other -one- path between them.
That assumption simplifies things a fair bit, and that's a second piece of difference between shortest path potential and chance of first discovery. It's consequence is that nodes that are at the same distance from a node k are all equally important to that node's efforts in exploring the tech tree. It's easy to see that in general this is false.
Consider a very simple case- four workshops in a line, A, B, C, D. A is connected to B (so they can transfer knowledge from each other), B is connected to A and C, C is connected to B and D and finally D is connected to C. A and C are both at distance 1 from B, but intuition suggests, and a fairly tedious computation will verify, that 'C' is a more important neighbor for B to have.
The reason is that B can steal from A only what A has previously discovered. On the other hand, B may steal from C what C has discovered, AND what C has previously stolen from D.
Verifying this means listing the various possible histories that result in a piece of tech first arriving at B through A, and a similar calculation for the chance of that tech first arriving through C. Just looking at the tree is enough to show that for every history that has a tech first arriving at B through A, there's a similar, and equally likely, history of that tech first arriving at B after passing through C. But there are some other histories that tilt the balance in C's favor.
Getting the numbers themselves however is fairly tedious, even for this simple case. So it would be nice if shortest path potential were good enough to predict chance of first discovery. Some more data is needed. Time to get back to BOOST, get some random graphs and try the tech tree exploration on them. Soon.
Comments
Post a Comment