A research article co-authored by a Two Sigma Labs researcher was recently accepted as a full paper at the 22nd ACM International Conference on Knowledge Discovery and Data Mining (ACM KDD2016), the main venue for novel research in data mining and data science. The paper describes a new suite of algorithms for approximating betweenness centrality on static and fully dynamic graphs.
Centrality measures are fundamental concepts in graph analysis: they assign to each node or edge in the network a score that quantifies some notion of importance of the node/edge in the network. Betweenness centrality is a very popular centrality measure that, informally, defines the importance of a node or edge z in the network as proportional to the fraction of shortest paths in the network that go through z.
While there are polynomial time algorithms to compute the exact betweenness centrality of all nodes in a graph, these are too slow to be practical on graphs of even moderate size (e.g., tens of thousands of nodes). Moreover, computing the exact betweenness values may often not be needed, given the exploratory nature of the task, and a high-quality approximation of the values is usually sufficient, provided it comes with stringent guarantees.
Today’s networks are not only large, but also dynamic; edges are added and removed continuously. Keeping the betweenness values up to date after edge insertions and removals is a challenging task of dubious merit. Maintaining a high-quality approximation instead is more feasible and more sensible.
Using ABRA to compute and maintain high-quality betweenness approximations
With this goal in mind, Two Sigma Labs research scientist Matteo Riondato and Brown University Professor Eli Upfal have developed ABRA, a suite of algorithms to compute and maintain probabilistically guaranteed, high-quality approximations of the betweenness centrality of all nodes (or edges) on both static and fully dynamic graphs. The algorithms use progressive random sampling, and their analyses rely on Rademacher averages and pseudodimension, fundamental concepts from statistical learning theory. The experimental results show that ABRA is much faster than exact methods, and vastly outperforms, in both runtime and number of samples, state-of-the-art algorithms with the same quality guarantees.