The Cracker algorithm for mining connected components in large graphs

By admin | March 16, 2015

The problem of finding connected components in a graph is common to several applications dealing with graph analytics, such as social network analysis, web graph mining and image processing. The exponentially growing size of graphs requires the definition of appropriated computational models and algorithms for their processing on high throughput distributed architectures.

The basic idea of cracker is to iteratively reduce the graph size by progressively pruning vertices until only one vertex for each connected components (CC) is left, i.e., its seed. For instance, an isolated vertex can be immediately excluded from further processing, since it does not provide useful information for the discovery of the other CCs. Similarly, as soon as a cc is identified, it can be excluded from computation since it does not impact on the other CCs in the graph. When a vertex is removed, it contributes to iteratively build a spanning tree that, at the end of the algorithm, will exist a spanning tree for each connected component.

The wide spectrum of datasets analysed reveals that the pruning mechanism is effective and lets cracker to outperform the existing state-of-the-art solutions. cracker revealed to be a very flexible solution, resulting the most efficient solution both when the largest cc is near the dimension of the entire graph and when it is of smaller size. In terms of time, Cracker outperforms its competitors in every dataset used, with the best competitor being from 9% to 75% slower. Finally, cracker obtained the least message volume among all the competitors.

Our paper Carlini, E., Dazzi, P., Lulli, A., Lucchese, C., and Ricci, L. “Cracker: Crumbling Large Graphs Into Connected Components” was accepted at the 20th IEEE Symposium on Computers and Communications, Larnaca, Cyprus, 6–9 July 2015.

Topics: Uncategorized | No Comments »


You must be logged in to post a comment.