Solving large-scale optimization problems often starts with graph partitioning, which means partitioning the vertices of the graph into clusters to be processed on different machines. The need to make sure that clusters are of near equal size gives rise to the

*balanced graph partitioning problem*. In simple terms, we need to partition the vertices of a given graph into k almost equal clusters, while we minimize the number of edges that are cut by the partition. This NP-hard problem is notoriously difficult in practice because the best approximation algorithms for small instances rely on semidefinite programming which is impractical for larger instances.

This post presents the distributed algorithm we developed which is more applicable to large instances. We introduced this balanced graph-partitioning algorithm in our WSDM 2016 paper, and have applied this approach to several applications within Google. Our more recent NIPS 2017 paper provides more details of the algorithm via a theoretical and empirical study.

**Balanced Partitioning via Linear Embedding**

Our algorithm first embeds vertices of the graph onto a line, and then processes vertices in a distributed manner guided by the linear embedding order. We examine various ways to find the initial embedding, and apply four different techniques (such as local swaps and dynamic programming) to obtain the final partition. The best initial embedding is based on “affinity clustering”.

**Affinity Hierarchical Clustering**

Affinity clustering is an agglomerative hierarchical graph clustering based on Borůvka’s classic Maximum-cost Spanning Tree algorithm. As discussed above, this algorithm is a critical part of our balanced partitioning tool. The algorithm starts by placing each vertex in a cluster of its own: v0, v1, and so on. Then, in each iteration, the highest-cost edge out of each cluster is selected in order to induce larger merged clusters: A

_{0}, A

_{1}, A

_{2}, etc. in the first round and B

_{0}, B

_{1}, etc. in the second round and so on. The set of merges naturally produces a hierarchical clustering, and gives rise to a linear ordering of the leaf vertices (vertices with degree one). The image below demonstrates this, with the numbers at the bottom corresponding to the ordering of the vertices.

Our NIPS’17 paper explains how we run affinity clustering efficiently in the massively parallel computation (MPC) model, in particular using distributed hash tables (DHTs) to significantly reduce running time. This paper also presents a theoretical study of the algorithm. We report clustering results for graphs with tens of trillions of edges, and also observe that affinity clustering empirically beats other clustering algorithms such as k-means in terms of “quality of the clusters”. This video contains a summary of the result and explains how this parallel algorithm may produce higher-quality clusters even compared to a sequential single-linkage agglomerative algorithm.

**Comparison to Previous Work**

In comparing our algorithm to previous work in (distributed) balanced graph partitioning, we focus on FENNEL, Spinner, METIS, and a recent label propagation-based algorithm. We report results on several public social networks as well as a large private map graph. For a Twitter followership graph, we see a consistent improvement of 15–25% over previous results (Ugander and Backstrom, 2013), and for LiveJournal graph, our algorithm outperforms all the others for all cases except k = 2, where ours is slightly worse than FENNEL's.

The following table presents the fraction of cut edges in the Twitter graph obtained via different algorithms for various values of k, the number of clusters. The numbers given in parentheses denote the size imbalance factor: i.e., the relative difference of the sizes of largest and smallest clusters. Here “Vanilla Affinity Clustering” denotes the first stage of our algorithm where only the hierarchical clustering is built and no further processing is performed on the cuts. Notice that this is already as good as the best previous work (shown in the first two columns below), cutting a smaller fraction of edges while achieving a perfect (and thus better) balance (i.e., 0% imbalance). The last column in the table includes the final result of our algorithm with the post-processing.

k | UB13(5%) | Spinner(5%) | Vanilla AffinityClustering (0%) | Final Algorithm(0%) |

20 | 37.0% | 38.0% | 35.71% | 27.50% |

40 | 43.0% | 40.0% | 40.83% | 33.71% |

60 | 46.0% | 43.0% | 43.03% | 36.65% |

80 | 47.5% | 44.0% | 43.27% | 38.65% |

100 | 49.0% | 46.0% | 45.05% | 41.53% |

**Applications**

We apply balanced graph partitioning to multiple applications including Google Maps driving directions, the serving backend for web search, and finding treatment groups for experimental design. For example, in Google Maps the World map graph is stored in several shards. The navigational queries spanning multiple shards are substantially more expensive than those handled within a shard. Using the methods described in our paper, we can reduce 21% of cross-shard queries by increasing the shard imbalance factor from 0% to 10%. As discussed in our paper, live experiments on real traffic show that the number of multi-shard queries from our cut-optimization techniques is 40% less compared to a baseline Hilbert embedding technique. This, in turn, results in less CPU usage in response to queries. In a future blog post, we will talk about application of this work in the web search serving backend, where balanced partitioning helped us design a cache-aware load balancing system that dramatically reduced our cache miss rate.

**Acknowledgements**

*We especially thank Vahab Mirrokni whose guidance and technical contribution were instrumental in developing these algorithms and writing this post. We also thank our other co-authors and colleagues for their contributions: Raimondas Kiveris, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Silvio Lattanzi, Aaron Archer and other members of NYC Algorithms and Optimization research team.*