Uniform Edge Sampling from Complete k-Partite Graphs

This week I implemented my own version of graph2gauss, a deep learning model for node embeddings, in PyTorch. At its core is a loss function with the following structure.

\[\sum_{i = 0}^n \sum_{\scriptstyle j, k \atop\scriptstyle \operatorname{sp}(i, j) < \operatorname{sp}(i, k)} f(i, j, k)\]

Here the inner sum iterates over all pairs of nodes \((j, k)\) in which \(j\) has a strictly smaller shortest-path distance to node \(i\) than \(k\). In the paper, the authors derive the Monte-Carlo approximation

\[\sum_{i = 0}^n \mathbb{E}_{j_1, \dots, j_K \sim N_{i, 1}, \dots, N_{i, K}} \sum_{1 \le k < l \le K} \left| N_{i, k} \right| \cdot \left| N_{i, l} \right| \cdot f(i, j_k, j_l)\]

which leverages the runtime efficiency of independent, uniform sampling from node subsets, but is a bit unwieldy as far as I am concerned. I would much rather write it as

\[\sum_{i = 0}^n \left|N_i\right| \cdot \mathbb{E}_{j, k}\, f(i, j, k)\]

where \(N_i = \{ (j, k) \mid \operatorname{sp}(i, j) < \operatorname{sp}(i, k) \}\) is the set of all admissible pairs and the expected value is taken over a uniform distribution on \(N_i\). At first glance, sampling from \(N_i\) might seem to be a problem because \(|N_i|\) grows quadratically in the number of nodes reachable from \(i\). On second thought, \(N_i\) can be seen as the edge set of a complete multipartite graph over the neighbors of \(i\), with the nodes partitioned by their shortest-path distance \(\operatorname{sp}(i, j)\). So the set actually has a simple structure and uniform sampling should be possible without enumerating it.
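In particular, the factor \(|N_i|\) in the rewritten loss never requires enumerating the pairs: grouping the reachable nodes by distance gives the partition sizes, from which the edge count of the complete multipartite graph follows directly. A minimal sketch in Python, where the distance dictionary `sp_from_i` is a hypothetical precomputed input:

```python
from collections import Counter

def count_admissible_pairs(sp_from_i):
    """Count pairs (j, k) with sp(i, j) < sp(i, k) without enumerating them.

    `sp_from_i` maps every node reachable from i to its shortest-path
    distance (a hypothetical precomputed input).
    """
    sizes = Counter(sp_from_i.values())  # partition sizes by distance
    n = sum(sizes.values())
    # Edge count of the complete multipartite graph:
    # sum_{a < b} n_a * n_b = (n^2 - sum_a n_a^2) / 2
    return (n * n - sum(s * s for s in sizes.values())) // 2

# Distances 1, 1, 2, 3 give partitions of sizes 2, 1, 1:
print(count_admissible_pairs({"a": 1, "b": 1, "c": 2, "d": 3}))  # prints 5
```

The five admissible pairs in the example are (a, c), (a, d), (b, c), (b, d) and (c, d); only the same-distance pair (a, b) is excluded.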

Now let \(G = (V, E)\) be a complete, undirected, \(k\)-partite graph with a node set \(V\) partitioned into subsets \(V_1, \dots, V_k\) with cardinalities \(n_1, \dots, n_k\) and edges

\[E = \{ (u, v) \mid u \in V_a, v \in V_b, a \ne b \}.\]

So every node is connected to every other node except the ones in its own partition. At first I thought that uniformly sampling \(u \in V_a\) from \(V\) and then \(v\) uniformly from \(V \setminus V_a\) could work because the probabilities might cancel out just right, but the following example shows that this is not the case.

Figure 1: Uniform node sampling leads to non-uniform edge probabilities in this complete 3-partite graph

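To make this concrete, here is a quick numeric check of the naive scheme, assuming a complete 3-partite graph with partition sizes 1, 1 and 2 (an arbitrary example, not necessarily the one in the figure):

```python
from itertools import combinations

# Partitions of an assumed complete 3-partite example graph.
parts = [["a"], ["b"], ["c", "d"]]
nodes = [u for part in parts for u in part]
part_of = {u: i for i, part in enumerate(parts) for u in part}

def naive_edge_prob(u, v):
    """P(edge {u, v}) when the first node is drawn uniformly from all
    nodes and the second uniformly from outside its partition."""
    prob = 0.0
    for first, _ in [(u, v), (v, u)]:
        outside = len(nodes) - len(parts[part_of[first]])
        prob += (1 / len(nodes)) * (1 / outside)
    return prob

edges = [(u, v) for u, v in combinations(nodes, 2) if part_of[u] != part_of[v]]
for u, v in edges:
    print(u, v, round(naive_edge_prob(u, v), 4))
```

The graph has five edges, so a uniform distribution would assign \(1/5 = 0.2\) to each. Instead, the edge between the two degree-3 nodes a and b only gets \(1/6 \approx 0.167\), while the remaining edges get \(5/24 \approx 0.208\) each.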
We see that edges between high-degree nodes receive too little probability weight. Therefore, a strategy that samples edges uniformly needs to assign higher probability to nodes that participate in many edges. Let \(d_u\) be the out-degree of node \(u\). In a graph such as \(G\), \(d_u = n - n_a\) where \(n = \sum_{i = 1}^k n_i\) and \(u \in V_a\). Choose the first node proportional to its out-degree and the second one uniformly from all nodes that share an edge with the first one, i.e.

\[p(u) \propto d_u \quad \textrm{and} \quad p(v \mid u) = \frac{1}{d_u}.\]

This leads to a uniform distribution over ordered pairs of nodes \((u, v)\)

\[p((u, v)) = p(u) \cdot p(v \mid u) = \frac{d_u}{C} \cdot \frac{1}{d_u} = \frac{1}{C}\]

where \(C\) is the normalization constant of \(p(u)\). But \(G\) is undirected and \((u, v)\) is the same edge as \((v, u)\). Hence the probability of an edge \(e = (u, v)\) is

\[p(e) = p((u, v)) + p((v, u)) = \frac{2}{C}.\]

Since \(p(e)\) does not depend on \(e\), it must be the uniform distribution, but just to double-check we will compute \(C\). \(C = \sum_{j \in V} d_j\) is the sum of all out-degrees, which is the number of edges in a directed version of \(G\). But \(G\) is undirected, so \(C\) counts every edge twice. Therefore \(C = 2|E|\) and \(p(e) = \frac{1}{|E|}\).
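Putting the two steps together, the sampler looks roughly like this in Python (the partition sizes 2, 3 and 4 are an arbitrary example), together with an empirical sanity check of uniformity:

```python
import random

def make_sampler(parts):
    """Build a uniform edge sampler for the complete k-partite graph
    whose partitions are given as lists of nodes."""
    nodes = [u for part in parts for u in part]
    n = len(nodes)
    part_of = {u: i for i, part in enumerate(parts) for u in part}
    # d_u = n - n_a for u in V_a: the sampling weight of the first node.
    weights = [n - len(parts[part_of[u]]) for u in nodes]

    def sample():
        u = random.choices(nodes, weights=weights)[0]
        # Second node uniformly among the d_u nodes outside u's partition.
        v = random.choice([w for w in nodes if part_of[w] != part_of[u]])
        return u, v

    return sample

# Partitions of sizes 2, 3 and 4 give 2*3 + 2*4 + 3*4 = 26 edges,
# so each edge should show up in roughly 1/26 of the draws.
random.seed(0)
sample = make_sampler([[0, 1], [2, 3, 4], [5, 6, 7, 8]])
counts = {}
for _ in range(100_000):
    u, v = sample()
    edge = (min(u, v), max(u, v))
    counts[edge] = counts.get(edge, 0) + 1
```

With 100,000 draws every one of the 26 edges lands close to the expected 3,846 occurrences, which matches the derivation above.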

The sampling method described in this post gave me fine-grained control over the accuracy-performance trade-off in my implementation of graph2gauss. It is, however, applicable to any problem that can be modelled as uniform sampling from the edges of a complete \(k\)-partite graph.