This week I implemented my own version of graph2gauss, a deep learning model for node embeddings, in PyTorch. At its core is a loss function with the following structure.
$$\sum_{i = 0}^n \sum_{\substack{j, k \\ \operatorname{sp}(i, j) < \operatorname{sp}(i, k)}} f(i, j, k)$$
Here the inner sum iterates over all ordered pairs of nodes $(j, k)$ whose shortest-path distances to the node $i$ differ, i.e. $j$ is strictly closer to $i$ than $k$. In the paper, the authors derive the Monte-Carlo approximation
$$\sum_{i = 0}^n \mathbb{E}_{j_1, \dots, j_K \sim N_{i, 1}, \dots, N_{i, K}} \sum_{1 \le k < l \le K} \left| N_{i, k} \right| \cdot \left| N_{i, l} \right| \cdot f(i, j_k, j_l)$$which leverages the runtime efficiency of independent, uniform sampling from node subsets but strikes me as a bit unwieldy. I would much rather write it as
$$\sum_{i = 0}^n \left|N_i\right| \cdot \mathbb{E}_{(j, k)} f(i, j, k)$$
where $N_i = \{ (j, k) \mid \operatorname{sp}(i, j) < \operatorname{sp}(i, k) \}$ is the set of all admissible pairs and the expected value is taken with respect to a uniform distribution over $N_i$. At first glance, sampling from $N_i$ might be a problem because $|N_i|$ grows quadratically in the number of nodes reachable from $i$. On second thought, $N_i$ can be seen as the edge set of a complete multi-partite graph on the neighbors of $i$, where the nodes are partitioned by their shortest-path distance $\operatorname{sp}(i, j)$. So the set actually has a simple structure, and uniform sampling should be possible without enumerating it.
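The partitions can be computed with a plain breadth-first search. Below is a minimal sketch; the adjacency-list input format and the helper name `distance_partitions` are my own choices, and "neighbors of $i$" here means all nodes reachable from $i$.

```python
from collections import deque, defaultdict

def distance_partitions(adj, i):
    """Partition all nodes reachable from i by their shortest-path
    distance sp(i, j), via BFS. `adj` maps each node to a list of
    its neighbors (a hypothetical adjacency-list format)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    partitions = defaultdict(list)
    for j, d in dist.items():
        if j != i:  # i itself is not part of any partition
            partitions[d].append(j)
    return dict(partitions)

# A small path graph 0 - 1 - 2 - 3, viewed from node 0:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(distance_partitions(adj, 0))  # {1: [1], 2: [2], 3: [3]}
```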
Now let $G = (V, E)$ be a complete, undirected, $k$-partite graph with a node set $V$ partitioned into subsets $V_1, \dots, V_k$ of cardinalities $n_1, \dots, n_k$, and edges
$$E = \{ (u, v) \mid u \in V_a, v \in V_b, a \ne b \}.$$
So every node is connected to every other node except those in its own partition. At first I thought that uniformly sampling $u \in V_a$ from $V$ and then $v$ from $V \setminus V_a$ could work because the probabilities might cancel out just right, but a small example shows that this is not the case. Take partitions of sizes $n_1 = n_2 = 1$ and $n_3 = 2$: the graph has $5$ edges, so uniform sampling would give each edge probability $\frac{1}{5}$, yet the edge between the two singleton nodes is sampled with probability $\frac{1}{4} \cdot \frac{1}{3} + \frac{1}{4} \cdot \frac{1}{3} = \frac{1}{6}$.
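The failure is easy to verify by enumerating the probabilities exactly. The node names and partition sizes below are just for illustration: a complete $3$-partite graph with partitions $\{a\}$, $\{b\}$, $\{c, d\}$, which has $5$ edges.

```python
from fractions import Fraction
from itertools import combinations

# Complete 3-partite graph with partitions {a}, {b}, {c, d}.
partition = {"a": 0, "b": 1, "c": 2, "d": 2}
nodes = sorted(partition)
edges = [frozenset(e) for e in combinations(nodes, 2)
         if partition[e[0]] != partition[e[1]]]

# Naive scheme: u uniform over V, then v uniform over V \ V_a.
p = {e: Fraction(0) for e in edges}
for u in nodes:
    outside = [v for v in nodes if partition[v] != partition[u]]
    for v in outside:
        p[frozenset((u, v))] += Fraction(1, len(nodes)) * Fraction(1, len(outside))

for e in sorted(map(sorted, edges)):
    print(e, p[frozenset(e)])  # uniform would be 1/5 for every edge
```

The edge $\{a, b\}$ between the two degree-$3$ nodes comes out at $\frac{1}{6}$, while the four edges touching $c$ or $d$ each get $\frac{5}{24}$.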
We see that edges between high-degree nodes receive too little probability weight. Therefore, a strategy that samples edges uniformly needs to assign higher probability to nodes that participate in many edges. Let $d_u$ be the out-degree of node $u$. In a graph such as $G$, $d_u = n - n_a$ where $n = \sum_{i = 1}^k n_i$ and $u \in V_a$. Choose the first node proportional to its out-degree and the second one uniformly from all nodes that share an edge with the first one, i.e.
$$p(u) \propto d_u \quad \textrm{and} \quad p(v \mid u) = \frac{1}{d_u}.$$
This leads to a uniform distribution over ordered pairs of nodes $(u, v)$
$$p((u, v)) = p(u) \cdot p(v \mid u) = \frac{d_u}{C} \cdot \frac{1}{d_u} = \frac{1}{C}$$
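As a quick sanity check on this calculation, we can enumerate $p((u, v))$ exactly for a small instance; the partition sizes $(1, 1, 2)$ below are an arbitrary choice.

```python
from fractions import Fraction

# Partition sizes of a small complete 3-partite graph (arbitrary choice).
sizes = (1, 1, 2)
n = sum(sizes)
# Represent each node by (partition index, index within partition).
nodes = [(a, i) for a, n_a in enumerate(sizes) for i in range(n_a)]

C = sum(n - sizes[a] for a, _ in nodes)  # normalization constant of p(u)
probs = []
for u in nodes:
    d_u = n - sizes[u[0]]                # out-degree of u
    for v in nodes:
        if v[0] != u[0]:                 # (u, v) is an edge
            # p((u, v)) = p(u) * p(v | u) = (d_u / C) * (1 / d_u)
            probs.append(Fraction(d_u, C) * Fraction(1, d_u))

print(set(probs), C)  # every ordered pair gets exactly 1 / C
```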
where $C$ is the normalization constant of $p(u)$. But $G$ is undirected and $(u, v)$ is the same edge as $(v, u)$. Hence the probability of an edge $e = (u, v)$ is
$$p(e) = p((u, v)) + p((v, u)) = \frac{2}{C}.$$
Since $p(e)$ does not depend on $e$, it must be the uniform distribution, but just to double-check we can compute $C$. $C = \sum_{u \in V} d_u$ is the sum of all out-degrees, which is exactly the number of edges in a directed version of $G$. But $G$ is undirected, so $C$ counts every edge twice. Therefore $C = 2|E|$ and $p(e) = \frac{1}{|E|}$.
The sampling method described in this post gave me fine-grained control over the accuracy-performance trade-off in my implementation of graph2gauss. It is, however, applicable to any problem that can be modelled as uniform sampling from the edges of a complete $k$-partite graph.