Uniform Edge Sampling from Complete k-Partite GraphsAug 29, 2019
This week I implemented my own version of graph2gauss, a deep learning model for node embeddings, in PyTorch. At its core is a loss function with the following structure.
Here the inner sum iterates over all possible pairs of nodes that have a different shortest-path distance to the node . In the paper, the authors derive the Monte-Carlo approximation
which leverages the runtime efficiency of independent, uniform sampling from node subsets but is a bit unwieldy as far as I am concerned. I would much rather write it as
where is the set of all admissible pairs and the expected value considers a uniform distribution over . At a first glance sampling from might be a problem because grows quadratically in the number of reachable nodes from . On second thought can be seen as the edge set of a complete multi-partite graph of all neighbors of where the nodes are partitioned on . So the set actually has a simple form and uniform sampling should be possible without enumerating the set.
Now let be a complete, undirected, -partite graph with a set of nodes that are partitioned into subsets with cardinalities and edges
So every node is connected to every other except the ones from its own partition. At first I thought that uniformly sampling from and then from could work because the probabilities might cancel out just right but the following example shows that that is not the case.
We see that edges between high-degree nodes receive too little probability weight. Therefore, a strategy that samples edges uniformly needs to assign higher probability to nodes that participate in many edges. Let be the out-degree of node . In a graph such as , where and . Choose the first node proportional to its out-degree and the second one uniformly from all nodes that share an edge with the first one, i.e.
This leads to a uniform distribution over ordered pairs of nodes
where is the normalization constant of . But is undirected and is the same edge as . Hence the probability of an edge is
Since is independent of , must of course be the uniform distribution but just to double-check we will compute . is the sum of all out-degrees which would just be the number of edges in a directed version of . But is undirected so counts every edge twice. Therefore and .
The sampling method described in this post allowed me fine-grained control over the accuracy-performance trade-off in my implementation of graph2gauss. It is however generally applicable to any problem that can be modelled as sampling from a complete -partite graph.