This week I implemented my own version of graph2gauss, a deep learning
model for node embeddings, in PyTorch. At its core is a loss function with the
following structure.
$$\sum_{i=0}^{n} \sum_{\substack{j,k \\ \operatorname{sp}(i,j) < \operatorname{sp}(i,k)}} f(i,j,k)$$

Here the inner sum iterates over all possible pairs of nodes that have a different
shortest-path distance to the node i. In the paper, the authors derive the
Monte-Carlo approximation
$$\sum_{i=0}^{n} \mathbb{E}_{j_1,\dots,j_K \sim N_{i,1},\dots,N_{i,K}} \sum_{\substack{1 \le k,l \le K \\ k < l}} |N_{i,k}| \cdot |N_{i,l}| \cdot f(i,j_k,j_l)$$

which leverages the runtime efficiency of independent, uniform sampling from
node subsets but is a bit unwieldy as far as I am concerned. I would much rather
write it as
$$\sum_{i=0}^{n} |N_i| \cdot \mathbb{E}_{j,k}\, f(i,j,k)$$

where $N_i = \{(j,k) \mid \operatorname{sp}(i,j) < \operatorname{sp}(i,k)\}$ is the set of all admissible
pairs and the expectation is taken over a uniform distribution on $N_i$. At first
glance, sampling from $N_i$ might be a problem because $|N_i|$ grows
quadratically in the number of nodes reachable from $i$. On second thought,
$N_i$ can be seen as the edge set of a complete multipartite graph on the nodes
reachable from $i$, where the nodes are partitioned by $\operatorname{sp}(i,j)$. So the set actually has
a simple form, and uniform sampling should be possible without enumerating it.
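To make that structure concrete, the partition of the nodes reachable from $i$ by shortest-path distance falls out of a plain breadth-first search. Here is a minimal sketch; the adjacency-dict representation and the name `sp_rings` are my own, not from the paper:

```python
from collections import deque

def sp_rings(adj, i):
    """Group the nodes reachable from i by shortest-path distance.

    adj maps each node to a list of neighbours. Returns a dict from
    distance d >= 1 to the set of nodes at exactly distance d from i.
    """
    dist = {i: 0}
    rings = {}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                rings.setdefault(dist[v], set()).add(v)
                queue.append(v)
    return rings

# Path graph 0 - 1 - 2 - 3: each reachable node sits in its own ring.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rings = sp_rings(adj, 0)
sizes = [len(r) for r in rings.values()]
# |N_i| is the edge count of the complete multipartite graph on the rings.
n_pairs = sum(a * b for idx, a in enumerate(sizes) for b in sizes[idx + 1:])
print(rings, n_pairs)  # → {1: {1}, 2: {2}, 3: {3}} 3
```

Note that $|N_i|$ only depends on the ring sizes, which is exactly why the set never needs to be materialized.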
Now let $G = (V, E)$ be a complete, undirected, $k$-partite graph with a set
of nodes $V$ partitioned into subsets $V_1, \dots, V_k$ with
cardinalities $n_1, \dots, n_k$ and edges

$$E = \{(u,v) \mid u \in V_a,\, v \in V_b,\, a \neq b\}.$$

So every node is connected to every other node except those in its own partition. At first
I thought that sampling $u$ uniformly from $V$ (landing in some $V_a$) and then $v$
uniformly from $V \setminus V_a$ could work because the probabilities might cancel out
just right, but the following example shows that this is not the case.
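A few lines of exact arithmetic make the failure visible; the partition sizes 1, 1, 2 and the node names are my choice for illustration:

```python
from itertools import combinations
from fractions import Fraction

# Three partitions with unequal sizes: V1 = {a}, V2 = {b}, V3 = {c, d}.
partition = {"a": 0, "b": 1, "c": 2, "d": 2}
nodes = sorted(partition)

# Edges of the complete multipartite graph: pairs from different partitions.
edges = [(u, v) for u, v in combinations(nodes, 2) if partition[u] != partition[v]]

def naive_edge_prob(u, v):
    """Probability of drawing edge {u, v} under the naive scheme:
    first node uniform over V, second uniform over the other partitions.
    An undirected edge can be drawn as (u, v) or as (v, u)."""
    p_first = Fraction(1, len(nodes))
    n_out_u = sum(1 for w in nodes if partition[w] != partition[u])
    n_out_v = sum(1 for w in nodes if partition[w] != partition[v])
    return p_first * Fraction(1, n_out_u) + p_first * Fraction(1, n_out_v)

for e in edges:
    print(e, naive_edge_prob(*e))
# → ('a', 'b') 1/6
#   ('a', 'c') 5/24   (and likewise for the remaining three edges)
```

Uniform sampling over the five edges would give each probability $\frac{1}{5}$, but the edge between the two degree-3 nodes $a$ and $b$ only gets $\frac{1}{6}$.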
We see that edges between high-degree nodes receive too little probability weight.
Therefore, a strategy that samples edges uniformly needs to assign higher probability to
nodes that participate in many edges. Let $d_u$ be the out-degree of node $u$.
In a graph such as $G$, $d_u = n - n_a$ where $n = \sum_{i=1}^{k} n_i$
and $u \in V_a$. Choose the first node proportionally to its out-degree and the second
one uniformly from all nodes that share an edge with the first one, i.e.
$$p(u) \propto d_u \qquad \text{and} \qquad p(v \mid u) = \frac{1}{d_u}.$$

This leads to a uniform distribution over ordered pairs of nodes $(u,v)$:
$$p((u,v)) = p(u) \cdot p(v \mid u) = \frac{d_u}{C} \cdot \frac{1}{d_u} = \frac{1}{C},$$

where $C$ is the normalization constant of $p(u)$. But $G$ is undirected
and $(u,v)$ is the same edge as $(v,u)$. Hence the probability of an edge
$e = (u,v)$ is
$$p(e) = p((u,v)) + p((v,u)) = \frac{2}{C}.$$

Since $p(e)$ is independent of $e$, it must of course be the uniform
distribution, but just to double-check we will compute $C$. $C = \sum_{j \in V} d_j$
is the sum of all out-degrees, which would just be the number of edges in a directed
version of $G$. But $G$ is undirected, so $C$ counts every edge twice.
Therefore $C = 2|E|$ and $p(e) = \frac{1}{|E|}$.
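Put together, a uniform edge sampler needs only the partition sizes, never the edge set itself. The following sketch implements the scheme above; the function and variable names are mine, and the empirical check reuses the 1, 1, 2 partition sizes from the earlier example:

```python
import random

def sample_edge(partitions):
    """Sample an edge of the complete multipartite graph uniformly.

    partitions is a list of lists of nodes; an edge joins two nodes
    from different partitions. The first node is drawn proportionally
    to its out-degree n - n_a, the second uniformly from the rest.
    """
    n = sum(len(p) for p in partitions)
    # Weight of partition a: n_a * (n - n_a), the summed out-degrees of its nodes.
    weights = [len(p) * (n - len(p)) for p in partitions]
    a = random.choices(range(len(partitions)), weights=weights)[0]
    u = random.choice(partitions[a])
    others = [v for b, p in enumerate(partitions) if b != a for v in p]
    v = random.choice(others)
    return u, v

# Empirical check on partitions of sizes 1, 1, 2 (five edges in total).
random.seed(0)
counts = {}
for _ in range(50_000):
    e = frozenset(sample_edge([["a"], ["b"], ["c", "d"]]))
    counts[e] = counts.get(e, 0) + 1
print({tuple(sorted(e)): c / 50_000 for e, c in counts.items()})
# Each of the five edges shows up with frequency close to 1/5.
```

Only the first draw needs the weights $n_a(n - n_a)$; both individual node choices stay uniform within their sets, so the sampler runs in time independent of $|E|$ once the partition sizes are known.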
The sampling method described in this post gave me fine-grained control over the
accuracy-performance trade-off in my implementation of graph2gauss. It is, however,
generally applicable to any problem that can be modelled as sampling from a complete
$k$-partite graph.