GPT: Oversmoothing is the tendency of different nodes in a graph neural network to converge to similar representations, losing their distinctive features, as the number of graph convolution (message-passing) layers increases.
GPT:
It happens because of:
Iterative Aggregation: each new layer performs another round of message passing, averaging every node's features with its neighbors', so features become smoother across nodes with every layer.
Homophily vs Heterophily:
In graphs where similar nodes are more likely to be connected (homophily), oversmoothing can actually be beneficial.
However, in graphs where dissimilar nodes are connected (heterophily), oversmoothing erases the heterogeneous features that distinguish those nodes.
Normalization Issue:
Normalization can amplify oversmoothing by making the nodes’ representations even more similar after several convolutional layers.
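A minimal sketch of the iterative-aggregation effect, assuming mean aggregation over each node's closed neighborhood (self-loop plus neighbors, i.e. a row-normalized adjacency with self-loops) and no learned weights or nonlinearities. The graph `GRAPH`, the starting features, and the helper names are hypothetical. Because every round replaces each value with an average of existing values, the spread (max minus min across nodes) can never grow, and on a connected graph it shrinks toward zero.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]

# Hypothetical toy graph: undirected, connected, given as an adjacency list.
GRAPH: Graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4],
    4: [3],
}


def mean_aggregate(graph: Graph, feats: Dict[int, float]) -> Dict[int, float]:
    """One message-passing round: each node takes the mean of itself and its neighbors."""
    new = {}
    for node, nbrs in graph.items():
        group = [node] + nbrs  # self-loop plus neighbors
        new[node] = sum(feats[v] for v in group) / len(group)
    return new


def spread(feats: Dict[int, float]) -> float:
    """Max minus min feature value across nodes; 0 means all nodes are identical."""
    vals = list(feats.values())
    return max(vals) - min(vals)


def propagate(graph: Graph, feats: Dict[int, float], num_layers: int):
    """Apply num_layers rounds of mean aggregation; record the spread after each round."""
    history = [spread(feats)]
    for _ in range(num_layers):
        feats = mean_aggregate(graph, feats)
        history.append(spread(feats))
    return feats, history


if __name__ == "__main__":
    x0 = {0: 1.0, 1: -1.0, 2: 0.5, 3: 2.0, 4: -2.0}  # hypothetical 1-D node features
    _, hist = propagate(GRAPH, x0, 30)
    for k in (0, 1, 2, 5, 10, 30):
        print(f"layers={k:2d}  spread={hist[k]:.6f}")
```

Real GCN layers also apply a weight matrix and an activation; the sketch leaves them out to isolate the averaging step that drives the collapse.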
Strategies to combat it include early stopping, using models with residual or skip connections, and dynamic rational activation functions.
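Residual and skip connections come in several forms. The sketch below uses one of them, an "initial residual" that mixes every layer's output with the layer-0 input, h = (1 - alpha) * agg(h) + alpha * x0, similar to the propagation used in APPNP. The graph, `alpha`, and the helper names are hypothetical, and mean aggregation with self-loops is assumed. With `alpha = 0` the node values collapse together; with `alpha > 0` the limit keeps a non-zero spread whenever the input features are not all equal (if the limit were one value c at every node, the update rule would force x0 = c everywhere).

```python
from typing import Dict, List

Graph = Dict[int, List[int]]

# Hypothetical toy graph (undirected, connected), adjacency-list form.
GRAPH: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}


def mean_aggregate(graph: Graph, feats: Dict[int, float]) -> Dict[int, float]:
    """Mean over each node's closed neighborhood (self-loop plus neighbors)."""
    return {
        v: (feats[v] + sum(feats[u] for u in nbrs)) / (len(nbrs) + 1)
        for v, nbrs in graph.items()
    }


def spread(feats: Dict[int, float]) -> float:
    """Max minus min feature value across nodes."""
    return max(feats.values()) - min(feats.values())


def deep_propagate(graph: Graph, x0: Dict[int, float], num_layers: int, alpha: float):
    """Stack num_layers aggregation layers (hypothetical helper).

    alpha = 0.0: plain message passing.
    alpha > 0.0: each layer mixes the aggregated features with the layer-0 input.
    """
    h = dict(x0)
    for _ in range(num_layers):
        agg = mean_aggregate(graph, h)
        h = {v: (1 - alpha) * agg[v] + alpha * x0[v] for v in graph}
    return h


if __name__ == "__main__":
    x0 = {0: 1.0, 1: -1.0, 2: 0.5, 3: 2.0, 4: -2.0}
    plain = deep_propagate(GRAPH, x0, 100, alpha=0.0)
    residual = deep_propagate(GRAPH, x0, 100, alpha=0.2)
    print("plain    spread:", spread(plain))     # collapses toward 0
    print("residual spread:", spread(residual))  # stays bounded away from 0
```

The design choice here is to anchor every layer to the input rather than to the previous layer; a plain h = agg(h) + h skip behaves differently and is not shown.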
Message passing between nodes of different classes homogenizes their representations exponentially fast (the mixing effect).
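A minimal sketch of the mixing effect on a hypothetical, noise-free two-class graph: each class is a triangle and a perfect matching links the classes, so every node averages itself and two same-class neighbors with one different-class neighbor. Starting from +1 on class A and -1 on class B, each round maps every value to (1 + 1 + 1 - 1) / 4 = 1/2 of its previous value, so the gap between the class means halves per layer, i.e. it decays exponentially. Mean aggregation with self-loops is assumed and all names are illustrative; with no noise in the features, the denoising side is not represented.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]

# Hypothetical two-class graph: class A = {0, 1, 2}, class B = {3, 4, 5}.
# Each class is a triangle; the matching 0-3, 1-4, 2-5 links the classes,
# so every node has 2 same-class neighbors and 1 different-class neighbor.
GRAPH: Graph = {
    0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
    3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2],
}
CLASS_A = [0, 1, 2]
CLASS_B = [3, 4, 5]


def mean_aggregate(graph: Graph, feats: Dict[int, float]) -> Dict[int, float]:
    """Mean over each node's closed neighborhood (self-loop plus neighbors)."""
    return {
        v: (feats[v] + sum(feats[u] for u in nbrs)) / (len(nbrs) + 1)
        for v, nbrs in graph.items()
    }


def class_gap(feats: Dict[int, float]) -> float:
    """Mean feature of class A minus mean feature of class B."""
    mean_a = sum(feats[v] for v in CLASS_A) / len(CLASS_A)
    mean_b = sum(feats[v] for v in CLASS_B) / len(CLASS_B)
    return mean_a - mean_b


def gap_per_layer(graph: Graph, feats: Dict[int, float], num_layers: int) -> List[float]:
    """Class-mean gap before any layer and after each of num_layers layers."""
    gaps = [class_gap(feats)]
    for _ in range(num_layers):
        feats = mean_aggregate(graph, feats)
        gaps.append(class_gap(feats))
    return gaps


if __name__ == "__main__":
    x0 = {**{v: 1.0 for v in CLASS_A}, **{v: -1.0 for v in CLASS_B}}
    for k, g in enumerate(gap_per_layer(GRAPH, x0, 6)):
        print(f"layer {k}: gap = {g}")  # 2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125
```

More inter-class edges per node would shrink the per-layer factor below 1/2 and speed up the collapse.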
Message passing with nodes that have not been encountered in earlier layers produces a denoising effect, whose magnitude depends on the absolute number of newly encountered neighbors.
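To make "newly encountered neighbors" concrete: at layer k a node first receives information from the nodes exactly k hops away, so a breadth-first search gives how many new nodes each layer contributes. The sketch below is a hypothetical helper run on a small hypothetical tree; it counts nodes only and does not model the size of the denoising effect. Past a node's eccentricity (which is at most the graph diameter) the count is zero, which is the regime where the denoising effect fades.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]

# Hypothetical example: a complete binary tree of depth 2 rooted at node 0.
TREE: Graph = {
    0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6],
    3: [1], 4: [1], 5: [2], 6: [2],
}


def new_nodes_per_layer(graph: Graph, source: int) -> List[int]:
    """counts[k] = number of nodes whose information first reaches `source` at layer k.

    counts[0] is the node itself; layers beyond len(counts) - 1 add no new nodes.
    Hypothetical helper; assumes an undirected graph given as an adjacency list.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # standard BFS: dist[u] is the hop distance from source
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts


def diameter(graph: Graph) -> int:
    """Largest eccentricity over all nodes (assumes the graph is connected)."""
    return max(len(new_nodes_per_layer(graph, v)) - 1 for v in graph)


if __name__ == "__main__":
    print(new_nodes_per_layer(TREE, 0))  # [1, 2, 4]
    print(new_nodes_per_layer(TREE, 3))  # [1, 1, 2, 1, 2]
    print(diameter(TREE))                # 4
```

For node 3, layers 5 and beyond bring in no new nodes, since its eccentricity is 4.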
In the setting considered here, the diameter of the graph is at most log N / log(log N), where N is the number of nodes.
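A quick numeric check of how slowly this bound grows. The logarithm base is not stated, so the sketch assumes natural logarithms (the value of the ratio depends on the base); `diameter_bound` is a hypothetical helper name, and the bound itself only applies in the setting the text assumes.

```python
import math


def diameter_bound(n: int) -> float:
    """log N / log(log N) with natural logs (assumed base); needs n > e so log(log n) > 0."""
    return math.log(n) / math.log(math.log(n))


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(f"N = {n:>13,}  bound = {diameter_bound(n):.2f}")
```

Under this assumption the bound goes from about 3.6 at a thousand nodes to about 6.8 at a billion, so a handful of layers already exceeds it.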
Once the number of layers exceeds the diameter, no node has any not-yet-encountered nodes left to receive messages from, so the denoising effect almost vanishes. Because the diameter grows so slowly with N, even in a large graph the mixing effect quickly dominates the denoising effect as layers are added, and oversmoothing is therefore expected to set in at a shallow depth.