ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond
- URL: http://arxiv.org/abs/2303.06562v2
- Date: Tue, 2 May 2023 13:38:34 GMT
- Title: ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond
- Authors: Xiaojun Guo, Yifei Wang, Tianqi Du, Yisen Wang
- Abstract summary: Oversmoothing is a common phenomenon in a wide range of Graph Neural Networks (GNNs) and Transformers.
We propose a novel normalization layer called ContraNorm, which implicitly shatters representations in the embedding space.
Our proposed normalization layer can be easily integrated into GNNs and Transformers with negligible parameter overhead.
- Score: 13.888935924826903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Oversmoothing is a common phenomenon in a wide range of Graph Neural Networks
(GNNs) and Transformers, where performance worsens as the number of layers
increases. Instead of characterizing oversmoothing from the view of complete
collapse in which representations converge to a single point, we dive into a
more general perspective of dimensional collapse in which representations lie
in a narrow cone. Accordingly, inspired by the effectiveness of contrastive
learning in preventing dimensional collapse, we propose a novel normalization
layer called ContraNorm. Intuitively, ContraNorm implicitly shatters
representations in the embedding space, leading to a more uniform distribution
and less dimensional collapse. Theoretically, we prove that
ContraNorm can alleviate both complete collapse and dimensional collapse under
certain conditions. Our proposed normalization layer can be easily integrated
into GNNs and Transformers with negligible parameter overhead. Experiments on
various real-world datasets demonstrate the effectiveness of our proposed
ContraNorm. Our implementation is available at
https://github.com/PKU-ML/ContraNorm.
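The abstract describes ContraNorm as a layer that "shatters" representations toward a more uniform distribution. A minimal numpy sketch of a ContraNorm-style update, under the assumption that it subtracts a scaled softmax-similarity aggregation (the gradient direction of a contrastive uniformity loss) and then layer-normalizes; the scale `s` and the exact similarity are illustrative assumptions — the authors' implementation at the linked repository is authoritative:

```python
import numpy as np

def contranorm(h, scale=0.1, eps=1e-5):
    """ContraNorm-style update (sketch): push representations apart by
    subtracting a softmax-similarity aggregation, then layer-normalize."""
    sim = h @ h.T                                   # pairwise similarities
    sim = sim - sim.max(axis=1, keepdims=True)      # softmax stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    # "shatter": move each point away from its softmax-weighted neighborhood
    h = h - scale * (attn @ h)
    # layer normalization over the feature dimension
    mu = h.mean(axis=1, keepdims=True)
    var = h.var(axis=1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

# Nearly collapsed inputs: all rows close to a single point (complete collapse).
rng = np.random.default_rng(0)
x = np.ones((8, 16)) + 0.01 * rng.standard_normal((8, 16))
out = x
for _ in range(4):
    out = contranorm(out)

# The top singular value should dominate far less after the updates,
# i.e. the representations occupy more directions of the embedding space.
print(np.linalg.svd(x, compute_uv=False)[:2])
print(np.linalg.svd(out, compute_uv=False)[:2])
```

This also illustrates the "negligible parameter overhead" claim: the sketch has no learned parameters at all, only the hyperparameter `scale`.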
Related papers
- Residual Connections and Normalization Can Provably Prevent Oversmoothing in GNNs [30.003409099607204]
We provide a formal and precise characterization of (linearized) graph neural networks (GNNs) with residual connections and normalization layers.
We show that the centering step of a normalization layer alters the graph signal in message-passing in such a way that relevant information can become harder to extract.
We introduce a novel, principled normalization layer called GraphNormv2 in which the centering step is learned such that it does not distort the original graph signal in an undesirable way.
arXiv Detail & Related papers (2024-06-05T06:53:16Z)
- Alignment and Outer Shell Isotropy for Hyperbolic Graph Contrastive Learning [69.6810940330906]
We propose a novel contrastive learning framework to learn high-quality graph embedding.
Specifically, we design the alignment metric that effectively captures the hierarchical data-invariant information.
We show that in the hyperbolic space one has to address the leaf- and height-level uniformity which are related to properties of trees.
arXiv Detail & Related papers (2023-10-27T15:31:42Z)
- Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks [4.213427823201119]
Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks.
We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior unaffected by feature transformations.
We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
arXiv Detail & Related papers (2023-08-31T15:22:31Z)
- OrthoReg: Improving Graph-regularized MLPs via Orthogonality Regularization [66.30021126251725]
Graph Neural Networks (GNNs) currently dominate the modeling of graph-structured data.
Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, but their performance can hardly match that of GNNs in most tasks.
We show that GR-MLPs suffer from dimensional collapse, a phenomenon in which a few largest eigenvalues dominate the embedding space.
We propose OrthoReg, a novel GR-MLP model to mitigate the dimensional collapse issue.
arXiv Detail & Related papers (2023-01-31T21:20:48Z)
- What Does the Gradient Tell When Attacking the Graph Structure [44.44204591087092]
We present a theoretical demonstration revealing that attackers tend to increase inter-class edges due to the message passing mechanism of GNNs.
By connecting dissimilar nodes, attackers can more effectively corrupt node features, making such attacks more advantageous.
We propose an innovative attack loss that balances attack effectiveness and imperceptibility, sacrificing some attack effectiveness to attain greater imperceptibility.
arXiv Detail & Related papers (2022-08-26T15:45:20Z)
- Revisiting Over-smoothing in BERT from the Perspective of Graph [111.24636158179908]
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language fields.
We find that layer normalization plays a key role in the over-smoothing issue of Transformer-based models.
We consider hierarchical fusion strategies, which combine the representations from different layers adaptively to make the output more diverse.
arXiv Detail & Related papers (2022-02-17T12:20:52Z)
- SkipNode: On Alleviating Performance Degradation for Deep Graph Convolutional Networks [84.30721808557871]
We conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs.
We propose a simple yet effective plug-and-play module, SkipNode, to overcome the performance degradation of deep GCNs.
arXiv Detail & Related papers (2021-12-22T02:18:31Z)
- Understanding Dimensional Collapse in Contrastive Self-supervised Learning [57.98014222570084]
We show that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse.
Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector.
arXiv Detail & Related papers (2021-10-18T14:22:19Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative adversarial attacks can overcome this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
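Several of the papers above (ContraNorm, Rank Collapse, OrthoReg, DirectCLR) diagnose dimensional collapse through the singular value spectrum of the embeddings. A minimal sketch of one such diagnostic, using the entropy-based "effective rank" of the spectrum; the data, dimensions, and thresholds here are illustrative assumptions, not taken from any of the listed papers:

```python
import numpy as np

def effective_rank(embeddings):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular value distribution. Near 1 means the embeddings
    collapse onto a single direction (a "narrow cone"); near min(n, d)
    means they spread across all available dimensions."""
    s = np.linalg.svd(embeddings - embeddings.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# Well-spread embeddings: i.i.d. Gaussian rows use all 32 dimensions.
isotropic = rng.standard_normal((256, 32))
# Dimensionally collapsed embeddings: a rank-1 matrix plus slight noise,
# i.e. every point lies close to one line through the embedding space.
direction = rng.standard_normal(32)
collapsed = np.outer(rng.standard_normal(256), direction)
collapsed += 0.01 * rng.standard_normal((256, 32))

print(effective_rank(isotropic))   # high: spread over many dimensions
print(effective_rank(collapsed))   # low: dominated by one direction
```

A diagnostic like this can be applied layer by layer to a deep GNN or Transformer to see whether the effective rank of representations shrinks with depth, which is the dimensional-collapse view of oversmoothing taken in the main paper.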
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.