Related papers: Position: Don't be Afraid of Over-Smoothing And Over-Squashing

URL: http://arxiv.org/abs/2601.07419v1
Date: Mon, 12 Jan 2026 11:02:57 GMT
Title: Position: Don't be Afraid of Over-Smoothing And Over-Squashing
Authors: Niklas Kormann, Benjamin Doerr, Johannes F. Lutzeyer
Abstract summary: We argue that performance decreases often stem from uninformative receptive fields rather than over-smoothing. We show that architectural interventions designed to mitigate over-squashing fail to yield significant performance gains.
Score: 22.895536023786974
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over-smoothing and over-squashing have been extensively studied in the literature on Graph Neural Networks (GNNs) over the past years. We challenge this prevailing focus in GNN research, arguing that these phenomena are less critical for practical applications than assumed. We suggest that performance decreases often stem from uninformative receptive fields rather than over-smoothing. We support this position with extensive experiments on several standard benchmark datasets, demonstrating that accuracy and over-smoothing are mostly uncorrelated and that optimal model depths remain small even with mitigation techniques, thus highlighting the negligible role of over-smoothing. Similarly, we challenge that over-squashing is always detrimental in practical applications. Instead, we posit that the distribution of relevant information over the graph frequently factorises and is often localised within a small k-hop neighbourhood, questioning the necessity of jointly observing entire receptive fields or engaging in an extensive search for long-range interactions. The results of our experiments show that architectural interventions designed to mitigate over-squashing fail to yield significant performance gains. This position paper advocates for a paradigm shift in theoretical research, urging a diligent analysis of learning tasks and datasets using statistics that measure the underlying distribution of label-relevant information to better understand their localisation and factorisation.

Related papers

The Robustness of Differentiable Causal Discovery in Misspecified Scenarios [18.797446049830636]
Causal discovery aims to learn causal relationships between variables from targeted data. We show that differentiable causal discovery methods exhibit robustness under the metrics of Structural Hamming Distance and Structural Intervention Distance.
arXiv Detail & Related papers (2025-10-14T13:33:06Z)
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks [37.020118015110086]
Machine learning on graphs has demonstrated promise in drug design and molecular property prediction. This position paper calls for a paradigm shift toward more meaningful benchmarks, rigorous evaluation protocols, and stronger collaboration with domain experts.
arXiv Detail & Related papers (2025-02-20T13:21:47Z)
Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm [14.980926991441345]
We show that datasets containing interventional data can be effectively extracted under realistic assumptions about the data distribution. We introduce a novel variant of interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings. We also introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions.
arXiv Detail & Related papers (2024-05-28T16:07:17Z)
A Survey of Deep Long-Tail Classification Advancements [1.6233132273470656]
Many data distributions in the real world are hardly uniform. Instead, skewed and long-tailed distributions of various kinds are commonly observed. This poses an interesting problem for machine learning, where most algorithms assume or work well with uniformly distributed data. The problem is further exacerbated by current state-of-the-art deep learning models requiring large volumes of training data.
arXiv Detail & Related papers (2024-04-24T01:59:02Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes. We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
Deconfounded Training for Graph Neural Networks [98.06386851685645]
We present a new paradigm of deconfounded training (DTP) that better mitigates the confounding effect and latches on to the critical information. Specifically, we adopt attention modules to disentangle the critical subgraph and the trivial subgraph. This allows GNNs to capture a more reliable subgraph whose relation with the label is robust across different distributions.
arXiv Detail & Related papers (2021-12-30T15:22:35Z)
Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious. We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs [49.6661602019124]
We study a spectrum of models derived by generalizing the current state of the art for few-shot link prediction. We find that a simple zero-shot baseline - which ignores any relation-specific information - achieves surprisingly strong performance. Experiments on carefully crafted synthetic datasets show that having only a few examples of a relation fundamentally limits models from using fine-grained structural information.
arXiv Detail & Related papers (2021-02-05T21:04:31Z)
Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals. A minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance. We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z)
Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
Selecting Data Augmentation for Simulating Interventions [12.848239550098693]
Machine learning models trained with purely observational data and the principle of empirical risk fail to generalize to unseen domains. We argue that causal concepts can be used to explain the success of data augmentation by describing how they can weaken the spurious correlation between the observed domains and the task labels.
arXiv Detail & Related papers (2020-05-04T21:33:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.