Simplifying the Theory on Over-Smoothing
- URL: http://arxiv.org/abs/2407.11876v2
- Date: Mon, 2 Sep 2024 09:49:49 GMT
- Title: Simplifying the Theory on Over-Smoothing
- Authors: Andreas Roth
- Abstract summary: Graph convolutions cause over-smoothing, which refers to representations becoming more similar with increased depth.
This paper attempts to align these directions by showing that over-smoothing is merely a special case of power iteration.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph convolutions have gained popularity due to their ability to efficiently operate on data with an irregular geometric structure. However, graph convolutions cause over-smoothing, which refers to representations becoming more similar with increased depth. Many different definitions and intuitions currently coexist, leading to research efforts focusing on incompatible directions. This paper attempts to align these directions by showing that over-smoothing is merely a special case of power iteration. This greatly simplifies the existing theory on over-smoothing, making it more accessible. Based on the theory, we provide a novel comprehensive definition of rank collapse as a generalized form of over-smoothing and introduce the rank-one distance as a corresponding metric. Our empirical evaluation of 14 commonly used methods shows that more models than were previously known suffer from this issue.
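The abstract's central claim, that repeated graph convolution behaves like power iteration, can be illustrated numerically. The following is a minimal sketch (not the paper's implementation): it applies a GCN-style symmetrically normalized adjacency matrix to random features and tracks the distance to the nearest rank-one matrix, computed here from the singular values. The graph construction and the 50-step depth are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small undirected graph: random symmetric adjacency with self-loops (GCN-style).
n, d = 20, 8
A = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
A = A + A.T + np.eye(n)

# Symmetric normalization: D^{-1/2} A D^{-1/2}; its spectrum lies in (-1, 1].
deg = A.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
A_norm = D_inv_sqrt @ A @ D_inv_sqrt

def rank_one_distance(X):
    """Frobenius distance from X to its best rank-one approximation
    (all singular values except the largest)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sqrt((s[1:] ** 2).sum()))

# Repeated linear graph convolutions act like power iteration: every feature
# column is pulled toward the dominant eigenvector of A_norm, so the
# representation approaches rank one with increasing depth.
X = rng.standard_normal((n, d))
dists = []
for _ in range(50):
    X = A_norm @ X
    dists.append(rank_one_distance(X))

print(dists[0], dists[-1])
```

On a connected graph the subdominant eigenvalues have magnitude strictly below one, so the printed rank-one distance shrinks with depth; this is the rank-collapse behavior that the rank-one distance is meant to quantify.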
Related papers
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z)
- Partial Disentanglement via Mechanism Sparsity [25.791043728989937]
Disentanglement via mechanism sparsity was introduced as a principled approach to extract latent factors without supervision.
We introduce a generalization of this theory which applies to any ground-truth graph.
We show how disentangled the learned representation is expected to be, via a new equivalence relation over models we call consistency.
arXiv Detail & Related papers (2022-07-15T20:06:12Z)
- Beyond spectral gap: The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
arXiv Detail & Related papers (2022-06-07T08:19:06Z)
- Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z)
- On the Generalization Mystery in Deep Learning [15.2292571922932]
We argue that the answer lies in the interaction of the gradients of different examples during training.
We formalize this argument with an easy to compute and interpretable metric for coherence.
The theory also explains a number of other phenomena in deep learning, such as why some examples are reliably learned earlier than others.
arXiv Detail & Related papers (2022-03-18T16:09:53Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect [95.37587481952487]
Long-tailed classification is the key to deep learning at scale.
Existing methods are mainly based on re-weighting/re-sampling heuristics that lack a fundamental theory.
In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution.
arXiv Detail & Related papers (2020-09-28T00:32:11Z)
- On Counterfactual Explanations under Predictive Multiplicity [14.37676876556672]
Counterfactual explanations are usually obtained by identifying the smallest change made to an input to change a prediction made by a fixed model.
Recent work has revitalized an old insight: there often does not exist one superior solution to a prediction problem with respect to commonly used measures of interest.
arXiv Detail & Related papers (2020-06-23T16:25:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.