Topologically Densified Distributions
- URL: http://arxiv.org/abs/2002.04805v2
- Date: Mon, 17 May 2021 04:15:03 GMT
- Title: Topologically Densified Distributions
- Authors: Christoph D. Hofer, Florian Graf, Marc Niethammer, Roland Kwitt
- Abstract summary: We study regularization in the context of small sample-size learning with over-parameterized neural networks.
We impose a topological constraint on samples drawn from the probability measure induced in that space.
This provably leads to mass concentration effects around the representations of training instances.
- Score: 25.140319008330167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study regularization in the context of small sample-size learning with
over-parameterized neural networks. Specifically, we shift focus from
architectural properties, such as norms on the network weights, to properties
of the internal representations before a linear classifier. In particular, we
impose a topological constraint on samples drawn from the probability measure
induced in that space. This provably leads to mass concentration effects around
the representations of training instances, i.e., a property beneficial for
generalization. By leveraging previous work to impose topological constraints
in a neural network setting, we provide empirical evidence (across various
vision benchmarks) to support our claim for better generalization.
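To make the kind of constraint described above concrete, the sketch below is a rough, hypothetical illustration (not the paper's exact formulation): it uses the fact that the 0-dimensional persistence death times of a Vietoris-Rips filtration coincide with the minimum-spanning-tree edge lengths of the pairwise-distance graph, and penalizes death times of within-class mini-batch representations that exceed a target scale beta. The names topological_densification_loss, beta, and lam are illustrative assumptions.
```python
import torch
from scipy.sparse.csgraph import minimum_spanning_tree

def topological_densification_loss(z, beta):
    """Connectivity-style penalty on a within-class batch of latent
    representations z (shape: n x d): 0-dimensional persistence death
    times (= MST edge lengths of the pairwise-distance graph) that
    exceed the target scale beta are penalized. Sketch only; the
    paper's actual regularizer may differ.
    """
    # Pairwise Euclidean distances; differentiable w.r.t. z.
    dist = torch.cdist(z, z)

    # Select MST edges on a detached copy; gradients then flow only
    # through the distances of the selected edges.
    mst = minimum_spanning_tree(dist.detach().cpu().numpy()).tocoo()
    rows = torch.as_tensor(mst.row, dtype=torch.long)
    cols = torch.as_tensor(mst.col, dtype=torch.long)
    death_times = dist[rows, cols]

    # Hinge penalty: components that merge later than scale beta are
    # pulled together, concentrating mass around the batch samples.
    return torch.clamp(death_times - beta, min=0.0).sum()

# Hypothetical usage inside a training step, one term per class c:
#   z_c = features(x[y == c])
#   loss = cross_entropy + lam * topological_densification_loss(z_c, beta=0.5)
```
In practice such a penalty would be added per class to the standard classification objective and weighted by a hyperparameter such as lam; the exact penalty, scale, and weighting used in the paper may differ from this sketch.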
Related papers
- TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS).
TANGOS is a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods; a rough sketch of an attribution-based penalty in this spirit appears after this list.
arXiv Detail & Related papers (2023-03-09T18:57:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling [83.77955213766896]
Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data.
To address their scalability issue, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs.
This paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs.
arXiv Detail & Related papers (2022-07-07T21:25:55Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risk and fail to generalize well on test or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Topological Regularization for Dense Prediction [5.71097144710995]
We develop a form of topological regularization based on persistent homology that can be used in dense prediction tasks with topological descriptions.
We demonstrate that this topological regularization of internal activations leads to improved convergence and improved performance on test benchmarks across several problems and architectures.
arXiv Detail & Related papers (2021-11-22T04:44:45Z)
- Measuring Generalization with Optimal Transport [111.29415509046886]
We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large-scale datasets.
arXiv Detail & Related papers (2021-06-07T03:04:59Z)
- Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
- Intraclass clustering: an implicit learning ability that regularizes DNNs [22.732204569029648]
We show that deep neural networks are regularized through their ability to extract meaningful clusters among the samples of a class.
Measures of intraclass clustering are designed based on the neuron- and layer-level representations of the training data.
arXiv Detail & Related papers (2021-03-11T15:26:27Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relationship between a network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
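As referenced in the TANGOS entry above, the following is a rough, hypothetical sketch of an attribution-based regularizer in that spirit: per-latent-unit input-gradient attributions are encouraged to be sparse (specialization) and mutually dissimilar (orthogonalization). The function name tangos_style_penalty, the encoder argument, and the weights lam_spec/lam_orth are assumptions for illustration; the paper's exact loss may differ.
```python
import torch
import torch.nn.functional as F

def tangos_style_penalty(encoder, x, lam_spec=1.0, lam_orth=1.0):
    """Attribution-based regularizer in the spirit of TANGOS (sketch only).

    For every latent unit we take its input-gradient attribution, then
    (i) specialization: encourage each unit's attribution to be sparse,
    (ii) orthogonalization: penalize overlap (cosine similarity) between
        the attributions of different units.
    """
    x = x.clone().requires_grad_(True)
    h = encoder(x)                               # assumed shape: (batch, n_units)

    grads = []
    for j in range(h.shape[1]):
        # Gradient of unit j's activations w.r.t. the input features.
        g, = torch.autograd.grad(h[:, j].sum(), x,
                                 create_graph=True, retain_graph=True)
        grads.append(g)                          # each: (batch, n_features)
    A = torch.stack(grads, dim=1)                # (batch, n_units, n_features)

    # Specialization: L1 sparsity of per-unit attributions.
    spec = A.abs().mean()

    # Orthogonalization: mean absolute cosine similarity between the
    # attributions of distinct latent units.
    A_hat = F.normalize(A, dim=-1)
    cos = torch.einsum('buf,bvf->buv', A_hat, A_hat)
    n_units = cos.shape[1]
    off_diag = cos - torch.diag_embed(torch.diagonal(cos, dim1=-2, dim2=-1))
    orth = off_diag.abs().sum() / (cos.shape[0] * n_units * (n_units - 1))

    return lam_spec * spec + lam_orth * orth
```
The per-unit gradient loop is written for clarity; a practical implementation would batch the Jacobian computation rather than call autograd once per unit.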
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.