Measuring Generalization with Optimal Transport
- URL: http://arxiv.org/abs/2106.03314v1
- Date: Mon, 7 Jun 2021 03:04:59 GMT
- Title: Measuring Generalization with Optimal Transport
- Authors: Ching-Yao Chuang, Youssef Mroueh, Kristjan Greenewald, Antonio Torralba, Stefanie Jegelka
- Abstract summary: We develop margin-based generalization bounds, where the margins are normalized with optimal transport costs.
Our bounds robustly predict the generalization error, given training data and network parameters, on large scale datasets.
- Score: 111.29415509046886
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the generalization of deep neural networks is one of the most
important tasks in deep learning. Although much progress has been made,
theoretical error bounds still often behave disparately from empirical
observations. In this work, we develop margin-based generalization bounds,
where the margins are normalized with optimal transport costs between
independent random subsets sampled from the training distribution. In
particular, the optimal transport cost can be interpreted as a generalization
of variance which captures the structural properties of the learned feature
space. Our bounds robustly predict the generalization error, given training
data and network parameters, on large scale datasets. Theoretically, we
demonstrate that the concentration and separation of features play crucial
roles in generalization, supporting empirical results in the literature. The
code is available at https://github.com/chingyaoc/kV-Margin.
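As a minimal sketch of the central quantity described above, the snippet below normalizes classification margins by a 1-Wasserstein cost between two independent random subsets of learned features. It is illustrative only, assuming `features` are penultimate-layer activations and `logits`/`labels` come from the trained network; the subset size `k`, constants, and function names are assumptions, not the kV-Margin repository's exact implementation.

```python
# Hypothetical sketch (not the official kV-Margin implementation): normalize classification
# margins by a 1-Wasserstein cost between two independent random subsets of learned features.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def w1_between_subsets(features, k, rng):
    """W1 distance between two disjoint random size-k subsets of the feature matrix (n, d)."""
    idx = rng.choice(len(features), size=2 * k, replace=False)
    X, Y = features[idx[:k]], features[idx[k:]]
    cost = cdist(X, Y)                          # pairwise Euclidean ground costs
    rows, cols = linear_sum_assignment(cost)    # exact OT plan for uniform, equal-size measures
    return cost[rows, cols].mean()

def ot_normalized_margins(logits, labels, features, k=256, seed=0):
    """Margins f_y(x) - max_{y' != y} f_{y'}(x), divided by an estimated OT cost."""
    logits = np.asarray(logits, dtype=float)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    masked = logits.copy()
    masked[idx, labels] = -np.inf               # drop the true class before taking the runner-up
    margins = logits[idx, labels] - masked.max(axis=1)
    ot_cost = w1_between_subsets(np.asarray(features, dtype=float), k, rng)
    return margins / (ot_cost + 1e-12)          # larger normalized margins -> smaller predicted gap
```

Intuitively, when the learned features are well concentrated, the OT cost between the two subsets is small, so the same raw margins yield a stronger predicted generalization, consistent with the concentration-and-separation picture in the abstract.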
Related papers
- PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization [48.26492774959634]
We develop a compression approach based on quantizing neural network parameters in a linear subspace.
We find large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor.
arXiv Detail & Related papers (2022-11-24T13:50:16Z)
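As a rough, hypothetical illustration of "quantizing neural network parameters in a linear subspace" from the entry above: project the flattened parameters onto a random low-dimensional subspace, quantize the coordinates, and reconstruct. The paper's actual compression pipeline and PAC-Bayes bound computation are more involved; all names and constants below are assumptions.

```python
# Rough, hypothetical illustration of quantizing parameters in a linear subspace
# (the paper's compression and PAC-Bayes bound computation are more involved).
import numpy as np

def subspace_quantize(theta, dim=64, levels=16, seed=0):
    """Project flattened parameters onto a random subspace, quantize the coordinates, reconstruct."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((dim, theta.size)) / np.sqrt(theta.size)   # random basis (dim, D)
    z = P @ theta                                    # low-dimensional coordinates
    grid = np.linspace(z.min(), z.max(), levels)     # uniform quantization grid
    z_q = grid[np.abs(z[:, None] - grid[None, :]).argmin(axis=1)]
    return P.T @ z_q                                 # compressed parameter vector in the subspace

theta_hat = subspace_quantize(np.random.randn(10_000))  # stand-in for flattened network weights
```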
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
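The bound in the entry above depends on local Lipschitz regularity of the learned prediction function. A simple finite-difference probe of this quantity around a single input is sketched below; it is one possible estimator, not necessarily the paper's, and `f`, `radius`, and `n_dirs` are assumed names and values.

```python
# Hypothetical finite-difference probe of local Lipschitz regularity around an input x
# (one possible estimator; not necessarily the one used in the paper's bounds).
import numpy as np

def local_lipschitz_estimate(f, x, radius=1e-2, n_dirs=64, seed=0):
    """Crude lower bound on the local Lipschitz constant of a scalar-valued f near x."""
    rng = np.random.default_rng(seed)
    fx, best = f(x), 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal(x.shape)
        d *= radius / np.linalg.norm(d)      # random direction, fixed small step length
        best = max(best, abs(f(x + d) - fx) / radius)
    return best

f = lambda x: float(np.tanh(x.sum()))        # toy stand-in for a prediction function
print(local_lipschitz_estimate(f, np.zeros(10)))
```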
- Adversarially Robust Models may not Transfer Better: Sufficient Conditions for Domain Transferability from the View of Regularization [17.825841580342715]
Machine learning robustness and domain generalization are fundamentally correlated.
Recent studies show that more robust (adversarially trained) models are more generalizable.
There is a lack of theoretical understanding of their fundamental connections.
arXiv Detail & Related papers (2022-02-03T20:26:27Z)
- Distribution of Classification Margins: Are All Data Equal? [61.16681488656473]
We motivate theoretically and show empirically that the area under the curve of the margin distribution on the training set is in fact a good measure of generalization.
The resulting subset of "high capacity" features is not consistent across different training runs.
arXiv Detail & Related papers (2021-07-21T16:41:57Z)
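A hypothetical rendering of the "area under the curve of the margin distribution" mentioned in the entry above: sort the training margins and integrate the resulting curve over the sample fraction. The paper's exact normalization and capacity scaling may differ; the names below are assumptions.

```python
# Hypothetical rendering of "area under the curve of the margin distribution":
# sort the training margins and integrate the resulting curve over the sample fraction.
import numpy as np

def margin_auc(logits, labels):
    logits = np.asarray(logits, dtype=float)
    idx = np.arange(len(labels))
    masked = logits.copy()
    masked[idx, labels] = -np.inf                    # runner-up excludes the true class
    margins = np.sort(logits[idx, labels] - masked.max(axis=1))
    frac = np.linspace(0.0, 1.0, len(margins))       # fraction of training points
    return np.trapz(margins, frac)                   # area under the sorted-margin curve
```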
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
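For context on the extra-gradient method underlying the entry above, here is a minimal single-machine sketch for a variational inequality with operator F. The paper's method adds decentralization, local steps, and stochastic operators; the names and step size below are assumptions.

```python
# Minimal single-machine extra-gradient iteration for a variational inequality with operator F
# (the paper's decentralized variant adds local steps, communication, and heterogeneous data).
import numpy as np

def extragradient(F, z0, step=0.1, iters=1000):
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        z_half = z - step * F(z)     # extrapolation (look-ahead) step
        z = z - step * F(z_half)     # update with the operator evaluated at the look-ahead point
    return z

# Example: the bilinear saddle point min_x max_y x*y, whose VI operator is F(x, y) = (y, -x)
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, [1.0, 1.0]))  # converges toward the solution (0, 0)
```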
- Extrapolatable Relational Reasoning With Comparators in Low-Dimensional Manifolds [7.769102711230249]
We propose a neuroscience-inspired module with a built-in inductive bias that can be readily combined with current neural network architectures.
We show that neural networks with this inductive bias achieve considerably better out-of-distribution (o.o.d.) generalization performance on a range of relational reasoning tasks.
arXiv Detail & Related papers (2020-06-15T19:09:13Z)
- Topologically Densified Distributions [25.140319008330167]
We study regularization in the context of small sample-size learning with over-parameterized neural networks.
We impose a topological constraint on samples drawn from the probability measure induced in that space.
This provably leads to mass concentration effects around the representations of training instances.
arXiv Detail & Related papers (2020-02-12T05:25:15Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.