An Online Learning Approach to Interpolation and Extrapolation in Domain
Generalization
- URL: http://arxiv.org/abs/2102.13128v1
- Date: Thu, 25 Feb 2021 19:06:48 GMT
- Title: An Online Learning Approach to Interpolation and Extrapolation in Domain
Generalization
- Authors: Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski
- Abstract summary: We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM -- or a noisy variant -- is provably minimax-optimal for both interpolation and extrapolation.
- Score: 53.592597682854944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A popular assumption for out-of-distribution generalization is that the
training data comprises sub-datasets, each drawn from a distinct distribution;
the goal is then to "interpolate" these distributions and "extrapolate" beyond
them -- this objective is broadly known as domain generalization. A common
belief is that ERM can interpolate but not extrapolate and that the latter is
considerably more difficult, but these claims are vague and lack formal
justification. In this work, we recast generalization over sub-groups as an
online game between a player minimizing risk and an adversary presenting new
test distributions. Under an existing notion of inter- and extrapolation based
on reweighting of sub-group likelihoods, we rigorously demonstrate that
extrapolation is computationally much harder than interpolation, though their
statistical complexity is not significantly different. Furthermore, we show
that ERM -- or a noisy variant -- is provably minimax-optimal for both tasks.
Our framework presents a new avenue for the formal analysis of domain
generalization algorithms which may be of independent interest.
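To make the game concrete, here is a minimal sketch of one natural reading of the setup (the paper's formal protocol and regularity conditions are not reproduced here, and the symbols $\Lambda$, $R_i$, and $\ell$ are illustrative notation): with sub-group distributions $p_1, \dots, p_k$ and per-group risks $R_i(f) = \mathbb{E}_{(x,y) \sim p_i}[\ell(f(x), y)]$, in round $t$ the player commits to a predictor $f_t$, the adversary chooses a reweighting $\lambda_t$ from an allowed set $\Lambda$, and the player suffers the reweighted risk
\[
  \textstyle\sum_{i=1}^{k} \lambda_{t,i}\, R_i(f_t),
  \qquad
  \mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \sum_{i} \lambda_{t,i}\, R_i(f_t) \;-\; \min_{f} \sum_{t=1}^{T} \sum_{i} \lambda_{t,i}\, R_i(f).
\]
Under this reading, interpolation corresponds to $\Lambda$ being the probability simplex (convex mixtures of the sub-groups), while extrapolation allows reweightings beyond the simplex, subject to boundedness constraints.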
Related papers
- Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms.
In many applications, a limited amount of test data may be available during training, yet the properties of min-norm interpolators in this setting are not well understood.
To characterize the generalization error in this setting, we establish a novel anisotropic local law.
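For context, the textbook min-norm interpolator in overparameterized linear regression (a standard definition used here for illustration, not the paper's transfer-learning estimator) is
\[
  \hat\beta \;=\; \arg\min_{\beta} \|\beta\|_2 \quad \text{s.t. } X\beta = y,
  \qquad
  \hat\beta \;=\; X^\top (X X^\top)^{-1} y \ \ \text{when } XX^\top \text{ is invertible},
\]
which is the limit of ridge regression as the penalty vanishes and the solution that gradient descent converges to from zero initialization -- hence "implicit regularized limits" of common training algorithms.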
arXiv Detail & Related papers (2024-06-20T02:23:28Z)
- How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis [21.685468628033206]
We formulate domain generalization from a novel probabilistic perspective.
We provide key insights into the roles of gradient and representation matching in promoting generalization.
In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations.
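As a hedged illustration of what aligning inter-domain gradients can mean (a generic penalty written in illustrative notation, not necessarily the exact IDM objective), with per-domain losses $L_d(\theta)$ one may penalize the dispersion of the domain gradients around their mean,
\[
  \min_{\theta} \; \frac{1}{D}\sum_{d=1}^{D} L_d(\theta)
  \;+\; \lambda \, \frac{1}{D}\sum_{d=1}^{D}
  \Big\| \nabla_\theta L_d(\theta) - \frac{1}{D}\sum_{d'=1}^{D} \nabla_\theta L_{d'}(\theta) \Big\|_2^2,
\]
with an analogous matching term applied to the distributions of intermediate representations across domains.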
arXiv Detail & Related papers (2024-06-14T06:28:17Z)
- Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound [15.557653926558638]
We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers and study its generalization properties.
We instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk.
The resulting majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight bounds.
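The closed form mentioned above plausibly follows from the aggregation property of the Dirichlet distribution; the following is a sketch under that assumption for binary classification, not a restatement of the paper's bound. If the vote weights satisfy $\theta \sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_m)$ and $E(x,y)$ denotes the set of ensemble members that misclassify $(x,y)$, then $\sum_{i \in E(x,y)} \theta_i \sim \mathrm{Beta}\big(\sum_{i \in E(x,y)} \alpha_i,\ \sum_{i \notin E(x,y)} \alpha_i\big)$, so
\[
  \Pr\big[\text{the weighted majority vote errs on } (x,y)\big]
  \;=\; 1 - I_{1/2}\Big(\textstyle\sum_{i \in E(x,y)} \alpha_i,\ \sum_{i \notin E(x,y)} \alpha_i\Big),
\]
where $I$ is the regularized incomplete beta function; averaging over the data yields an expected risk that is differentiable in $\alpha$.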
arXiv Detail & Related papers (2021-06-23T16:57:23Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers, among others, the setting of fully decentralized computation.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
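For reference, the basic centralized, deterministic extra-gradient step for a VI with operator $F$ on a set $\mathcal{X}$ -- the building block that the decentralized, local, stochastic method above generalizes -- is
\[
  x_{t+1/2} = \Pi_{\mathcal{X}}\big(x_t - \gamma F(x_t)\big),
  \qquad
  x_{t+1} = \Pi_{\mathcal{X}}\big(x_t - \gamma F(x_{t+1/2})\big),
\]
i.e., an extrapolation step followed by an update that uses the operator evaluated at the extrapolated point; roughly, in the distributed setting each device runs stochastic versions of this step locally and communicates over the network.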
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distribution over the source and target domains.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
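A minimal sketch of how such a KL term can be estimated from minibatches, assuming a probabilistic encoder $q(z \mid x)$ whose induced marginals $p_S(z)$ and $p_T(z)$ over source and target representations can be evaluated (or approximated within the batch); the notation is illustrative and the paper's exact estimator may differ:
\[
  \mathrm{KL}\big(p_S(z) \,\|\, p_T(z)\big)
  \;=\; \mathbb{E}_{z \sim p_S}\big[\log p_S(z) - \log p_T(z)\big]
  \;\approx\; \frac{1}{m} \sum_{j=1}^{m} \big[\log p_S(z_j) - \log p_T(z_j)\big],
  \quad z_j \sim q(\cdot \mid x_j),\ x_j \sim \text{source batch},
\]
so the alignment term can be backpropagated through the encoder like any other minibatch loss.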
arXiv Detail & Related papers (2021-06-14T22:24:23Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
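One way to read the log-likelihood ratio statistic here (a generic generalized-likelihood-ratio construction in illustrative notation, not necessarily the paper's exact objective): given samples $X_A$ and $X_B$ from the two domains and a parametric density family $p_\theta$ (e.g., a normalizing flow),
\[
  \Lambda(X_A, X_B) \;=\; \max_{\theta_A, \theta_B} \big[\log p_{\theta_A}(X_A) + \log p_{\theta_B}(X_B)\big]
  \;-\; \max_{\theta} \big[\log p_{\theta}(X_A) + \log p_{\theta}(X_B)\big] \;\ge\; 0,
\]
and alignment amounts to learning a transformation of one domain that drives this statistic toward zero, i.e., a single shared density explains both domains almost as well as two separate ones.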
arXiv Detail & Related papers (2020-03-26T22:10:04Z)