First Steps Toward Understanding the Extrapolation of Nonlinear Models
to Unseen Domains
- URL: http://arxiv.org/abs/2211.11719v1
- Date: Mon, 21 Nov 2022 18:41:19 GMT
- Title: First Steps Toward Understanding the Extrapolation of Nonlinear Models
to Unseen Domains
- Authors: Kefan Dong, Tengyu Ma
- Abstract summary: This paper makes some initial steps towards analyzing the extrapolation of nonlinear models for structured domain shift.
We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$ can extrapolate to unseen distributions.
- Score: 35.76184529520015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world machine learning applications often involve deploying neural
networks to domains not seen at training time. Hence, we need to
understand the extrapolation of nonlinear models -- under what conditions on
the distributions and function class can models be guaranteed to extrapolate
to new test distributions? The question is very challenging because even
two-layer neural networks cannot be guaranteed to extrapolate outside the
support of the training distribution without further assumptions on the domain
shift. This paper makes some initial steps towards analyzing the extrapolation
of nonlinear models under structured domain shift. We primarily consider settings
where the marginal distribution of each coordinate of the data (or of a subset of
coordinates) does not shift significantly between the training and test
distributions, while the joint distribution may shift far more. We
prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$,
where $f_i$ is an arbitrary function on the subset of features $x_i$, can
extrapolate to unseen distributions, if the covariance of the features is
well-conditioned. To the best of our knowledge, this is the first result that
goes beyond linear models and the bounded density ratio assumption, even though
the assumptions on the distribution shift and function class are stylized.
Related papers
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs)
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - On the detrimental effect of invariances in the likelihood for
variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z) - Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z) - Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) amounts to finding a small subset of the input graph's features.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z) - Domain Conditional Predictors for Domain Adaptation [3.951376400628575]
We consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution.
We argue that such an approach is more generally applicable than current domain adaptation methods.
arXiv Detail & Related papers (2021-06-25T22:15:54Z) - Neural Networks for Learning Counterfactual G-Invariances from Single
Environments [13.848760376470038]
Neural networks are believed to have difficulty extrapolating beyond the training data distribution.
This work shows that, for extrapolations based on finite transformation groups, a model's inability to extrapolate is unrelated to its capacity.
arXiv Detail & Related papers (2021-04-20T16:35:35Z) - Why do classifier accuracies show linear trends under distribution
shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.