First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains
- URL: http://arxiv.org/abs/2211.11719v1
- Date: Mon, 21 Nov 2022 18:41:19 GMT
- Title: First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains
- Authors: Kefan Dong, Tengyu Ma
- Abstract summary: This paper takes initial steps toward analyzing the extrapolation of nonlinear models under structured domain shift.
We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$ can extrapolate to unseen distributions, provided the covariance of the features is well-conditioned.
- Score: 35.76184529520015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world machine learning applications often involve deploying neural
networks to domains that are not seen at training time. Hence, we need to
understand the extrapolation of nonlinear models -- under what conditions on
the distributions and function class can models be guaranteed to extrapolate
to new test distributions? The question is very challenging because even
two-layer neural networks cannot be guaranteed to extrapolate outside the
support of the training distribution without further assumptions on the domain
shift. This paper takes initial steps toward analyzing the extrapolation
of nonlinear models under structured domain shift. We primarily consider settings
where the marginal distribution of each coordinate of the data (or of each subset
of coordinates) does not shift significantly across the training and test
distributions, but the joint distribution may shift much more. We
prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$,
where $f_i$ is an arbitrary function on the subset of features $x_i$, can
extrapolate to unseen distributions, if the covariance of the features is
well-conditioned. To the best of our knowledge, this is the first result that
goes beyond linear models and the bounded density ratio assumption, even though
the assumptions on the distribution shift and function class are stylized.
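To make the setting concrete, here is a minimal numerical sketch of the phenomenon the abstract describes (an illustration only, not the paper's construction: the Gaussian sampler, the polynomial basis, and the target function below are assumptions made for this example). The training and test distributions share standard-normal marginals per coordinate but differ in their joint correlation, and an additive model of the form $f(x)=\sum f_i(x_i)$ is fit on training data whose feature covariance is well-conditioned.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# train/test share per-coordinate N(0, 1) marginals but differ in joint
# correlation; we fit an additive model f(x) = f_1(x_1) + f_2(x_2).
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 5000

def sample(rho: float, n: int) -> np.ndarray:
    """Gaussian features with standard-normal marginals and correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(np.zeros(d), cov, size=n)

def target(X: np.ndarray) -> np.ndarray:
    """Ground-truth additive function (a hypothetical example)."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def additive_features(X: np.ndarray, degree: int = 6) -> np.ndarray:
    """Per-coordinate 1-D polynomial basis: a crude stand-in for
    'arbitrary functions f_i of each coordinate'."""
    cols = [X[:, i] ** k for i in range(X.shape[1]) for k in range(degree + 1)]
    return np.column_stack(cols)

X_train = sample(rho=0.2, n=n)   # training distribution
X_test = sample(rho=0.8, n=n)    # same marginals, shifted joint distribution
y_train = target(X_train)

# Least-squares fit of the additive model on the training distribution.
w, *_ = np.linalg.lstsq(additive_features(X_train), y_train, rcond=None)

# The key assumption: the training feature covariance is well-conditioned.
cond = np.linalg.cond(np.cov(X_train, rowvar=False))
mse = np.mean((additive_features(X_test) @ w - target(X_test)) ** 2)
print(f"feature covariance condition number: {cond:.2f}")
print(f"test MSE under the joint-distribution shift: {mse:.4f}")
```

With $\rho=0.2$ the training covariance has eigenvalues $1\pm\rho$ and condition number $(1+\rho)/(1-\rho)=1.5$, so the well-conditioned-covariance assumption holds; meanwhile the density ratio between the $\rho=0.2$ and $\rho=0.8$ Gaussians is unbounded in the tails (e.g., along the $x_1=x_2$ direction), so the bounded density ratio assumption fails.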
Related papers
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning in linear models for both regression and binary classification.
We provide an exact and rigorous analysis and relate generalization errors (in regression) and classification errors (in binary classification) for the pretrained and fine-tuned models.
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- Robust Generative Learning with Lipschitz-Regularized $\alpha$-Divergences Allows Minimal Assumptions on Target Distributions [12.19634962193403]
This paper demonstrates the robustness of Lipschitz-regularized $\alpha$-divergences as objective functionals in generative modeling.
We prove the existence and finiteness of their variational derivatives, which are essential for stable training of generative models such as GANs and gradient flows.
Numerical experiments confirm that generative models leveraging Lipschitz-regularized $alpha$-divergences can stably learn distributions in various challenging scenarios.
arXiv Detail & Related papers (2024-05-22T19:58:13Z) - Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs)
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z)
- Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- On the detrimental effect of invariances in the likelihood for variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z)
- Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) means finding a small subset of the input graph's features -- a rationale -- that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated summaries and is not responsible for any consequences of their use.