First Steps Toward Understanding the Extrapolation of Nonlinear Models
to Unseen Domains
- URL: http://arxiv.org/abs/2211.11719v1
- Date: Mon, 21 Nov 2022 18:41:19 GMT
- Title: First Steps Toward Understanding the Extrapolation of Nonlinear Models
to Unseen Domains
- Authors: Kefan Dong, Tengyu Ma
- Abstract summary: This paper makes some initial steps towards analyzing the extrapolation of nonlinear models for structured domain shift.
We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$ can extrapolate to unseen distributions.
- Score: 35.76184529520015
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world machine learning applications often involve deploying neural
networks to domains not seen at training time. Hence, we need to
understand the extrapolation of nonlinear models -- under what conditions on
the distributions and function class can models be guaranteed to extrapolate
to new test distributions? The question is very challenging because even
two-layer neural networks cannot be guaranteed to extrapolate outside the
support of the training distribution without further assumptions on the domain
shift. This paper makes some initial steps towards analyzing the extrapolation
of nonlinear models under structured domain shift. We primarily consider settings
where the marginal distribution of each coordinate of the data (or of a subset of
coordinates) does not shift significantly between the training and test
distributions, while the joint distribution may shift far more. We
prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$,
where $f_i$ is an arbitrary function on the subset of features $x_i$, can
extrapolate to unseen distributions, if the covariance of the features is
well-conditioned. To the best of our knowledge, this is the first result that
goes beyond linear models and the bounded density ratio assumption, even though
the assumptions on the distribution shift and function class are stylized.
Related papers
- Diffusion models for probabilistic programming [56.47577824219207]
Diffusion Model Variational Inference (DMVI) is a novel method for automated approximate inference in probabilistic programming languages (PPLs)
DMVI is easy to implement, allows hassle-free inference in PPLs without the drawbacks of, e.g., variational inference using normalizing flows, and does not make any constraints on the underlying neural network model.
arXiv Detail & Related papers (2023-11-01T12:17:05Z) - Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems.
We show that unlabeled samples can be harnessed to narrow the generalization gap.
We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - On the detrimental effect of invariances in the likelihood for
variational inference [21.912271882110986]
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability.
Prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes.
arXiv Detail & Related papers (2022-09-15T09:13:30Z) - Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $\mathbf{x}$ in a model that consists of a prior $p(\mathbf{x})$ and an auxiliary constraint $c(\mathbf{x},\mathbf{y})$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z) - Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) amounts to finding a small subset of the input graph's features.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z) - Domain Conditional Predictors for Domain Adaptation [3.951376400628575]
We consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution.
We argue that such an approach is more generally applicable than current domain adaptation methods.
arXiv Detail & Related papers (2021-06-25T22:15:54Z) - Neural Networks for Learning Counterfactual G-Invariances from Single
Environments [13.848760376470038]
Neural networks are believed to have difficulty extrapolating beyond the training data distribution.
This work shows that, for extrapolations based on finite transformation groups, a model's inability to extrapolate is unrelated to its capacity.
arXiv Detail & Related papers (2021-04-20T16:35:35Z) - Why do classifier accuracies show linear trends under distribution
shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
We assume the probability that two models agree in their predictions is higher than what we can infer from their accuracy levels alone.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.