Principled Out-of-Distribution Generalization via Simplicity
- URL: http://arxiv.org/abs/2505.22622v1
- Date: Wed, 28 May 2025 17:44:10 GMT
- Title: Principled Out-of-Distribution Generalization via Simplicity
- Authors: Jiawei Ge, Amanda Wang, Shange Tang, Chi Jin
- Abstract summary: We study the compositional generalization abilities of diffusion models in image generation. We develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
- Score: 16.17883058788714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models -- including many with undesirable behavior on OOD inputs -- the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data. Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
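As a rough illustration of the estimator studied (the notation below is chosen for exposition and is not taken from the paper), the regularized maximum likelihood estimator can be written schematically as a simplicity-penalized objective:

```latex
% Schematic simplicity-regularized MLE (illustrative notation): S(f) denotes the
% predefined simplicity metric and \lambda > 0 a regularization weight.
\[
  \hat{f} \;\in\; \arg\max_{f \in \mathcal{F}}
    \left\{ \frac{1}{n} \sum_{i=1}^{n} \log p_f(x_i) \;-\; \lambda\, S(f) \right\}
\]
% The sample complexity guarantees bound how many samples n are needed before
% \hat{f} recovers the simplest model consistent with the training distribution,
% in both the constant-gap and vanishing-gap regimes.
```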
Related papers
- Physics-Informed Diffusion Models [0.0]
We present a framework that unifies generative modeling and partial differential equation fulfillment. Our approach reduces the residual error by up to two orders of magnitude compared to previous work in a fluid flow case study.
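As a hedged sketch of the general idea, a PDE-residual penalty can be added to a generative training loss; the heat-equation residual and the names (`u_net`, `gen_loss`, `lam`) below are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: augmenting a generative training loss with a PDE residual penalty.
# The 1-D heat equation u_t = nu * u_xx is used purely as an example.
import torch

def pde_residual(u_net, xt, nu=0.01):
    """Residual of u_t - nu * u_xx for a network u(x, t); xt has columns (x, t)."""
    xt = xt.clone().requires_grad_(True)
    u = u_net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - nu * u_xx

def physics_informed_loss(gen_loss, u_net, xt, lam=1.0):
    # Generative loss plus squared PDE residual at the sampled collocation points.
    return gen_loss + lam * pde_residual(u_net, xt).pow(2).mean()
```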
arXiv Detail & Related papers (2024-03-21T13:52:55Z) - Interpretability Illusions in the Generalization of Simplified Models [30.124082589662574]
A common method to study deep learning systems is to use simplified model representations.
This approach assumes that the results of these simplifications are faithful to the original model.
We show that even if the simplified representations can accurately approximate the full model on the training set, they may fail to accurately capture the model's behavior out of distribution.
arXiv Detail & Related papers (2023-12-06T18:25:53Z) - Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness [5.976013616522926]
We propose a framework that encourages the model to use a more diverse set of features to make predictions.
We first train a simple model, and then regularize the conditional mutual information with respect to it to obtain the final model.
We demonstrate the effectiveness of this framework in various problem settings and real-world applications.
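A minimal sketch of such a two-stage pipeline is given below, assuming a PyTorch-style setup. The agreement penalty is only a crude stand-in for the conditional-mutual-information regularizer described above, and the names (`final_model`, `simple_model`, `lam`) are illustrative.

```python
# Stage 1: train `simple_model` as usual (not shown). Stage 2 (below): train the
# final model with a penalty that discourages it from merely reproducing the
# simple model's predictive distribution -- a crude proxy, not the paper's
# actual conditional-mutual-information regularizer.
import torch
import torch.nn.functional as F

def train_final_model(final_model, simple_model, loader, lam=0.1, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(final_model.parameters(), lr=lr)
    simple_model.eval()
    for _ in range(epochs):
        for x, y in loader:
            logits = final_model(x)
            with torch.no_grad():
                simple_logits = simple_model(x)
            task_loss = F.cross_entropy(logits, y)
            # KL between the simple model's and the final model's predictions;
            # subtracting it rewards using features the simple model ignores.
            agreement = F.kl_div(
                F.log_softmax(logits, dim=-1),
                F.softmax(simple_logits, dim=-1),
                reduction="batchmean",
            )
            loss = task_loss - lam * agreement  # in practice this term is clipped/annealed
            opt.zero_grad()
            loss.backward()
            opt.step()
    return final_model
```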
arXiv Detail & Related papers (2023-10-09T21:19:39Z) - A Mathematical Framework for Learning Probability Distributions [0.0]
Generative modeling and density estimation have become an immensely popular subject in recent years.
This paper provides a mathematical framework such that all the well-known models can be derived based on simple principles.
In particular, we prove that these models enjoy implicit regularization during training, so that the generalization error at early-stopping avoids the curse of dimensionality.
arXiv Detail & Related papers (2022-12-22T04:41:45Z) - SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.
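For context, a minimal sketch of a LoRA-style linear layer is shown below (a generic low-rank adaptation, not the specific fine-tuning configuration evaluated in the paper): the pretrained weight is frozen and only a low-rank update is trained.

```python
# Generic LoRA-style wrapper around a pretrained nn.Linear: only the low-rank
# factors A and B are trainable, and B is zero-initialized so the adapted layer
# starts out identical to the frozen base layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen base projection plus the scaled low-rank update (B A) x.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```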
arXiv Detail & Related papers (2022-10-10T16:07:24Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaptation performance of such modular neural causal models.
Our analysis shows that modular neural causal models outperform other models on both zero- and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - A Model of One-Shot Generalization [6.155604731137828]
One-shot generalization refers to the ability of an algorithm to perform transfer learning within a single task.
We show that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly.
arXiv Detail & Related papers (2022-05-29T01:41:29Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD learns a more robust base model in both settings: task-specific biased models with prior knowledge and a self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
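One generic way to encourage such diversity is to penalize alignment between the models' input gradients so that each model latches onto different features; the sketch below illustrates this idea and is not necessarily the exact penalty used in the paper.

```python
# Train a pair of models with a penalty on the cosine similarity of their
# input gradients (a generic diversity regularizer, shown for illustration).
import torch
import torch.nn.functional as F

def diverse_pair_loss(model_a, model_b, x, y, lam=1.0):
    x = x.clone().requires_grad_(True)
    logits_a, logits_b = model_a(x), model_b(x)
    task = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    grad_a = torch.autograd.grad(logits_a.sum(), x, create_graph=True)[0]
    grad_b = torch.autograd.grad(logits_b.sum(), x, create_graph=True)[0]
    # Low similarity pushes the two models to rely on different input features.
    sim = F.cosine_similarity(grad_a.flatten(1), grad_b.flatten(1), dim=1).mean()
    return task + lam * sim
```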
arXiv Detail & Related papers (2021-05-12T12:12:24Z) - Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
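A textbook score-function (REINFORCE-style) update for maximizing the expected reward of sampled structures is sketched below; the `generator.sample` interface and the mean baseline are assumptions, and the paper's exact objective and variance-reduction choices may differ.

```python
# One gradient step on E[R(sequence)] using the score-function estimator:
# grad E[R] ~ E[(R - baseline) * grad log p(sequence)].
import torch

def reinforce_step(generator, reward_fn, optimizer, batch_size=32):
    # Assumed interface: returns sampled structures and their summed log-probs.
    sequences, log_probs = generator.sample(batch_size)
    rewards = torch.tensor([reward_fn(s) for s in sequences], dtype=torch.float32)
    baseline = rewards.mean()  # simple variance reduction
    loss = -((rewards - baseline) * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```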
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)