Biased Generalization in Diffusion Models
- URL: http://arxiv.org/abs/2603.03469v1
- Date: Tue, 03 Mar 2026 19:25:33 GMT
- Title: Biased Generalization in Diffusion Models
- Authors: Jerome Garnier-Brun, Luca Biggio, Davide Beltrame, Marc Mézard, Luca Saglietti
- Abstract summary: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples. In practice, training is often stopped at the minimum of the test loss, taken as an operational indicator of generalization. We challenge this viewpoint by identifying a phase of biased generalization during training, in which the model continues to decrease the test loss while favoring samples with anomalously high proximity to training data.
- Score: 4.602851365305176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice, training is often stopped at the minimum of the test loss, taken as an operational indicator of generalization. We challenge this viewpoint by identifying a phase of biased generalization during training, in which the model continues to decrease the test loss while favoring samples with anomalously high proximity to training data. By training the same network on two disjoint datasets and comparing the mutual distances of generated samples and their similarity to training data, we introduce a quantitative measure of bias and demonstrate its presence on real images. We then study the mechanism of bias, using a controlled hierarchical data model where access to exact scores and ground-truth statistics allows us to precisely characterize its onset. We attribute this phenomenon to the sequential nature of feature learning in deep networks, where coarse structure is learned early in a data-independent manner, while finer features are resolved later in a way that increasingly depends on individual training samples. Our results show that early stopping at the test loss minimum, while optimal under standard generalization criteria, may be insufficient for privacy-critical applications.
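The disjoint-split comparison lends itself to a compact illustration. The Python sketch below shows one plausible reading of the bias probe: generate samples from a model trained on split A and compare their nearest-neighbor distances to split A against those to a disjoint split B; a ratio well below one signals anomalous proximity to the model's own training data. The function names, the median-ratio statistic, and the synthetic data are illustrative assumptions, not the paper's exact metric.

```python
# Hedged sketch of the disjoint-split bias probe described in the abstract.
# All names and the median-ratio statistic are hypothetical stand-ins.
import numpy as np

def nn_distances(samples: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Euclidean distance from each sample to its nearest neighbor in `reference`."""
    diffs = samples[:, None, :] - reference[None, :, :]   # (n, m, d) pairwise diffs
    return np.sqrt((diffs ** 2).sum(-1)).min(axis=1)      # min over reference set

def bias_score(gen_a: np.ndarray, train_a: np.ndarray, train_b: np.ndarray) -> float:
    """Ratio well below 1: samples from the model trained on split A sit
    anomalously close to split A relative to the disjoint split B."""
    d_own = nn_distances(gen_a, train_a)
    d_other = nn_distances(gen_a, train_b)
    return float(np.median(d_own) / np.median(d_other))

rng = np.random.default_rng(0)
train_a, train_b = rng.normal(size=(500, 32)), rng.normal(size=(500, 32))
gen_a = train_a[:200] + 0.05 * rng.normal(size=(200, 32))  # a memorizing model
print(bias_score(gen_a, train_a, train_b))  # well below 1 -> biased generalization
```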
Related papers
- Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time [60.341117019125214]
We propose a lightweight and plug-and-play Test-time adaptation framework for correcting Unseen Normal pattErns (TUNE) in graph anomaly detection (GAD). To address semantic confusion, a graph aligner is employed to align the shifted data to the original data at the graph attribute level. Extensive experiments on 10 real-world datasets demonstrate that TUNE significantly enhances the generalizability of pre-trained GAD models to both synthetic and real unseen normal patterns.
arXiv Detail & Related papers (2025-11-10T12:10:05Z) - Leveraging Learning Bias for Noisy Anomaly Detection [19.23861148116995]
This paper addresses the challenge of fully unsupervised image anomaly detection (FUIAD). Conventional methods assume anomaly-free training data, but real-world contamination leads models to absorb anomalies as normal. We propose a two-stage framework that exploits inherent learning bias in models.
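A minimal sketch of the general idea, assuming the commonly observed learning bias that models fit the normal majority first, so early per-sample losses can flag likely contaminants. The paper's actual two-stage framework is not reproduced here; `filter_by_loss` and the keep ratio are hypothetical.

```python
# Hedged sketch of a generic "learning bias" filter for contaminated data.
# Stage 1 keeps the lowest-loss fraction of the training set; stage 2 would
# retrain the detector on the retained, pseudo-clean subset.
import numpy as np

def filter_by_loss(losses: np.ndarray, keep_ratio: float = 0.9) -> np.ndarray:
    """Keep the lowest-loss fraction of a contaminated training set."""
    threshold = np.quantile(losses, keep_ratio)
    return losses <= threshold

# Toy losses: 900 normal samples fit well, 100 contaminants fit poorly.
losses = np.concatenate([np.random.rand(900), 2 + np.random.rand(100)])
mask = filter_by_loss(losses)
print(mask.sum(), "samples retained for second-stage retraining")
```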
arXiv Detail & Related papers (2025-08-10T17:47:21Z) - Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
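As a rough illustration of the pseudo-labeling step, the sketch below fits an anchor classifier on labels produced by a placeholder pretrained scorer. The features, the scorer, and the threshold are all hypothetical stand-ins, not the paper's pipeline.

```python
# Hedged sketch of pseudo-label training for an anchor model: a pretrained
# scorer labels unlabeled utterances, and a classifier is fit on those labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 16))              # unlabeled utterance features
pretrained_score = features[:, 0]                   # placeholder proficiency score
pseudo_labels = (pretrained_score > 0).astype(int)  # "correct pronunciation" labels

anchor = LogisticRegression().fit(features, pseudo_labels)
print(anchor.score(features, pseudo_labels))        # fit quality on pseudo labels
```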
arXiv Detail & Related papers (2024-09-23T02:11:24Z) - Two Is Better Than One: Aligned Representation Pairs for Anomaly Detection [56.57122939745213]
Anomaly detection focuses on identifying samples that deviate from the norm. Recent self-supervised methods have successfully learned such representations by employing prior knowledge about anomalies to create synthetic outliers during training. We address this limitation with our new approach Con$_2$, which leverages prior knowledge about symmetries in normal samples to observe the data in different contexts.
arXiv Detail & Related papers (2024-05-29T07:59:06Z) - Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships of earlier settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss gradient norms depend strongly on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
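The timestep-induced scale problem and its re-normalized fix can be sketched generically: TracIn-style influence is a dot product of loss gradients, and normalizing each gradient removes magnitude differences such as those induced by the timestep. This is an assumption-laden illustration, not the exact Diffusion-ReTrac estimator.

```python
# Hedged sketch of dot-product influence with optional gradient re-normalization.
# Raw dot products over-weight large-norm (e.g. timestep-inflated) gradients;
# normalizing each gradient removes that scale.
import numpy as np

def influence(train_grads: np.ndarray, test_grad: np.ndarray,
              renormalize: bool = True) -> np.ndarray:
    """Influence of each training gradient on a test gradient."""
    g_train, g_test = train_grads, test_grad
    if renormalize:
        g_train = g_train / (np.linalg.norm(g_train, axis=1, keepdims=True) + 1e-12)
        g_test = g_test / (np.linalg.norm(g_test) + 1e-12)
    return g_train @ g_test

grads = np.random.randn(100, 64) * np.random.rand(100, 1) * 10  # varying norms
test = np.random.randn(64)
print(influence(grads, test)[:5])
```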
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Mixture Data for Training Cannot Ensure Out-of-distribution Generalization [21.801115344132114]
We show that increasing the size of training data does not always lead to a reduction in the test generalization error.
In this work, we quantitatively redefine OOD data as those situated outside the convex hull of mixed training data.
Our proof of the new risk bound confirms that the efficacy of well-trained models can be guaranteed for unseen data inside this convex hull.
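The convex-hull definition of OOD admits a direct feasibility test: a point lies in the hull iff it can be written as a convex combination of training points, which a small linear program can decide. The sketch below is one way to implement that definition, not the paper's method.

```python
# Hedged sketch: decide convex-hull membership with a feasibility LP.
# x is in conv(X) iff there exists lambda >= 0, sum(lambda) = 1, X^T lambda = x.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x: np.ndarray, X: np.ndarray) -> bool:
    """True iff x lies in the convex hull of the rows of X."""
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])   # convex-combination constraints
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

X = np.random.rand(50, 3)
print(in_convex_hull(X.mean(axis=0), X))      # True: centroid is inside
print(in_convex_hull(X.max(axis=0) + 1, X))   # False: clearly outside
```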
arXiv Detail & Related papers (2023-12-25T11:00:38Z) - A Generic Machine Learning Framework for Fully-Unsupervised Anomaly Detection with Contaminated Data [0.0]
We introduce a framework for a fully unsupervised refinement of contaminated training data for anomaly detection (AD) tasks.
The framework is generic and can be applied to any residual-based machine learning model.
We show its clear superiority over the naive approach of training with contaminated data without refinement.
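One hedged instantiation of such a refinement loop, using PCA reconstruction error as the residual: fit on the currently retained samples, drop the highest-residual fraction as suspected contaminants, and refit. The choice of PCA, the number of rounds, and the drop fraction are illustrative assumptions, not the framework's prescriptions.

```python
# Hedged sketch of iterative residual-based refinement of contaminated data.
import numpy as np
from sklearn.decomposition import PCA

def refine(X: np.ndarray, rounds: int = 3, drop: float = 0.05) -> np.ndarray:
    """Boolean mask of samples retained after iterative refinement."""
    mask = np.ones(len(X), dtype=bool)
    for _ in range(rounds):
        model = PCA(n_components=2).fit(X[mask])
        recon = model.inverse_transform(model.transform(X[mask]))
        residual = ((X[mask] - recon) ** 2).sum(axis=1)   # per-sample residual
        keep = residual <= np.quantile(residual, 1 - drop)
        idx = np.flatnonzero(mask)
        mask[idx[~keep]] = False                          # drop suspected contaminants
    return mask

X = np.vstack([np.random.randn(950, 10), 5 + np.random.randn(50, 10)])
print(refine(X).sum(), "samples kept as presumed normal")
```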
arXiv Detail & Related papers (2023-08-25T12:47:59Z) - Semantic Self-adaptation: Enhancing Generalization with a Single Sample [45.111358665370524]
We propose a self-adaptive approach for semantic segmentation.
It fine-tunes the parameters of convolutional layers to the input image using consistency regularization.
Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time.
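A minimal sketch of what single-sample self-adaptation with consistency regularization could look like, assuming a PyTorch classifier: only convolutional parameters are updated, and the loss penalizes disagreement between predictions on two augmented views of the one test image. Model, augmentation, learning rate, and step count are placeholders, not the paper's recipe.

```python
# Hedged sketch of test-time self-adaptation on a single input image.
# Assumes `model` contains Conv2d layers and `image` is a (C, H, W) tensor.
import torch
import torch.nn.functional as F

def self_adapt(model: torch.nn.Module, image: torch.Tensor, steps: int = 5):
    conv_params = [p for m in model.modules() if isinstance(m, torch.nn.Conv2d)
                   for p in m.parameters()]
    opt = torch.optim.SGD(conv_params, lr=1e-3)
    for _ in range(steps):
        views = torch.stack([image, torch.flip(image, dims=[-1])])  # two views
        probs = model(views).softmax(dim=1)
        loss = F.mse_loss(probs[0], probs[1])   # consistency across views
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```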
arXiv Detail & Related papers (2022-08-10T12:29:01Z) - Dense Out-of-Distribution Detection by Robust Learning on Synthetic Negative Data [1.7474352892977458]
We show how to detect out-of-distribution anomalies in road-driving scenes and remote sensing imagery.
We leverage a jointly trained normalizing flow, owing to its coverage-oriented learning objective and its capability to generate samples at different resolutions.
The resulting models set the new state of the art on benchmarks for out-of-distribution detection in road-driving scenes and remote sensing imagery.
arXiv Detail & Related papers (2021-12-23T20:35:10Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, in analogy to gradient descent in function space.
GGD learns a more robust base model in both settings: task-specific biased models with prior knowledge, and a self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data [10.659348599372944]
This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias.
It offers an alternative to principled training losses and complements test-time procedures based on selecting an operating point from summary curves.
It integrates seamlessly in the current paradigm of (deep) learning using backpropagation and naturally with Bayesian models.
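The core prior-shift correction behind prevalence-bias reasoning can be stated in a few lines: if a model was trained at prevalence pi_train but deployed under prior pi_test, Bayes' rule shifts each class score by the log prior ratio. The sketch illustrates this standard correction; the paper's full Bayesian framework goes considerably further.

```python
# Hedged sketch of the standard Bayes prior-shift correction for class scores.
import numpy as np

def correct_logits(logits: np.ndarray, pi_train: np.ndarray,
                   pi_test: np.ndarray) -> np.ndarray:
    """Re-weight scores from a prevalence-biased model to a target prior."""
    return logits + np.log(pi_test) - np.log(pi_train)

logits = np.array([[2.0, 0.0]])  # model trained at 90/10 class prevalence
corrected = correct_logits(logits, pi_train=np.array([0.9, 0.1]),
                           pi_test=np.array([0.5, 0.5]))
print(corrected)  # the rare class gains log(0.5/0.1) in score
```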
arXiv Detail & Related papers (2021-07-31T14:36:33Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)