Related papers: When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models

When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models

URL: http://arxiv.org/abs/2601.11444v2
Date: Wed, 21 Jan 2026 14:49:27 GMT
Title: When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models
Authors: Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Damien Garreau, Pierre-Alexandre Mattei,
Abstract summary: We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets.<n>We also provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques.
Score: 22.019987128734282
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether it provides tangible benefits for generative modelling. We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets. We confirm this observation across a breadth of aggregation rules using Deep Ensembles, Monte Carlo Dropout, on CIFAR-10 and FFHQ. We attempt to explain this discrepancy by investigating possible explanations, such as the link between score estimation and image quality. We also look into tabular data through random forests, and find that one aggregation strategy outperforms the others. Finally, we provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques (e.g. guidance).

Related papers

Score Augmentation for Diffusion Models [36.66701387426494]
We propose Score Augmentation (ScoreAug), a novel data augmentation framework specifically designed for diffusion models.<n>ScoreAug applies transformations to noisy data, aligning with the inherent denoising mechanism of diffusion.<n>In experiments, we extensively validate ScoreAug on multiple benchmarks including CIFAR-10, FFHQ, AFHQv2, and ImageNet.
arXiv Detail & Related papers (2025-08-11T12:39:46Z)
Zero-Shot Image Anomaly Detection Using Generative Foundation Models [2.241618130319058]
This research explores the use of score-based generative models as foundational tools for semantic anomaly detection.<n>By analyzing Stein score errors, we introduce a novel method for identifying anomalous samples without requiring re-training on each target dataset.<n>Our approach improves over state-of-the-art and relies on training a single model on one dataset -- CelebA -- which we find to be an effective base distribution.
arXiv Detail & Related papers (2025-07-30T13:56:36Z)
Multi-Level Collaboration in Model Merging [56.31088116526825]
This paper explores the intrinsic connections between model merging and model ensembling.<n>We find that even when previous restrictions are not met, there is still a way for model merging to attain a near-identical and superior performance similar to that of ensembling.
arXiv Detail & Related papers (2025-03-03T07:45:04Z)
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [53.99177152562075]
Scaling up autoregressive models in vision has not proven as beneficial as in large language models. We focus on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed order using BERT- or GPT-like transformer architectures. Our results show that while all models scale effectively in terms of validation loss, their evaluation performance -- measured by FID, GenEval score, and visual quality -- follows different trends.
arXiv Detail & Related papers (2024-10-17T17:59:59Z)
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces. We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z)
Score Mismatching for Generative Modeling [4.413162309652114]
We propose a new score-based model with one-step sampling. We train a standalone generator to compress all the time steps with the gradient backpropagated from the score network. In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution.
arXiv Detail & Related papers (2023-09-20T03:47:12Z)
ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model-class namely "Denoising Diffusion Probabilistic Models" or DDPMs for chirographic data. Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
Score-Based Generative Classifiers [9.063815952852783]
Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST. Previous results have suggested a trade-off between the likelihood of the data and classification accuracy. We show that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models.
arXiv Detail & Related papers (2021-10-01T15:05:33Z)
Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective. Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously. We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework. The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.