GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning
- URL: http://arxiv.org/abs/2601.22651v1
- Date: Fri, 30 Jan 2026 07:10:59 GMT
- Title: GUDA: Counterfactual Group-wise Training Data Attribution for Diffusion Models via Unlearning
- Authors: Naoki Murata, Yuhta Takida, Chieh-Hsin Lai, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, Yuki Mitsufuji
- Abstract summary: Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch.
- Score: 83.56510119503267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training-data attribution for vision generative models aims to identify which training data influenced a given output. While most methods score individual examples, practitioners often need group-level answers (e.g., artistic styles or object classes). Group-wise attribution is counterfactual: how would a model's behavior on a generated sample change if a group were absent from training? A natural realization of this counterfactual is Leave-One-Group-Out (LOGO) retraining, which retrains the model with each group removed; however, it becomes computationally prohibitive as the number of groups grows. We propose GUDA (Group Unlearning-based Data Attribution) for diffusion models, which approximates each counterfactual model by applying machine unlearning to a shared full-data model instead of training from scratch. GUDA quantifies group influence using differences in a likelihood-based scoring rule (ELBO) between the full model and each unlearned counterfactual. Experiments on CIFAR-10 and artistic style attribution with Stable Diffusion show that GUDA identifies primary contributing groups more reliably than semantic similarity, gradient-based attribution, and instance-level unlearning approaches, while achieving a 100x speedup over LOGO retraining on CIFAR-10.
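A minimal sketch of the scoring rule described above, assuming a diffusion model that exposes a per-sample ELBO estimate; `unlearn_group` and `elbo` are hypothetical helpers standing in for the paper's unlearning procedure and likelihood estimator:

```python
import copy

def guda_attribution_scores(full_model, groups, x_gen, unlearn_group, elbo):
    """Score each training-data group by how much unlearning it changes
    the model's likelihood (ELBO) on a generated sample.

    full_model    -- diffusion model trained on all data
    groups        -- iterable of group identifiers (classes, styles, ...)
    x_gen         -- the generated sample to attribute
    unlearn_group -- callable: (model, group) -> unlearned model
    elbo          -- callable: (model, x) -> scalar ELBO estimate
    """
    base = elbo(full_model, x_gen)
    scores = {}
    for g in groups:
        # Approximate the leave-one-group-out (LOGO) counterfactual by
        # unlearning the group from a copy of the shared full-data model
        # instead of retraining from scratch.
        cf_model = unlearn_group(copy.deepcopy(full_model), g)
        # Influence of group g = drop in likelihood of x_gen once the
        # group's contribution is removed.
        scores[g] = base - elbo(cf_model, x_gen)
    return scores
```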
Related papers
- Harnessing Diffusion-Generated Synthetic Images for Fair Image Classification [25.474389970409067]
Image classification systems often inherit biases from uneven group representation in training data. In this work, we explore multiple diffusion fine-tuning techniques, e.g., LoRA and DreamBooth, to generate images that more accurately represent each training group.
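A hedged sketch of the rebalancing idea in that summary: top up each under-represented group with synthetic images from a group-specific fine-tuned generator. `generate_for_group` is a hypothetical stand-in for a LoRA- or DreamBooth-fine-tuned diffusion model:

```python
from collections import Counter

def rebalance_with_synthetic(dataset, generate_for_group):
    """Augment under-represented groups with diffusion-generated images
    until every group matches the largest one.

    dataset            -- list of (image, group_label) pairs
    generate_for_group -- callable: (group, n) -> n synthetic images
    """
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    augmented = list(dataset)
    for group, n in counts.items():
        deficit = target - n
        if deficit > 0:
            # Synthetic images come from a generator fine-tuned on this
            # group (e.g. via LoRA or DreamBooth) so they reflect its
            # appearance.
            for img in generate_for_group(group, deficit):
                augmented.append((img, group))
    return augmented
```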
arXiv Detail & Related papers (2025-11-11T19:20:13Z) - Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
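A minimal sketch of patch-level similarity attribution as that summary describes it, with images as NumPy arrays; the non-overlapping patch extraction and best-match cosine scoring are illustrative choices, not the paper's exact recipe:

```python
import numpy as np

def extract_patches(img, size=8, stride=8):
    """Flatten non-overlapping patches of an (H, W, C) image."""
    h, w = img.shape[:2]
    return np.stack([
        img[i:i + size, j:j + size].ravel()
        for i in range(0, h - size + 1, stride)
        for j in range(0, w - size + 1, stride)
    ])

def patch_influence(generated, train_img, size=8):
    """Score a training image by how well its patches match the
    generated image's patches (mean best cosine similarity)."""
    gp = extract_patches(generated, size).astype(np.float64)
    tp = extract_patches(train_img, size).astype(np.float64)
    gp /= np.linalg.norm(gp, axis=1, keepdims=True) + 1e-8
    tp /= np.linalg.norm(tp, axis=1, keepdims=True) + 1e-8
    sims = gp @ tp.T                        # all patch-pair similarities
    return float(sims.max(axis=1).mean())   # best match per generated patch
```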
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Distributional Training Data Attribution: What do Influence Functions Sample? [25.257922996567178]
We introduce distributional training data attribution (d-TDA). The goal of d-TDA is to predict how the distribution of model outputs depends upon the dataset. We find that influence functions (IFs) are 'secretly distributional'.
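For reference, the classical pointwise influence-function estimate that d-TDA reinterprets distributionally; this is the standard textbook form, not notation taken from the paper:

```latex
% Influence of up-weighting training point z on the loss at test point z',
% where H_{\hat\theta} is the Hessian of the empirical risk at the optimum:
\mathcal{I}(z, z') \;=\; -\,\nabla_\theta \ell(z', \hat\theta)^{\top}
  \, H_{\hat\theta}^{-1} \, \nabla_\theta \ell(z, \hat\theta)
```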
arXiv Detail & Related papers (2025-06-15T21:02:36Z) - Invariance Pair-Guided Learning: Enhancing Robustness in Neural Networks [0.0]
We propose a technique to guide the neural network through the training phase. We form a corrective gradient complementing the traditional gradient descent approach. Experiments on the ColoredMNIST, Waterbird-100, and CelebA datasets demonstrate the effectiveness of our approach.
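A hedged PyTorch-style sketch of the corrective-gradient idea in that summary: blend the task gradient with a gradient from an invariance pair (same content, different spurious attribute). The MSE pair loss and the blending weight `alpha` are illustrative assumptions:

```python
import torch.nn.functional as F

def corrective_step(model, x, y, x_pair, optimizer, alpha=0.1):
    """One update whose gradient complements the usual task gradient
    with a corrective term from an invariance pair."""
    optimizer.zero_grad()
    logits = model(x)
    task_loss = F.cross_entropy(logits, y)        # traditional objective
    # Corrective term: predictions should not drift across the pair,
    # which differs from x only in the spurious attribute.
    pair_loss = F.mse_loss(logits, model(x_pair))
    (task_loss + alpha * pair_loss).backward()    # combined gradient
    optimizer.step()
```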
arXiv Detail & Related papers (2025-02-26T09:36:00Z) - Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection [80.85902083005237]
We introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups.
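A minimal sketch of the data-selection step that summary describes, assuming precomputed datamodel-style scores estimating each training example's effect on minority-group loss; the fixed-k drop rule is an illustrative assumption:

```python
import numpy as np

def d3m_select(train_indices, influence_on_minority, k):
    """Drop the k training examples whose estimated influence most hurts
    minority-group performance, keeping the rest.

    influence_on_minority[i] -- effect of example i on minority-group
    loss (higher = more harmful), e.g. estimated with datamodels.
    """
    scores = np.asarray(influence_on_minority)
    worst = np.argsort(scores)[-k:]              # k most harmful examples
    drop = {train_indices[i] for i in worst}
    return [idx for idx in train_indices if idx not in drop]
```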
arXiv Detail & Related papers (2024-06-24T17:51:01Z) - Ablation Based Counterfactuals [7.481286710933861]
Ablation Based Counterfactuals (ABC) is a method of performing counterfactual analysis that relies on model ablation rather than model retraining.
We demonstrate how we can construct a model like this using an ensemble of diffusion models.
We then use this model to study the limits of training data attribution by enumerating full counterfactual landscapes.
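A hedged sketch of the ablation idea behind ABC: with an ensemble whose members were trained on different data subsets, "removing" a subset means dropping the members that saw it, with no retraining. Averaging the surviving score networks is an illustrative choice:

```python
def ablated_ensemble_score(members, trained_on, ablate_subset, x, t):
    """Evaluate the ensemble diffusion score at (x, t) as if
    ablate_subset had been absent from training, by averaging only
    the members that never saw it.

    members       -- list of score networks: (x, t) -> score estimate
    trained_on    -- list of sets: the data subsets each member saw
    ablate_subset -- identifier of the subset to remove counterfactually
    """
    kept = [m for m, subsets in zip(members, trained_on)
            if ablate_subset not in subsets]
    if not kept:
        raise ValueError("every member saw the ablated subset")
    outs = [m(x, t) for m in kept]
    return sum(outs) / len(kept)   # ensemble average over survivors
```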
arXiv Detail & Related papers (2024-06-12T06:22:51Z) - Mutual Exclusive Modulator for Long-Tailed Recognition [12.706961256329572]
Long-tailed recognition is the task of learning high-performance classifiers given extremely imbalanced training samples between categories.
We introduce a mutual exclusive modulator which can estimate the probability of an image belonging to each group.
Our method achieves competitive performance compared to the state-of-the-art benchmarks.
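A hedged sketch of the modulator idea in that summary: estimate a probability of the image belonging to each frequency group and use it to gate group-specific logits. The softmax gate and per-group heads are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class MutualExclusiveModulator(nn.Module):
    """Gate group-specific classifier heads by an estimated probability
    of the image belonging to each frequency group (e.g. head/medium/tail)."""

    def __init__(self, feat_dim, num_classes, num_groups=3):
        super().__init__()
        self.group_gate = nn.Linear(feat_dim, num_groups)
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_groups))

    def forward(self, feats):
        # Mutually exclusive group probabilities via softmax.
        p_group = torch.softmax(self.group_gate(feats), dim=-1)
        logits = torch.stack([h(feats) for h in self.heads], dim=1)
        # Modulate: weight each group's logits by its probability.
        return (p_group.unsqueeze(-1) * logits).sum(dim=1)
```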
arXiv Detail & Related papers (2023-02-19T07:31:49Z) - Fair Group-Shared Representations with Normalizing Flows [68.29997072804537]
We develop a fair representation learning algorithm which is able to map individuals belonging to different groups in a single group.
We show experimentally that our methodology is competitive with other fair representation learning algorithms.
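A conceptual sketch of mapping different groups into a single shared group with per-group invertible maps: encode with the source group's flow, decode with a reference group's inverse. The affine "flows" here are toy stand-ins for real normalizing flows built from coupling layers:

```python
import torch
import torch.nn as nn

class ToyFlow(nn.Module):
    """Invertible per-group affine map; a real implementation would use
    a normalizing flow."""
    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x):               # group space -> shared latent
        return (x - self.shift) * torch.exp(-self.log_scale)

    def inverse(self, z):               # shared latent -> group space
        return z * torch.exp(self.log_scale) + self.shift

def to_reference_group(x, flows, src_group, ref_group):
    """Translate x from its own group into the reference group by
    passing through the shared latent space."""
    z = flows[src_group](x)
    return flows[ref_group].inverse(z)
```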
arXiv Detail & Related papers (2022-01-17T10:49:49Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
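For context, a minimal sketch of the group DRO objective that summary refers to: minimize the worst-group loss rather than the average. This is the plain max-over-groups form; the original algorithm uses an exponentiated-gradient weighting:

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, targets, group_ids, num_groups):
    """Worst-group cross-entropy, the quantity group DRO minimizes.
    It fails exactly when the annotated groups miss the spurious
    feature, as the summary above notes."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    group_losses = []
    for g in range(num_groups):
        mask = group_ids == g
        if mask.any():
            group_losses.append(per_sample[mask].mean())
    return torch.stack(group_losses).max()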
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
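A hedged sketch of the calibration step that summary describes: fit a Gaussian per class over features, sample virtual representations from the mixture, and retune only the classifier head. This is a simplified single-machine version with diagonal covariance; the federated aggregation of per-client statistics is omitted:

```python
import numpy as np

def sample_virtual_reps(feats, labels, n_per_class):
    """Fit a per-class Gaussian over features and sample virtual
    representations from it, CCVR-style."""
    virtual_x, virtual_y = [], []
    for c in np.unique(labels):
        fc = feats[labels == c]
        mu, var = fc.mean(axis=0), fc.var(axis=0) + 1e-6
        samples = np.random.randn(n_per_class, fc.shape[1]) * np.sqrt(var) + mu
        virtual_x.append(samples)
        virtual_y.append(np.full(n_per_class, c))
    # The sampled (representation, label) pairs are then used to
    # recalibrate the classifier head only, keeping the backbone fixed.
    return np.concatenate(virtual_x), np.concatenate(virtual_y)
```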
arXiv Detail & Related papers (2021-06-09T12:02:29Z)