Related papers: Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias

URL: http://arxiv.org/abs/2602.21773v1
Date: Wed, 25 Feb 2026 10:48:51 GMT
Title: Easy to Learn, Yet Hard to Forget: Towards Robust Unlearning Under Bias
Authors: JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, Yoonji Lee, Seunghoon Lee, YoungBin Kim
Abstract summary: CUPID is a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways.
Score: 18.802863823537542
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
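
Illustrative sketch (not from the paper): the first step in the abstract, splitting a forget set by per-sample sharpness, might look roughly like the code below. Everything here is an assumption made for illustration: the linear logistic model, the random-perturbation sharpness proxy, the `rho` and `n_dirs` parameters, the median threshold, and all function names. The abstract does not say how CUPID measures sharpness or which side of the split approximates the causal or bias subset, so the two groups are only labeled low and high.

```python
import numpy as np


def per_sample_loss(w, X, y):
    """Binary cross-entropy of a linear logistic model, one value per sample."""
    logits = X @ w
    # Stable form of log(1 + exp(-z)) for y = 1 and log(1 + exp(z)) for y = 0.
    return np.logaddexp(0.0, -(2.0 * y - 1.0) * logits)


def sharpness_proxy(w, X, y, rho=0.05, n_dirs=16, seed=0):
    """Hypothetical proxy: mean per-sample loss change under random
    weight perturbations of norm rho (an assumption, not CUPID's measure)."""
    rng = np.random.default_rng(seed)
    base = per_sample_loss(w, X, y)
    change = np.zeros_like(base)
    for _ in range(n_dirs):
        d = rng.normal(size=w.shape)
        d *= rho / np.linalg.norm(d)  # rescale the direction to norm rho
        change += per_sample_loss(w + d, X, y) - base
    return change / n_dirs


def partition_forget_set(sharpness):
    """Split sample indices at the median sharpness (threshold is an assumption)."""
    threshold = np.median(sharpness)
    low = np.flatnonzero(sharpness <= threshold)
    high = np.flatnonzero(sharpness > threshold)
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(8, 3))                      # toy forget set
    y = (rng.random(8) > 0.5).astype(float)          # toy binary labels
    w = rng.normal(size=3)                           # toy model weights
    low, high = partition_forget_set(sharpness_proxy(w, X, y))
    print("low-sharpness indices:", low)
    print("high-sharpness indices:", high)
```

The later steps in the abstract (disentangling parameters into pathways and routing gradients) are not sketched, since the abstract gives too little detail to illustrate them without guessing.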

Related papers

Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing [18.864168501187393]
We introduce Diffusing DeBias (DDB) as a plug-in for common methods of unsupervised model debiasing. Specifically, our approach adopts conditional diffusion models to generate synthetic bias-aligned images. By tackling the fundamental issue of bias-conflicting training samples in learning auxiliary models, our proposed method beats current state-of-the-art in multiple benchmark datasets.
arXiv Detail & Related papers (2025-02-13T18:17:03Z)
Model Debiasing by Learnable Data Augmentation [19.625915578646758]
This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training. Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods.
arXiv Detail & Related papers (2024-08-09T09:19:59Z)
Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model. Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios. Existing debiasing methods suffer from high costs in bias labeling or model re-training. We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
Data augmentation and explainability for bias discovery and mitigation in deep learning [0.0]
This dissertation explores the impact of bias in deep neural networks and presents methods for reducing its influence on model performance. The first part begins by categorizing and describing potential sources of bias and errors in data and models, with a particular focus on bias in machine learning pipelines. The next chapter outlines a taxonomy and methods of Explainable AI as a way to justify predictions and control and improve the model.
arXiv Detail & Related papers (2023-08-18T11:02:27Z)
Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data. We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations. Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels. We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels. Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features. This simplicity bias can explain their lack of robustness out of distribution (OOD). We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.