CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models
- URL: http://arxiv.org/abs/2509.05230v2
- Date: Wed, 10 Sep 2025 17:32:53 GMT
- Title: CURE: Controlled Unlearning for Robust Embeddings - Mitigating Conceptual Shortcuts in Pre-Trained Language Models
- Authors: Aysenur Kocak, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
- Abstract summary: We introduce CURE, a framework that systematically disentangles and suppresses conceptual shortcuts. CURE achieves an absolute improvement of +10 points in F1 score on IMDB and +2 points on Yelp.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models have achieved remarkable success across diverse applications but remain susceptible to spurious, concept-driven correlations that impair robustness and fairness. In this work, we introduce CURE, a novel and lightweight framework that systematically disentangles and suppresses conceptual shortcuts while preserving essential content information. Our method first extracts concept-irrelevant representations via a dedicated content extractor reinforced by a reversal network, ensuring minimal loss of task-relevant information. A subsequent controllable debiasing module employs contrastive learning to finely adjust the influence of residual conceptual cues, enabling the model to either diminish harmful biases or harness beneficial correlations as appropriate for the target task. Evaluated on the IMDB and Yelp datasets using three pre-trained architectures, CURE achieves an absolute improvement of +10 points in F1 score on IMDB and +2 points on Yelp, while introducing minimal computational overhead. Our approach establishes a flexible, unsupervised blueprint for combating conceptual biases, paving the way for more reliable and fair language understanding systems.
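The abstract describes a two-stage pipeline: a content extractor whose concept-irrelevance is enforced adversarially by a reversal network, followed by a contrastive debiasing module. The adversarial part can be illustrated with a gradient reversal layer, as in domain-adversarial training. The sketch below is a minimal, hypothetical illustration in plain numpy; all class names, shapes, and wiring are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of CURE's first stage: a content extractor trained
# adversarially against a concept probe via gradient reversal.
# Names and dimensions are illustrative, not taken from the paper.
import numpy as np

class GradientReversal:
    """Identity on the forward pass; flips the gradient sign on backward,
    so the encoder is pushed to *remove* concept information."""
    def __init__(self, scale=1.0):
        self.scale = scale

    def forward(self, x):
        return x

    def backward(self, grad):
        return -self.scale * grad

class LinearProbe:
    """Toy linear layer standing in for the concept (reversal) network."""
    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))

    def forward(self, h):
        self.h = h
        return h @ self.W

    def backward(self, grad):
        # Gradient w.r.t. the probe's input h.
        return grad @ self.W.T

# Wiring: encoder embedding -> gradient reversal -> concept probe.
grl = GradientReversal(scale=1.0)
probe = LinearProbe(d_in=4, d_out=2)

h = np.ones((1, 4))                     # stand-in for a sentence embedding
logits = probe.forward(grl.forward(h))  # concept predictions
g_probe = probe.backward(np.ones_like(logits))
g_encoder = grl.backward(g_probe)       # sign-flipped gradient to encoder
assert np.allclose(g_encoder, -g_probe)
```

The sign flip is the key mechanism: the probe learns to predict the concept, while the encoder receives the negated gradient and so learns representations from which the concept cannot be predicted.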
Related papers
- MeGU: Machine-Guided Unlearning with Target Feature Disentanglement [73.49657372882082]
We propose a novel framework that guides unlearning through concept-aware re-alignment. MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
arXiv Detail & Related papers (2026-02-19T05:20:31Z) - Concepts' Information Bottleneck Models [9.435622803973898]
Concept Bottleneck Models (CBMs) aim to deliver interpretable predictions by routing decisions through a human-understandable concept layer. We introduce an explicit Information Bottleneck regularizer on the concept layer that penalizes $I(X;C)$ while preserving task-relevant information in $I(C;Y)$, encouraging minimal-sufficient concept representations.
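The regularizer described above follows the standard Information Bottleneck form; one plausible instantiation of the training objective (with hypothetical trade-off weights $\beta$ and $\lambda$, not stated in the abstract) is:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{task}}
\;+\; \beta \, I(X;C)
\;-\; \lambda \, I(C;Y)
```

Here minimizing $I(X;C)$ compresses the concept layer, while the $-\lambda\,I(C;Y)$ term keeps the concepts predictive of the label, yielding minimal-sufficient concepts.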
arXiv Detail & Related papers (2026-02-16T10:33:20Z) - Forget Less by Learning Together through Concept Consolidation [6.121904567143191]
Custom Diffusion Models (CDMs) have gained significant attention due to their remarkable ability to personalize generative processes. Existing CDMs suffer from catastrophic forgetting when continuously learning new concepts. We propose Forget Less by Learning Together (FL2T), which enables concurrent and order-agnostic concept learning.
arXiv Detail & Related papers (2026-01-05T10:14:16Z) - AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models [63.05306474002547]
Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning. We introduce AUVIC, a novel visual concept unlearning framework for MLLMs. We show that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.
arXiv Detail & Related papers (2025-11-14T13:35:32Z) - R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning [7.08366053718851]
Redundancy-Removal Mixture of Experts (R2MoE) is a parameter-efficient framework for lifelong visual concept learning. Our method generates images with superior conceptual fidelity compared to the state-of-the-art (SOTA) method.
arXiv Detail & Related papers (2025-07-17T13:22:40Z) - Interpretable Reward Modeling with Active Concept Bottlenecks [54.00085739303773]
We introduce Concept Bottleneck Reward Models (CB-RM), a reward modeling framework that enables interpretable preference learning. Unlike standard RLHF methods that rely on opaque reward functions, CB-RM decomposes reward prediction into human-interpretable concepts. We formalize an active learning strategy that dynamically acquires the most informative concept labels.
arXiv Detail & Related papers (2025-07-07T06:26:04Z) - Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z) - CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models [6.738409533239947]
CURE is a training-free concept unlearning framework that operates directly in the weight space of pre-trained diffusion models. The Spectral Eraser identifies and isolates features unique to the undesired concept while preserving safe attributes. CURE achieves a more efficient and thorough removal for targeted artistic styles, objects, identities, or explicit content.
arXiv Detail & Related papers (2025-05-19T03:53:06Z) - Language Guided Concept Bottleneck Models for Interpretable Continual Learning [62.09201360376577]
Continual learning aims to enable learning systems to acquire new knowledge constantly without forgetting previously learned information. Most existing CL methods focus primarily on preserving learned knowledge to improve model performance. We introduce a novel framework that integrates language-guided Concept Bottleneck Models to address both challenges.
arXiv Detail & Related papers (2025-03-30T02:41:55Z) - Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z) - UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning [57.081646768835704]
User specifications or legal frameworks often require information to be removed from pretrained models, including large language models (LLMs). This requires deleting or "forgetting" a set of data points from an already-trained model, which typically degrades its performance on other data points. We propose UPCORE, a method-agnostic data selection framework for mitigating collateral damage during unlearning.
arXiv Detail & Related papers (2025-02-20T22:51:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.