Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
- URL: http://arxiv.org/abs/2601.09624v1
- Date: Wed, 14 Jan 2026 16:55:58 GMT
- Title: Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric
- Authors: Jiali Cheng, Ziheng Chen, Chirag Agarwal, Hadi Amiri
- Abstract summary: Circuit-guided Unlearning Difficulty (CUD) is a metric that assigns each sample a continuous difficulty score using circuit-level signals. We identify key circuit-level patterns that reveal a mechanistic signature of difficulty.
- Score: 36.2724900971511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning is becoming essential for building trustworthy and compliant language models. Yet unlearning success varies considerably across individual samples: some are reliably erased, while others persist despite the same procedure. We argue that this disparity is not only a data-side phenomenon, but also reflects model-internal mechanisms that encode and protect memorized information. We study this problem from a mechanistic perspective based on model circuits (structured interaction pathways that govern how predictions are formed). We propose Circuit-guided Unlearning Difficulty (CUD), a pre-unlearning metric that assigns each sample a continuous difficulty score using circuit-level signals. Extensive experiments demonstrate that CUD reliably separates intrinsically easy and hard samples, and remains stable across unlearning methods. We identify key circuit-level patterns that reveal a mechanistic signature of difficulty: easy-to-unlearn samples are associated with shorter, shallower interactions concentrated in earlier-to-intermediate parts of the original model, whereas hard samples rely on longer and deeper pathways closer to late-stage computation. Compared to existing qualitative studies, CUD takes a first step toward a principled, fine-grained, and interpretable analysis of unlearning difficulty; and motivates the development of unlearning methods grounded in model mechanisms.
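The abstract does not spell out how CUD combines circuit-level signals into a single score. As a minimal illustrative sketch, assuming difficulty is driven by the depth and length of a sample's attributed circuit (the qualitative pattern the abstract reports), one could score samples as shown below; all names, weights, and normalizers here are hypothetical, not the paper's actual formulation.

```python
# Illustrative sketch only: the abstract does not specify CUD's formula, so the
# signals (circuit path depth and length) and weights below are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class CircuitEdge:
    src_layer: int   # layer index of the upstream component
    dst_layer: int   # layer index of the downstream component

def circuit_difficulty(edges: List[CircuitEdge], num_layers: int,
                       depth_weight: float = 0.5) -> float:
    """Map a sample's attributed circuit to a continuous difficulty score in [0, 1].

    Hypothetical scoring consistent with the abstract's qualitative finding:
    circuits that are longer and concentrated in later layers score as harder.
    """
    if not edges:
        return 0.0
    # Depth signal: how late in the network the circuit's components sit.
    mean_depth = sum(e.dst_layer for e in edges) / len(edges) / max(num_layers - 1, 1)
    # Length signal: how many interaction edges the circuit uses (saturating).
    length = min(len(edges) / 50.0, 1.0)  # 50 is an arbitrary normalizer
    return depth_weight * mean_depth + (1.0 - depth_weight) * length

# Example: a shallow, short circuit (easier) vs. a deep, long one (harder).
easy = [CircuitEdge(1, 3), CircuitEdge(2, 4)]
hard = [CircuitEdge(20, 28), CircuitEdge(25, 30), CircuitEdge(27, 31), CircuitEdge(28, 31)]
print(circuit_difficulty(easy, num_layers=32))   # lower score
print(circuit_difficulty(hard, num_layers=32))   # higher score
```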
Related papers
- Less is More Tokens: Efficient Math Reasoning via Difficulty-Aware Chain-of-Thought Distillation [82.2288581878096]
We present a framework for difficulty-aware reasoning that teaches models to dynamically adjust reasoning depth based on problem complexity. We show that models can be endowed with such dynamic inference pathways without any architectural modifications.
arXiv Detail & Related papers (2025-09-05T16:40:13Z) - Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities [15.783261732000883]
We propose a novel Hardness-Aware Dynamic Curriculum Learning framework, termed HARDY-MER. Our framework operates in two key stages: first, it estimates the hardness level of each sample, and second, it strategically emphasizes hard samples during training. Experiments on benchmark datasets demonstrate that HARDY-MER consistently outperforms existing methods in missing-modality scenarios.
arXiv Detail & Related papers (2025-08-09T03:10:56Z) - Your Pretrained Model Tells the Difficulty Itself: A Self-Adaptive Curriculum Learning Paradigm for Natural Language Understanding [53.63482987410292]
We present a self-adaptive curriculum learning paradigm that prioritizes fine-tuning examples based on difficulty scores predicted by pre-trained language models. We evaluate our method on four natural language understanding (NLU) datasets covering both binary and multi-class classification tasks.
arXiv Detail & Related papers (2025-07-13T19:36:17Z) - Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling a problem's difficulty prior shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our approach demonstrates significant performance gains across various multimodal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z) - Fairness and Robustness in Machine Unlearning [20.758637391023345]
We focus on fairness and robustness in machine unlearning algorithms. Experiments demonstrate the vulnerability of current state-of-the-art approximate unlearning algorithms to adversarial attacks. We demonstrate that unlearning in the intermediate and last layers is sufficient and cost-effective in terms of time and memory.
arXiv Detail & Related papers (2025-04-18T10:31:44Z) - A Neuro-inspired Interpretation of Unlearning in Large Language Models through Sample-level Unlearning Difficulty [12.382999548648726]
Existing studies assume a uniform unlearning difficulty across samples. We propose a Memory Removal Difficulty ($\mathrm{MRD}$) metric to quantify sample-level unlearning difficulty. We also propose an $\mathrm{MRD}$-based weighted sampling method to optimize existing unlearning algorithms.
arXiv Detail & Related papers (2025-04-09T07:48:10Z) - Understanding Machine Unlearning Through the Lens of Mode Connectivity [14.755831733659699]
We study mode connectivity in unlearning across a range of overlooked conditions. Our findings show distinct fluctuation patterns across different evaluation metrics along the curve. This is the first study on mode connectivity in the context of machine unlearning.
arXiv Detail & Related papers (2025-04-08T20:02:10Z) - Instance-Level Difficulty: A Missing Perspective in Machine Unlearning [13.052520843129363]
We study the cruxes that make machine unlearning difficult through a thorough instance-level unlearning performance analysis. In particular, we summarize four factors that make unlearning a data point difficult. We argue that machine unlearning research should pay attention to the instance-level difficulty of unlearning.
arXiv Detail & Related papers (2024-10-03T23:41:42Z) - Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips the UAD with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z) - Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty.
Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis.
arXiv Detail & Related papers (2022-05-26T13:21:01Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information above and is not responsible for any consequences arising from its use.