Related papers: Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning

URL: http://arxiv.org/abs/2512.07374v1
Date: Mon, 08 Dec 2025 10:10:12 GMT
Title: Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning
Authors: Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani
Abstract summary: We introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in large foundation models. R2F reconstructs full-model gradient directions from low-rank LoRA adapter updates. We show that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
Score: 17.898277374771254
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlearning methods often require full-model fine-tuning or access to the original training data, which limits their scalability and practicality. In this work, we introduce Recover-to-Forget (R2F), a novel framework for efficient unlearning in LLMs based on reconstructing full-model gradient directions from low-rank LoRA adapter updates. Rather than performing backpropagation through the full model, we compute gradients with respect to LoRA parameters using multiple paraphrased prompts and train a gradient decoder to approximate the corresponding full-model gradients. To ensure applicability to larger or black-box models, the decoder is trained on a proxy model and transferred to target models. We provide a theoretical analysis of cross-model generalization and demonstrate that our method achieves effective unlearning while preserving general model performance. Experimental results demonstrate that R2F offers a scalable and lightweight alternative for unlearning in pretrained LLMs without requiring full retraining or access to internal parameters.
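The core idea in the abstract (compute gradients only with respect to LoRA parameters, then train a decoder that maps them back to full-model gradient directions) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the R2F implementation: a single 2x3 weight matrix, a rank-1 adapter with fixed factors `A` and `B`, synthetic Gaussian "full gradients" standing in for a proxy model, and a plain linear decoder fitted by gradient descent. Paraphrased prompts and proxy-to-target transfer are not modeled.

```python
import math
import random

random.seed(0)

D_OUT, D_IN = 2, 3            # toy weight shape (hypothetical)
A = [1.0, -0.5, 0.5]          # rank-1 LoRA factor A, shape (1, D_IN), assumed fixed
B = [0.8, -0.6]               # rank-1 LoRA factor B, shape (D_OUT, 1), assumed fixed

def random_full_grad():
    """Synthetic stand-in for a full-model gradient dL/dW on the proxy."""
    return [[random.gauss(0.0, 1.0) for _ in range(D_IN)] for _ in range(D_OUT)]

def lora_grads(G):
    """Chain rule for W = W0 + B A: dL/dB = G A^T and dL/dA = B^T G."""
    g_b = [sum(G[i][j] * A[j] for j in range(D_IN)) for i in range(D_OUT)]
    g_a = [sum(G[i][j] * B[i] for i in range(D_OUT)) for j in range(D_IN)]
    return g_b + g_a          # concatenated LoRA gradient features

def flatten(G):
    return [x for row in G for x in row]

def decode(Wd, x):
    """Linear decoder: LoRA gradient features -> flattened full gradient."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in Wd]

def cosine(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / (norm + 1e-12)

def train_decoder(pairs, steps=200, lr=0.1):
    """Fit the decoder by full-batch gradient descent on mean squared error."""
    n_out, n_in = D_OUT * D_IN, D_OUT + D_IN
    Wd = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(steps):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for x, y in pairs:
            pred = decode(Wd, x)
            for k in range(n_out):
                err = pred[k] - y[k]
                for m in range(n_in):
                    grad[k][m] += err * x[m]
        for k in range(n_out):
            for m in range(n_in):
                Wd[k][m] -= lr * grad[k][m] / len(pairs)
    return Wd

def make_pairs(n):
    """Build (LoRA gradient, full gradient) training pairs from the proxy."""
    pairs = []
    for _ in range(n):
        G = random_full_grad()
        pairs.append((lora_grads(G), flatten(G)))
    return pairs

train_pairs = make_pairs(100)
test_pairs = make_pairs(50)
decoder = train_decoder(train_pairs)
mean_cos = sum(cosine(decode(decoder, x), y) for x, y in test_pairs) / len(test_pairs)
print(f"mean cosine similarity on held-out gradients: {mean_cos:.3f}")
```

In this toy setting the reconstruction cannot reach a cosine similarity of 1, because the LoRA gradients are a low-dimensional linear projection of the full gradient (here 5 features, one of them redundant, against 6 unknowns). The decoder can only recover the direction within that subspace, which is consistent with the abstract speaking of gradient directions that the decoder approximates.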

Related papers

Sparsity-Aware Unlearning for Large Language Models [20.699929903336113]
Large Language Models (LLMs) inevitably memorize sensitive information during training, posing significant privacy risks. Machine unlearning has emerged as a promising solution to selectively remove such information without full retraining. We find that unlearning effectiveness degrades substantially on sparse models. We propose Sparsity-Aware Unlearning (SAU), which decouples unlearning from sparsification objectives through gradient masking.
arXiv Detail & Related papers (2026-01-31T07:45:30Z)
Reversing Large Language Models for Efficient Training and Fine-Tuning [24.232966507637673]
Large Language Models (LLMs) are known for their expensive and time-consuming training. We introduce memory-efficient, reversible architectures for LLMs inspired by symmetric and symplectic differential equations. Our results show comparable or improved performance on several datasets and benchmarks.
arXiv Detail & Related papers (2025-11-27T19:32:15Z)
Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch [63.40752011615843]
Training tool-augmented language models has emerged as a promising approach to enhancing their capabilities for complex tasks. We propose a dynamic generalization-guided reward design for rule-based reinforcement learning. We show that our models achieve over 7% performance improvement compared to both SFT and RL-with-SFT models.
arXiv Detail & Related papers (2025-11-02T16:33:45Z)
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z)
Train Once, Forget Precisely: Anchored Optimization for Efficient Post-Hoc Unlearning [0.0]
We introduce Forget-Aligned Model Reconstruction (FAMR), a theoretically grounded and computationally efficient framework for post-hoc unlearning in deep image classifiers. FAMR frames forgetting as a constrained optimization problem that minimizes a uniform-prediction loss on the forget set while anchoring model parameters to their original values. Empirical results on class forgetting tasks using CIFAR-10 and ImageNet-100 demonstrate FAMR's effectiveness, with strong performance retention and minimal computational overhead.
arXiv Detail & Related papers (2025-06-17T13:40:48Z)
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? [55.33467849079774]
Low-rank adaptation (LoRA) is a popular and efficient training technique for updating Large Language Models or adapting them to specific domains. We investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge.
arXiv Detail & Related papers (2025-02-20T12:31:03Z)
Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs [76.40876036912537]
Large Language Models (LLMs) demonstrate strong few-shot adaptability without requiring fine-tuning. Current Visual Foundation Models (VFMs) require explicit fine-tuning with sufficient tuning data. We propose a framework, LoRA Recycle, that distills a meta-LoRA from diverse pre-tuned LoRAs with a meta-learning objective.
arXiv Detail & Related papers (2024-12-03T07:25:30Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for adapting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress. LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset. Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent [15.463595798992621]
Large language models (LLMs) have revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. Existing solutions make the unrealistic assumption that the entire model is exchanged for training. We introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption.
arXiv Detail & Related papers (2024-06-17T03:49:44Z)
GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs [11.572835837392867]
This study introduces GPTA, a Large Language Model assistance training framework that enhances the training of downstream task models via prefix prompts. By minimizing data exposure to the LLM, the framework addresses the security and legal challenges of applying LLMs in downstream task model training.
arXiv Detail & Related papers (2024-03-29T23:04:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.