BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning
- URL: http://arxiv.org/abs/2601.09172v1
- Date: Wed, 14 Jan 2026 05:15:10 GMT
- Title: BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning
- Authors: Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, Meng Wang
- Abstract summary: BalDRO is a novel and efficient framework for balanced LLM unlearning. We instantiate BalDRO via two efficient variants: BalDRO-G and BalDRO-DV. Experiments on TOFU and MUSE show that BalDRO significantly improves both forgetting quality and model utility.
- Score: 24.085628334112652
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Language Models (LLMs) increasingly shape online content, removing targeted information from well-trained LLMs (also known as LLM unlearning) has become critical for web governance. A key challenge lies in sample-wise imbalance within the forget set: different samples exhibit widely varying unlearning difficulty, leading to asynchronous forgetting where some knowledge remains insufficiently erased while others become over-forgotten. To address this, we propose BalDRO, a novel and efficient framework for balanced LLM unlearning. BalDRO formulates unlearning as a min-sup process: an inner step identifies a worst-case data distribution that emphasizes hard-to-unlearn samples, while an outer step updates model parameters under this distribution. We instantiate BalDRO via two efficient variants: BalDRO-G, a discrete GroupDRO-based approximation focusing on high-loss subsets, and BalDRO-DV, a continuous Donsker-Varadhan dual method enabling smooth adaptive weighting within standard training pipelines. Experiments on TOFU and MUSE show that BalDRO significantly improves both forgetting quality and model utility over existing methods, and we release code for reproducibility.
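The abstract describes the inner "worst-case distribution" step only at a high level. The sketch below is not the authors' code. It shows two generic ways such a distribution over a batch of forget-set samples can be turned into per-sample weights, under these assumptions: `losses` holds one per-sample unlearning loss where a larger value means the sample is harder to unlearn (still insufficiently forgotten), the reference distribution P is uniform over the batch, and the names `topk_weights`, `dv_weights`, `dv_objective`, `weighted_loss`, `frac`, and `tau` are hypothetical. `topk_weights` is the CVaR-style "top fraction" rule, used only as a stand-in for a discrete high-loss-subset scheme in the spirit of BalDRO-G. `dv_weights` relies on the standard Donsker-Varadhan identity, under which the maximiser of E_Q[loss] - tau * KL(Q || P) has dQ/dP proportional to exp(loss / tau), i.e. a softmax of the losses.

```python
import math
from typing import List


def topk_weights(losses: List[float], frac: float = 0.5) -> List[float]:
    """Discrete worst-case weights: uniform mass on the hardest ceil(frac * n) samples.

    Hypothetical stand-in for a high-loss-subset scheme (CVaR-style top fraction);
    it is not claimed to be BalDRO-G itself.
    """
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    n = len(losses)
    k = max(1, math.ceil(frac * n))
    # Indices sorted from highest to lowest loss (hardest to unlearn first).
    hardest = sorted(range(n), key=lambda i: losses[i], reverse=True)[:k]
    weights = [0.0] * n
    for i in hardest:
        weights[i] = 1.0 / k
    return weights


def dv_weights(losses: List[float], tau: float = 1.0) -> List[float]:
    """Continuous worst-case weights from the Donsker-Varadhan (KL) dual.

    With a uniform reference P over the batch, the Q maximising
    E_Q[loss] - tau * KL(Q || P) has Q_i proportional to exp(loss_i / tau),
    i.e. a softmax of the losses at temperature tau.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(losses)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((loss - m) / tau) for loss in losses]
    z = sum(exps)
    return [e / z for e in exps]


def dv_objective(losses: List[float], tau: float = 1.0) -> float:
    """Smoothed worst-case loss tau * log(mean(exp(loss / tau))).

    Its derivative with respect to loss_i equals dv_weights(losses, tau)[i].
    It lies between the mean loss (large tau) and the max loss (small tau).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(losses)
    mean_exp = sum(math.exp((loss - m) / tau) for loss in losses) / len(losses)
    return m + tau * math.log(mean_exp)


def weighted_loss(losses: List[float], weights: List[float]) -> float:
    """Outer-step objective: sum_i w_i * loss_i with the weights held fixed."""
    return sum(w * loss for w, loss in zip(weights, losses))
```

For example, with `losses = [0.2, 1.5, 0.9, 3.0]` and `frac=0.5`, `topk_weights` puts weight 0.5 on the two hardest samples (indices 3 and 1) and zero elsewhere, whereas `dv_weights` gives every sample some weight that increases with its loss. The continuous variant is convenient in standard training pipelines because minimising `dv_objective` directly yields the softmax-weighted gradient; in a real loop the losses would be tensors and the weights would typically be computed without gradient tracking.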
Related papers
- Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning [45.86058898829962]
Multi-Ad Distributionally Robust Optimization (GDRO) is an optimization-first framework that moves beyond uniform reasoning. We propose two independent GDRO games for post-training: Prompt-GDRO, which employs an EMA-debiased multiplicative-weight bandit sampler to target the intensive difficulty margin and upweight persistently hard groups without frequency bias; and Rollout-GDRO, which uses a shadow-price controller to reallocate rollouts across groups, maximizing gradient variance reduction on hard tasks under a fixed mean budget (compute-neutral). We validate our framework on the DAPO 14.1k dataset using Q
arXiv Detail & Related papers (2026-01-27T07:10:41Z)
- GDRO: Group-level Reward Post-training Suitable for Diffusion Models [55.948229011478304]
Group-level rewards successfully align the model with the targeted reward. Group-level Direct Reward Optimization (GDRO) is a new post-training paradigm for group-level reward alignment. GDRO supports fully offline training, which avoids the large time cost of image rollout sampling. It is diffusion-sampler-independent, which eliminates the need for the ODE-to-SDE approximation to obtain stochasticity.
arXiv Detail & Related papers (2026-01-05T11:47:18Z)
- Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback [28.40216934244641]
Diffusion Denoising Ranking Optimization (Diffusion-DRO) is a new preference learning framework grounded in inverse reinforcement learning. Diffusion-DRO removes the dependency on a reward model by casting preference learning as a ranking problem. It integrates offline expert demonstrations with online policy-generated negative samples, enabling it to effectively capture human preferences.
arXiv Detail & Related papers (2025-10-21T07:22:34Z)
- Adversarial Diffusion for Robust Reinforcement Learning [46.44328012099217]
We introduce Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL). AD-RRL guides the diffusion process to generate worst-case trajectories during training, effectively optimizing the Conditional Value at Risk (CVaR) of the cumulative return. Empirical results across standard benchmarks show that AD-RRL achieves superior robustness and performance compared to existing robust RL methods.
arXiv Detail & Related papers (2025-09-28T12:34:35Z)
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [65.04475956174959]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML). A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming. This paper develops a physical layer framework for resilient SFL with large language models (LLMs) and vision language models (VLMs) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
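BalDRO-G is described as GroupDRO-based, and several entries above reweight groups online (for instance the multiplicative-weight sampler in the Prompt-GDRO summary). For orientation, the widely used online GroupDRO update, exponentiated-gradient ascent on a vector of group weights, can be sketched as follows. This is a generic illustration, not the algorithm of BalDRO or of any listed paper; `groupdro_step`, `robust_loss`, and the step size `eta` are hypothetical names, and per-group losses are assumed to be precomputed averages.

```python
import math
from typing import List


def groupdro_step(q: List[float], group_losses: List[float], eta: float = 0.1) -> List[float]:
    """One multiplicative-weights (exponentiated-gradient) step on group weights q.

    Groups with higher loss are multiplied by a larger factor, so weight drifts
    toward groups that stay hard; renormalising keeps q a probability vector.
    """
    if len(q) != len(group_losses):
        raise ValueError("q and group_losses must have the same length")
    if eta <= 0:
        raise ValueError("eta must be positive")
    # Shift by the max loss before exponentiating; the shift cancels after normalising.
    m = max(group_losses)
    raw = [qi * math.exp(eta * (li - m)) for qi, li in zip(q, group_losses)]
    z = sum(raw)
    return [r / z for r in raw]


def robust_loss(q: List[float], group_losses: List[float]) -> float:
    """Objective the model parameters would minimise under the current group weights."""
    return sum(qi * li for qi, li in zip(q, group_losses))


if __name__ == "__main__":
    q = [1 / 3] * 3
    losses = [0.5, 2.0, 1.0]  # hypothetical fixed per-group losses
    for _ in range(100):
        q = groupdro_step(q, losses, eta=0.1)
    print([round(x, 4) for x in q], round(robust_loss(q, losses), 4))
```

Because the update is multiplicative, a group whose loss stays high keeps gaining weight across steps, and `eta` controls how quickly the weight concentrates; equal losses leave the weights unchanged.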