LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
- URL: http://arxiv.org/abs/2503.18314v2
- Date: Tue, 25 Mar 2025 06:23:57 GMT
- Title: LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
- Authors: Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves, Petros Daras
- Abstract summary: We present LoTUS, a novel Machine Unlearning (MU) method that eliminates the influence of training samples from pre-trained models. LoTUS smooths the prediction probabilities of the model up to an information-theoretic bound, mitigating its over-confidence stemming from data memorization. We evaluate LoTUS on Transformer and ResNet18 models against eight baselines across five public datasets.
- Score: 31.008361777309638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present LoTUS, a novel Machine Unlearning (MU) method that eliminates the influence of training samples from pre-trained models, avoiding retraining from scratch. LoTUS smooths the prediction probabilities of the model up to an information-theoretic bound, mitigating its over-confidence stemming from data memorization. We evaluate LoTUS on Transformer and ResNet18 models against eight baselines across five public datasets. Beyond established MU benchmarks, we evaluate unlearning on ImageNet1k, a large-scale dataset, where retraining is impractical, simulating real-world conditions. Moreover, we introduce the novel Retrain-Free Jensen-Shannon Divergence (RF-JSD) metric to enable evaluation under real-world conditions. The experimental results show that LoTUS outperforms state-of-the-art methods in terms of both efficiency and effectiveness. Code: https://github.com/cspartalis/LoTUS.
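The abstract names a Retrain-Free Jensen-Shannon Divergence (RF-JSD) metric but this page does not define it. As background only, the sketch below computes the standard Jensen-Shannon divergence between two discrete probability distributions, for example two softmax outputs over the same classes. It is not the paper's RF-JSD, and the function names `kl_divergence` and `js_divergence` are hypothetical, chosen for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms where p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Standard Jensen-Shannon divergence between two discrete distributions.

    Assumes p and q are sequences of non-negative floats that each sum to 1
    and are defined over the same classes.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Illustrative values only: an over-confident prediction vs. a smoother one.
confident = [0.98, 0.01, 0.01]
smoothed = [0.60, 0.20, 0.20]
print(js_divergence(confident, smoothed))
```

The divergence is symmetric, equals 0 for identical distributions, and is bounded above by ln 2 when measured in nats, which makes it convenient for comparing model outputs on a fixed scale.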
Related papers
- Can Test-Time Scaling Improve World Foundation Model? [67.82670175383761]
We introduce SWIFT, a test-time scaling framework tailored for world foundation models (WFMs).
Empirical results on the COSMOS model demonstrate that test-time scaling exists even in a compute-optimal way.
Our findings reveal that test-time scaling laws hold for WFMs and that SWIFT provides a scalable and effective pathway for improving WFM inference without retraining or increasing model size.
arXiv Detail & Related papers (2025-03-31T17:07:37Z)
- Transfer learning in Scalable Graph Neural Network for Improved Physical Simulation [37.1565271299621]
We introduce a pre-training and transfer learning paradigm for graph network simulators.
We show that our proposed transfer learning methods allow the model to perform even better when fine-tuned with small amounts of training data.
arXiv Detail & Related papers (2025-02-07T08:18:23Z)
- MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL [20.22674077197914]
Recent work has explored updating neural networks with large numbers of gradient steps for every new sample.
High update-to-data ratios introduce instability to the training process.
Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training.
arXiv Detail & Related papers (2024-10-11T15:13:17Z)
- Understanding Fine-tuning in Approximate Unlearning: A Theoretical Perspective [39.958103832214135]
Fine-tuning (FT) methods have become one of the fundamental approaches for approximating unlearning. We present the first theoretical analysis of FT methods for machine unlearning within a linear regression framework. We propose a novel Retention-Based Masking (RBM) strategy that constructs a weight saliency map based on the remaining dataset.
arXiv Detail & Related papers (2024-10-04T18:01:52Z)
- Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training [27.1846697092374]
Pre-training exploits public datasets to pre-train an advanced machine learning model.
We are the first to explore how model pre-training can mitigate noise detriment in differentially private federated learning.
arXiv Detail & Related papers (2024-08-18T13:48:10Z)
- Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems [17.10762463903638]
We train evaluation models to approximate human evaluation, achieving high agreement.
We propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model.
arXiv Detail & Related papers (2024-06-26T10:48:14Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating in the large sample space encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
- Model Embedding Model-Based Reinforcement Learning [4.566180616886624]
Model-based reinforcement learning (MBRL) has shown its advantages in sample-efficiency over model-free reinforcement learning (MFRL).
Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias.
We propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm in the framework of the probabilistic reinforcement learning.
arXiv Detail & Related papers (2020-06-16T15:10:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.