Inference-time Unlearning Using Conformal Prediction
- URL: http://arxiv.org/abs/2602.03787v1
- Date: Tue, 03 Feb 2026 17:46:50 GMT
- Title: Inference-time Unlearning Using Conformal Prediction
- Authors: Somnath Basu Roy Chowdhury, Rahul Kidambi, Avinava Dubey, David Wang, Gokhan Mergen, Amr Ahmed, Aranyak Mehta
- Abstract summary: Unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. This paper introduces a framework that iteratively refines the quality of the generated responses using feedback from the verifier without updating the model parameters. This paper's approach significantly outperforms existing state-of-the-art methods, reducing unlearning error by up to 93% across challenging unlearning benchmarks.
- Score: 13.479885316485209
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. Existing unlearning methods, which often provide provable guarantees, typically involve retraining a subset of model parameters based on a forget set. While these approaches show promise in certain scenarios, their underlying assumptions are often challenged in real-world applications -- particularly when applied to generative models. Furthermore, updating parameters using these unlearning procedures often degrades the general-purpose capabilities the model acquired during pre-training. Motivated by these shortcomings, this paper considers the paradigm of inference time unlearning -- wherein, the generative model is equipped with an (approximately correct) verifier that judges whether the model's response satisfies appropriate unlearning guarantees. This paper introduces a framework that iteratively refines the quality of the generated responses using feedback from the verifier without updating the model parameters. The proposed framework leverages conformal prediction to reduce computational overhead and provide distribution-free unlearning guarantees. This paper's approach significantly outperforms existing state-of-the-art methods, reducing unlearning error by up to 93% across challenging unlearning benchmarks.
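The abstract describes two ingredients: a verifier-guided refinement loop that leaves the model parameters untouched, and conformal prediction for distribution-free guarantees. The sketch below is a generic illustration of how those two ideas can fit together, not the paper's algorithm. All names (`conformal_threshold`, `refine_until_accepted`, `generate`, `verifier_score`) are hypothetical, and it assumes the verifier returns a scalar "leakage" score where lower is better, with calibration scores collected on responses known to be acceptable.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Standard split-conformal quantile.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    Under exchangeability, a fresh acceptable response scores at or below
    this value with probability at least 1 - alpha.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept everything.
        return math.inf
    return sorted(cal_scores)[k - 1]


def refine_until_accepted(
    prompt: str,
    generate: Callable[[str, str], str],          # hypothetical: (prompt, feedback) -> response
    verifier_score: Callable[[str, str], float],  # hypothetical: (prompt, response) -> leakage score
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate with verifier feedback until the score passes the threshold.

    No model parameters are updated; only the feedback string changes.
    Returns (response, rounds_used), or (None, max_rounds) if nothing passed.
    """
    feedback = ""
    for round_idx in range(max_rounds):
        response = generate(prompt, feedback)
        score = verifier_score(prompt, response)
        if score <= threshold:
            return response, round_idx + 1
        # Assumed feedback format; a real verifier may return richer critiques.
        feedback = f"Previous answer scored {score:.2f}; avoid the forgotten content."
    return None, max_rounds
```

In use, the threshold would be computed once from held-out calibration scores and then reused for every query, so the per-query cost is limited to the generation and verification calls made inside the loop.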
Related papers
- Sharpness-Aware Parameter Selection for Machine Unlearning [6.397490580631141]
Sensitive personal information, such as credit card numbers or passwords, is often mistakenly incorporated into the training of machine learning models and needs to be removed afterwards. Various machine unlearning techniques have been proposed in the literature to address this problem. Most of the proposed methods revolve around removing individual data samples from a trained model. While existing methods perform unlearning by updating either the whole set of model parameters or only the last layer of the model, we show that there is a subset of model parameters that contributes most to the unlearning target features.
arXiv Detail & Related papers (2025-04-08T19:41:07Z) - Are We Truly Forgetting? A Critical Re-examination of Machine Unlearning Evaluation Protocols [14.961054239793356]
We introduce a rigorous unlearning evaluation setup, in which forgetting classes exhibit semantic similarity to downstream task classes. We hope our benchmark serves as a standardized protocol for evaluating unlearning algorithms under realistic conditions.
arXiv Detail & Related papers (2025-03-10T07:11:34Z) - Evaluating of Machine Unlearning: Robustness Verification Without Prior Modifications [15.257558809246524]
Unlearning is a process enabling pre-trained models to remove the influence of specific training samples.
Existing verification methods rely on machine learning attack techniques, such as membership inference attacks (MIAs) or backdoor attacks.
We propose a novel verification scheme that requires no prior modifications and supports verification on a much larger set.
arXiv Detail & Related papers (2024-10-14T03:19:14Z) - In-Context Unlearning: Language Models as Few Shot Unlearners [27.962361828354716]
We propose a new class of unlearning methods for Large Language Models (LLMs).
This method unlearns instances from the model by simply providing specific kinds of inputs in context, without the need to update model parameters.
Our experimental results demonstrate that in-context unlearning performs on par with, or in some cases outperforms other state-of-the-art methods that require access to model parameters.
arXiv Detail & Related papers (2023-10-11T15:19:31Z) - Rigorous Assessment of Model Inference Accuracy using Language Cardinality [5.584832154027001]
We develop a systematic approach that minimizes bias and uncertainty in model accuracy assessment by replacing statistical estimation with deterministic accuracy measures.
We experimentally demonstrate the consistency and applicability of our approach by assessing the accuracy of models inferred by state-of-the-art inference tools.
arXiv Detail & Related papers (2022-11-29T21:03:26Z) - Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Predictive machine learning for prescriptive applications: a coupled training-validating approach [77.34726150561087]
We propose a new method for training predictive machine learning models for prescriptive applications.
This approach is based on tweaking the validation step in the standard training-validating-testing scheme.
Several experiments with synthetic data demonstrate promising results in reducing the prescription costs in both deterministic and real models.
arXiv Detail & Related papers (2021-10-22T15:03:20Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)