Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting
- URL: http://arxiv.org/abs/2511.09855v1
- Date: Fri, 14 Nov 2025 01:13:12 GMT
- Title: Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting
- Authors: James Jin Kang, Dang Bui, Thanh Pham, Huo-Chong Ling
- Abstract summary: Large language models deployed in sensitive domains cannot guarantee that private information is permanently forgotten. Retraining from scratch is prohibitively costly. Existing unlearning methods remain fragmented, difficult to verify, and often vulnerable to recovery.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing use of large language models in sensitive domains has exposed a critical weakness: the inability to ensure that private information can be permanently forgotten. These systems still lack reliable mechanisms to guarantee that sensitive information is permanently removed once it has been used. Retraining from scratch is prohibitively costly, and existing unlearning methods remain fragmented, difficult to verify, and often vulnerable to recovery. This paper surveys recent research on machine unlearning for LLMs and considers how far current approaches can address these challenges. We review methods for evaluating whether forgetting has occurred, the resilience of unlearned models against adversarial attacks, and mechanisms that can support user trust when model complexity or proprietary limits restrict transparency. Technical solutions such as differential privacy, homomorphic encryption, federated learning, and ephemeral memory are examined alongside institutional safeguards, including auditing practices and regulatory frameworks. The review finds steady progress, but robust and verifiable unlearning remains unresolved. If LLMs are to be deployed safely in sensitive applications, we need efficient techniques that avoid costly retraining, stronger defenses against adversarial recovery, and governance structures that reinforce accountability. By integrating technical and organizational perspectives, this study outlines a pathway toward AI systems that can be required to forget while maintaining both privacy and public trust.
Related papers
- Inference-Time Safety For Code LLMs Via Retrieval-Augmented Revision (2026-03-02T06:06:34Z)
  Large Language Models (LLMs) are increasingly deployed for code generation in high-stakes software development. LLMs cannot readily adapt to newly discovered vulnerabilities or changing security standards without retraining. We present a principled approach to trustworthy code generation by design that operates as an inference-time safety mechanism.
- Towards Verifiably Safe Tool Use for LLM Agents (2026-01-12T21:31:38Z)
  Large language model (LLM)-based AI agents extend their capabilities through access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents. LLMs may invoke unintended tool interactions and introduce risks such as leaking sensitive data or overwriting critical records. Current approaches to mitigating these risks, such as model-based safeguards, improve agents' reliability but cannot guarantee system safety.
- Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions (2025-11-10T22:24:21Z)
  Current large language models (LLMs) excel in verifiable domains, where outputs can be checked before action, but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by cognitive biases shared by humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and the sustainability of investments in the sector. This report describes a framework that emerged from a systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure.
- Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning (2025-09-05T13:28:04Z)
  Foundation models have transformed multimedia analysis by enabling robust and transferable representations across diverse modalities and tasks. Traditional unlearning approaches, including retraining, activation editing, and distillation, are often expensive, fragile, and ill-suited to real-time or continuously evolving systems. We introduce a prompt-based learning framework that unifies knowledge acquisition and removal within a single training phase.
- Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security (2025-07-29T17:39:48Z)
  We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs). SecTOW consists of two modules, a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO). We show that SecTOW significantly improves security while preserving general performance.
- Does Machine Unlearning Truly Remove Knowledge? (2025-05-29T09:19:07Z)
  We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
- DP-RTFL: Differentially Private Resilient Temporal Federated Learning for Trustworthy AI in Regulated Industries (2025-05-27T16:30:25Z)
  This paper introduces Differentially Private Resilient Temporal Federated Learning (DP-RTFL), designed to ensure training continuity, precise state recovery, and strong data privacy. The framework is particularly suited to critical applications such as credit risk assessment using sensitive financial data.
- Information Retrieval Induced Safety Degradation in AI Agents (2025-05-20T11:21:40Z)
  This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation. Retrieval-enabled agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-enabled and increasingly autonomous AI systems.
- Deep Learning Model Security: Threats and Defenses (2024-12-12T06:04:20Z)
  Deep learning has transformed AI applications but faces critical security challenges. This survey examines these vulnerabilities, detailing their mechanisms and their impact on model integrity and confidentiality. The survey concludes with future directions, emphasizing automated defenses, zero-trust architectures, and the security challenges of large AI models.
- FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses (2024-11-05T11:42:26Z)
  Federated Learning is a privacy-preserving, decentralized machine learning paradigm. Recent research has revealed that private ground-truth data can be recovered through a gradient-based technique known as Deep Leakage. This paper introduces the FEDLAD Framework (Federated Evaluation of Deep Leakage Attacks and Defenses), a comprehensive benchmark for evaluating Deep Leakage attacks and defenses.
- Threats, Attacks, and Defenses in Machine Unlearning: A Survey (2024-03-20T15:40:18Z)
  Machine Unlearning (MU) has recently gained considerable attention due to its potential to achieve Safe AI. This survey aims to consolidate the extensive body of studies on threats, attacks, and defenses in machine unlearning.
- RoFL: Attestable Robustness for Secure Federated Learning (2021-07-07T15:42:49Z)
  Federated Learning allows a large number of clients to train a joint model without sharing their private data. To ensure the confidentiality of client updates, Federated Learning systems employ secure aggregation. We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
This list is automatically generated from the titles and abstracts of the papers in this site.