DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing
- URL: http://arxiv.org/abs/2508.05671v1
- Date: Mon, 04 Aug 2025 16:33:17 GMT
- Title: DINA: A Dual Defense Framework Against Internal Noise and External Attacks in Natural Language Processing
- Authors: Ko-Wei Chuang, Hen-Hsen Huang, Tsai-Yen Li,
- Abstract summary: Large language models (LLMs) and generative AI become increasingly integrated into customer service and moderation applications.<n>In this work, we identify and systematically address these dual adversarial threats by introducing DINA (Dual Defense Against Internal Noise and Adversarial Attacks)<n>Our approach adapts advanced noisy-label learning methods from computer vision and integrates them with adversarial training to simultaneously mitigate internal label sabotage and external adversarial perturbations.
- Score: 12.279803315688218
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As large language models (LLMs) and generative AI become increasingly integrated into customer service and moderation applications, adversarial threats emerge from both external manipulations and internal label corruption. In this work, we identify and systematically address these dual adversarial threats by introducing DINA (Dual Defense Against Internal Noise and Adversarial Attacks), a novel unified framework tailored specifically for NLP. Our approach adapts advanced noisy-label learning methods from computer vision and integrates them with adversarial training to simultaneously mitigate internal label sabotage and external adversarial perturbations. Extensive experiments conducted on a real-world dataset from an online gaming service demonstrate that DINA significantly improves model robustness and accuracy compared to baseline models. Our findings not only highlight the critical necessity of dual-threat defenses but also offer practical strategies for safeguarding NLP systems in realistic adversarial scenarios, underscoring broader implications for fair and responsible AI deployment.
Related papers
- Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks [23.881766496924502]
We propose a framework that formalizes the agent-attacker interaction as a two-player zero-sum Markov game and co-trains both players through a three-stage pipeline.<n>Our approach significantly outperforms established training-based and prompt-based defenses.
arXiv Detail & Related papers (2026-03-04T18:29:54Z) - Contextualized Privacy Defense for LLM Agents [84.30907378390512]
LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability.<n>We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm.<n>We show that our CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines.
arXiv Detail & Related papers (2026-03-03T13:35:33Z) - Incentive-Aware AI Safety via Strategic Resource Allocation: A Stackelberg Security Games Perspective [31.55000083809067]
We show how game-theoretic deterrence can make AI oversight proactive, risk-aware, and resilient to manipulation.<n>We illustrate how this framework can inform (1) training-time auditing against data/feedback poisoning, (2) pre-deployment evaluation under constrained reviewer resources, and (3) robust multi-model deployment in adversarial environments.
arXiv Detail & Related papers (2026-02-06T23:20:26Z) - Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization [51.12422886183246]
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks.<n>Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts.<n>We propose ACE-Safety, a novel framework that jointly optimize attack and defense models by seamlessly integrating two key innovative procedures.
arXiv Detail & Related papers (2025-11-24T15:23:41Z) - Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification [52.63017280231648]
Person re-identification (ReID) is a fundamental task in many real-world applications such as pedestrian trajectory tracking.<n>Person ReID models are highly susceptible to adversarial attacks, where imperceptible perturbations to pedestrian images can cause entirely incorrect predictions.<n>We propose a dual-invariant defense framework composed of two main phases.
arXiv Detail & Related papers (2025-11-13T03:56:40Z) - Neutral Agent-based Adversarial Policy Learning against Deep Reinforcement Learning in Multi-party Open Systems [3.431456142488844]
We propose a neutral agent-based approach across various task scenarios in multi-party open systems.<n>We evaluate our proposed method on the SMAC platform based on Starcraft II and the autonomous driving simulation platform Highway-env.
arXiv Detail & Related papers (2025-10-13T02:53:22Z) - Rethinking Data Protection in the (Generative) Artificial Intelligence Era [115.71019708491386]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems.<n>Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z) - Preventing Adversarial AI Attacks Against Autonomous Situational Awareness: A Maritime Case Study [0.0]
Adrial artificial intelligence (AI) attacks pose a significant threat to autonomous transportation.<n>This paper addresses three critical research challenges associated with adversarial AI.<n>We propose building defences utilising multiple inputs and data fusion to create defensive components.
arXiv Detail & Related papers (2025-05-27T17:59:05Z) - Reformulation is All You Need: Addressing Malicious Text Features in DNNs [53.45564571192014]
We propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks.<n>Our framework outperforms existing sample-oriented defense baselines across a diverse range of malicious textual features.
arXiv Detail & Related papers (2025-02-02T03:39:43Z) - Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
generative AI, particularly large language models (LLMs), become increasingly integrated into production applications.
New attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion [0.0]
Our proposal suggests a different approach to the AI Guardian framework.
Instead of including adversarial examples in the training process, we propose training the AI system without them.
This aims to create a system that is inherently resilient to a wider range of attacks.
arXiv Detail & Related papers (2024-05-03T04:08:15Z) - CANEDERLI: On The Impact of Adversarial Training and Transferability on CAN Intrusion Detection Systems [17.351539765989433]
A growing integration of vehicles with external networks has led to a surge in attacks targeting their Controller Area Network (CAN) internal bus.
As a countermeasure, various Intrusion Detection Systems (IDSs) have been suggested in the literature to prevent and mitigate these threats.
Most of these systems rely on data-driven approaches such as Machine Learning (ML) and Deep Learning (DL) models.
In this paper, we present CANEDERLI, a novel framework for securing CAN-based IDSs.
arXiv Detail & Related papers (2024-04-06T14:54:11Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.