Related papers: Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification

Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification

URL: http://arxiv.org/abs/2308.15663v1
Date: Tue, 29 Aug 2023 23:02:26 GMT
Title: Adaptive Attack Detection in Text Classification: Leveraging Space Exploration Features for Text Sentiment Classification
Authors: Atefeh Mahdavi, Neda Keivandarian, Marco Carvalho,
Abstract summary: Adversarial example detection plays a vital role in adaptive cyber defense, especially in the face of rapidly evolving attacks. We propose a novel approach that leverages the power of BERT (Bidirectional Representations from Transformers) and introduces the concept of Space Exploration Features.
Score: 44.99833362998488
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Adversarial example detection plays a vital role in adaptive cyber defense, especially in the face of rapidly evolving attacks. In adaptive cyber defense, the nature and characteristics of attacks continuously change, making it crucial to have robust mechanisms in place to detect and counter these threats effectively. By incorporating adversarial example detection techniques, adaptive cyber defense systems can enhance their ability to identify and mitigate attacks that attempt to exploit vulnerabilities in machine learning models or other systems. Adversarial examples are inputs that are crafted by applying intentional perturbations to natural inputs that result in incorrect classification. In this paper, we propose a novel approach that leverages the power of BERT (Bidirectional Encoder Representations from Transformers) and introduces the concept of Space Exploration Features. We utilize the feature vectors obtained from the BERT model's output to capture a new representation of feature space to improve the density estimation method.

Related papers

Behavior-Aware and Generalizable Defense Against Black-Box Adversarial Attacks for ML-Based IDS [2.179313476241343]
Black box adversarial attacks are increasingly targeted by machine learning based intrusion detection systems.<n>We propose Adaptive Feature Poisoning, a lightweight and proactive defense mechanism designed specifically for realistic black box scenarios.<n>We evaluate its ability to confuse attackers, degrade attack effectiveness, and preserve detection performance.
arXiv Detail & Related papers (2025-12-15T16:29:23Z)
Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack Detection [28.74960061024677]
Real-world face recognition systems are vulnerable to both physical presentation attacks (PAs) and digital forgery attacks (DFs)<n>We propose a Spoofing-aware Prompt Learning for Unified Attack Detection (SPL-UAD) framework, which decouples optimization branches for physical and digital attacks in the prompt space.<n>Experiments on the large-scale UniAttackDataPlus dataset demonstrate that the proposed method achieves significant performance improvements in unified attack detection tasks.
arXiv Detail & Related papers (2025-12-06T09:34:39Z)
Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers [0.0]
Large language models (LLMs) aligned for safety often exhibit emergent deceptive behaviors.<n>This paper introduces adversarial activation patching, a novel mechanistic interpretability framework.<n>By sourcing activations from "deceptive" prompts, we simulate vulnerabilities and quantify deception rates.
arXiv Detail & Related papers (2025-07-12T21:29:49Z)
Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning [54.26807397329468]
This work explores a previously overlooked vulnerability in distributed deep learning systems.<n>An adversary who intercepts the intermediate features transmitted between them can still pose a serious threat.<n>We propose an exploitation strategy specifically designed for distributed settings.
arXiv Detail & Related papers (2025-07-09T20:09:00Z)
Alleviating Attack Data Scarcity: SCANIA's Experience Towards Enhancing In-Vehicle Cyber Security Measures [0.1631115063641726]
This paper presents a context-aware attack data generator that generates attack inputs and corresponding in-vehicle network log.<n>It utilizes parameterized attack models augmented with CAN message decoding and attack intensity adjustments to configure the attack scenarios.<n>We develop and perform an empirical evaluation of two deep neural network IDS models using the generated data.
arXiv Detail & Related papers (2025-07-03T13:31:33Z)
Unleashing the Power of Pre-trained Encoders for Universal Adversarial Attack Detection [21.03032944637112]
Adrial attacks pose a critical security threat to real-world AI systems. This paper proposes a lightweight adversarial detection framework based on the large-scale pre-trained vision-language model CLIP.
arXiv Detail & Related papers (2025-04-01T05:21:45Z)
Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models. Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20 % decrease in accuracy for the top-performing text classification models post-attack, along with a 30 % decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors [0.0]
An adaptive attack is one where the attacker is aware of the defenses and adapts their strategy accordingly. Our proposed method leverages adversarial training to reinforce the ability to detect attacks, without compromising clean accuracy. Experimental evaluations on the CIFAR-10 and SVHN datasets demonstrate that our proposed algorithm significantly improves a detector's ability to accurately identify adaptive adversarial attacks.
arXiv Detail & Related papers (2024-04-18T12:13:09Z)
Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection. We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness. The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
arXiv Detail & Related papers (2024-04-02T12:49:22Z)
FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks. We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness. Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
RobustSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition [37.387265457439476]
We propose a novel learning framework, RobustSense, to defend common adversarial attacks. Our method works well on wireless human activity recognition and person identification systems.
arXiv Detail & Related papers (2022-04-04T15:06:03Z)
Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks. We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples. We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications. We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths. Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
Selective and Features based Adversarial Example Detection [12.443388374869745]
Security-sensitive applications that relay on Deep Neural Networks (DNNs) are vulnerable to small perturbations crafted to generate Adversarial Examples (AEs) We propose a novel unsupervised detection mechanism that uses the selective prediction, processing model layers outputs, and knowledge transfer concepts in a multi-task learning setting. Experimental results show that the proposed approach achieves comparable results to the state-of-the-art methods against tested attacks in white box scenario and better results in black and gray boxes scenarios.
arXiv Detail & Related papers (2021-03-09T11:06:15Z)
A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems. This paper proposes a self-supervised adversarial training mechanism in the input space. It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.