Related papers: Improving Adversarial Robustness via Feature Pattern Consistency Constraint

Improving Adversarial Robustness via Feature Pattern Consistency Constraint

URL: http://arxiv.org/abs/2406.08829v1
Date: Thu, 13 Jun 2024 05:38:30 GMT
Title: Improving Adversarial Robustness via Feature Pattern Consistency Constraint
Authors: Jiacong Hu, Jingwen Ye, Zunlei Feng, Jiazhen Yang, Shunyu Liu, Xiaotian Yu, Lingxiang Jia, Mingli Song,
Abstract summary: Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. Most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference. We introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern.
Score: 42.50500608175905
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Convolutional Neural Networks (CNNs) are well-known for their vulnerability to adversarial attacks, posing significant security concerns. In response to these threats, various defense methods have emerged to bolster the model's robustness. However, most existing methods either focus on learning from adversarial perturbations, leading to overfitting to the adversarial examples, or aim to eliminate such perturbations during inference, inevitably increasing computational burdens. Conversely, clean training, which strengthens the model's robustness by relying solely on clean examples, can address the aforementioned issues. In this paper, we align with this methodological stream and enhance its generalizability to unknown adversarial examples. This enhancement is achieved by scrutinizing the behavior of latent features within the network. Recognizing that a correct prediction relies on the correctness of the latent feature's pattern, we introduce a novel and effective Feature Pattern Consistency Constraint (FPCC) method to reinforce the latent feature's capacity to maintain the correct feature pattern. Specifically, we propose Spatial-wise Feature Modification and Channel-wise Feature Selection to enhance latent features. Subsequently, we employ the Pattern Consistency Loss to constrain the similarity between the feature pattern of the latent features and the correct feature pattern. Our experiments demonstrate that the FPCC method empowers latent features to uphold correct feature patterns even in the face of adversarial examples, resulting in inherent adversarial robustness surpassing state-of-the-art models.

Related papers

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models [77.04403907729738]
This survey charts the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior.<n>We demonstrate how uncertainty is leveraged as an active control signal across three frontiers.<n>This survey argues that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.
arXiv Detail & Related papers (2026-01-22T06:21:31Z)
Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks [6.573058520271728]
We identify a connection between interpretability and robustness that can be directly leveraged during training.<n>We introduce an attribution-guided refinement framework that transforms Local Interpretable Model-Agnostic Explanations into an active training signal.
arXiv Detail & Related papers (2026-01-02T19:36:03Z)
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models [62.70575022567081]
We propose AdvChain, an alignment paradigm that teaches models dynamic self-correction through adversarial CoT tuning.<n>Our work establishes a new direction for building more robust and reliable reasoning models.
arXiv Detail & Related papers (2025-09-29T04:27:23Z)
Feature Statistics with Uncertainty Help Adversarial Robustness [19.01087281157066]
adversarial attacks tend to shift the distributions of feature statistics.<n>We propose an uncertainty-driven feature statistics adjustment module for robustness enhancement.<n>The proposed FSU module has universal applicability in training, attacking, predicting, and fine-tuning.
arXiv Detail & Related papers (2025-03-26T14:30:33Z)
Bridging Interpretability and Robustness Using LIME-Guided Model Refinement [0.0]
Local Interpretable Model-Agnostic Explanations (LIME) systematically enhance model robustness. Empirical evaluations on multiple benchmark datasets demonstrate that LIME-guided refinement not only improves interpretability but also significantly enhances resistance to adversarial perturbations and generalization to out-of-distribution data.
arXiv Detail & Related papers (2024-12-25T17:32:45Z)
Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness. We show that this observed gain in robustness is an illusion of robustness (IOR) We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner. We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap. Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning has the risk of skewing fine-tuning features and compromising the robustness of the model. We introduce three robustness indicators and conduct experiments across diverse robust datasets. Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z)
Exploring Robust Features for Improving Adversarial Robustness [11.935612873688122]
We explore the robust features which are not affected by the adversarial perturbations to improve the model's adversarial robustness. Specifically, we propose a feature disentanglement model to segregate the robust features from non-robust features and domain specific features. The trained domain discriminator is able to identify the domain specific features from the clean images and adversarial examples almost perfectly.
arXiv Detail & Related papers (2023-09-09T00:30:04Z)
FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation [9.025997629442896]
FACADE is designed for unsupervised mechanistic anomaly detection in deep neural networks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.
arXiv Detail & Related papers (2023-07-20T04:00:37Z)
Feature Separation and Recalibration for Adversarial Robustness [18.975320671203132]
We propose a novel, easy-to- verify approach named Feature Separation and Recalibration. It recalibrates the malicious, non-robust activations for more robust feature maps through Separation and Recalibration. It improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead.
arXiv Detail & Related papers (2023-03-24T07:43:57Z)
Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training [106.34722726264522]
A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise. Pre-processing methods may suffer from the robustness degradation effect. A potential cause of this negative effect is that adversarial training examples are static and independent to the pre-processing model. We propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
arXiv Detail & Related papers (2021-06-10T01:45:32Z)
SafeAMC: Adversarial training for robust modulation recognition models [53.391095789289736]
In communication systems, there are many tasks, like modulation recognition, which rely on Deep Neural Networks (DNNs) models. These models have been shown to be susceptible to adversarial perturbations, namely imperceptible additive noise crafted to induce misclassification. We propose to use adversarial training, which consists of fine-tuning the model with adversarial perturbations, to increase the robustness of automatic modulation recognition models.
arXiv Detail & Related papers (2021-05-28T11:29:04Z)
Smoothed Geometry for Robust Attribution [36.616902063693104]
Feature attributions are a popular tool for explaining behavior of Deep Neural Networks (DNNs) They have been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness.
arXiv Detail & Related papers (2020-06-11T17:35:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.