Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability
- URL: http://arxiv.org/abs/2506.07804v1
- Date: Mon, 09 Jun 2025 14:33:28 GMT
- Title: Enhancing Adversarial Robustness with Conformal Prediction: A Framework for Guaranteed Model Reliability
- Authors: Jie Bao, Chuangyin Dang, Rui Luo, Hanwei Zhang, Zhixin Zhou,
- Abstract summary: This study advances adversarial training by leveraging principles from Conformal Prediction.<n>We develop an adversarial attack method, termed OPSA (OPtimal Size Attack), designed to reduce the efficiency of conformal prediction.<n>We introduce OPSA-AT (Adversarial Training), a defense strategy that integrates OPSA within a novel conformal training paradigm.
- Score: 12.695100767905243
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As deep learning models are increasingly deployed in high-risk applications, robust defenses against adversarial attacks and reliable performance guarantees become paramount. Moreover, accuracy alone does not provide sufficient assurance or reliable uncertainty estimates for these models. This study advances adversarial training by leveraging principles from Conformal Prediction. Specifically, we develop an adversarial attack method, termed OPSA (OPtimal Size Attack), designed to reduce the efficiency of conformal prediction at any significance level by maximizing model uncertainty without requiring coverage guarantees. Correspondingly, we introduce OPSA-AT (Adversarial Training), a defense strategy that integrates OPSA within a novel conformal training paradigm. Experimental evaluations demonstrate that our OPSA attack method induces greater uncertainty compared to baseline approaches for various defenses. Conversely, our OPSA-AT defensive model significantly enhances robustness not only against OPSA but also other adversarial attacks, and maintains reliable prediction. Our findings highlight the effectiveness of this integrated approach for developing trustworthy and resilient deep learning models for safety-critical domains. Our code is available at https://github.com/bjbbbb/Enhancing-Adversarial-Robustness-with-Conformal-Prediction.
Related papers
- Game-Theoretic Defenses for Robust Conformal Prediction Against Adversarial Attacks in Medical Imaging [12.644923600594176]
Adversarial attacks pose significant threats to the reliability and safety of deep learning models.<n>This paper introduces a novel framework that integrates conformal prediction with game-theoretic defensive strategies.
arXiv Detail & Related papers (2024-11-07T02:20:04Z) - Robust Image Classification: Defensive Strategies against FGSM and PGD Adversarial Attacks [0.0]
Adversarial attacks pose significant threats to the robustness of deep learning models in image classification.
This paper explores and refines defense mechanisms against these attacks to enhance the resilience of neural networks.
arXiv Detail & Related papers (2024-08-20T02:00:02Z) - Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent [0.0]
Adrial attacks against deep learning models represent a major threat to the security and reliability of natural language processing systems.
We propose a modification to the BERT-Attack framework, integrating Projected Gradient Descent (PGD) to enhance its effectiveness and robustness.
arXiv Detail & Related papers (2024-07-29T09:07:29Z) - The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z) - Enhancing Security in Federated Learning through Adaptive
Consensus-Based Model Update Validation [2.28438857884398]
This paper introduces an advanced approach for fortifying Federated Learning (FL) systems against label-flipping attacks.
We propose a consensus-based verification process integrated with an adaptive thresholding mechanism.
Our results indicate a significant mitigation of label-flipping attacks, bolstering the FL system's resilience.
arXiv Detail & Related papers (2024-03-05T20:54:56Z) - Perturbation-Invariant Adversarial Training for Neural Ranking Models:
Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z) - Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, LAST' (bf Learn from the Pbf ast)
arXiv Detail & Related papers (2023-10-19T13:13:41Z) - FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated
Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients)
We propose a trigger reverse engineering based defense and show that our method can achieve improvement with guarantee robustness.
Our results on eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the in adversarial attacks parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-20T08:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.