Semantic-Preserving Adversarial Code Comprehension
- URL: http://arxiv.org/abs/2209.05130v1
- Date: Mon, 12 Sep 2022 10:32:51 GMT
- Title: Semantic-Preserving Adversarial Code Comprehension
- Authors: Yiyang Li, Hongqiu Wu, Hai Zhao
- Abstract summary: We propose Semantic-Preserving Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving attacks.
Experiments and analysis demonstrate that SPACE can stay robust against state-of-the-art attacks while boosting the performance of PrLMs for code.
- Score: 75.76118224437974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Based on the tremendous success of pre-trained language models (PrLMs) for
source code comprehension tasks, current literature studies either ways to
further improve the performance (generalization) of PrLMs, or their robustness
against adversarial attacks. However, they have to compromise on the trade-off
between the two aspects and none of them consider improving both sides in an
effective and practical way. To fill this gap, we propose Semantic-Preserving
Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving
attacks while forcing the model to predict the correct labels under these worst
cases. Experiments and analysis demonstrate that SPACE can stay robust against
state-of-the-art attacks while boosting the performance of PrLMs for code.
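The abstract does not spell out the training procedure, but the stated objective (find the worst-case semantic-preserving attack, then force correct predictions under it) reads as a min-max loop over the code token embeddings. The sketch below illustrates one such loop under assumptions that are ours, not the paper's: `model` is a PyTorch sequence-classification PrLM that accepts `inputs_embeds` (e.g. a CodeBERT-style encoder), `identifier_mask` marks positions whose embeddings are assumed perturbable without changing the code's semantics, and `epsilon`, `alpha`, and `ascent_steps` are hypothetical hyperparameters.

```python
import torch
import torch.nn.functional as F

def semantic_preserving_adv_loss(model, input_ids, attention_mask,
                                 identifier_mask, labels,
                                 epsilon=1.0, alpha=0.3, ascent_steps=3):
    # All names and defaults here are illustrative assumptions, not SPACE's.
    embeds = model.get_input_embeddings()(input_ids)        # (B, T, H)
    mask = identifier_mask.unsqueeze(-1).to(embeds.dtype)   # (B, T, 1)
    frozen = embeds.detach()   # inner loop needs no grad to the embedding table
    delta = torch.zeros_like(embeds, requires_grad=True)

    # Inner maximization: gradient ascent on a perturbation restricted to the
    # masked (assumed semantics-preserving) positions, L2-bounded by epsilon.
    for _ in range(ascent_steps):
        logits = model(inputs_embeds=frozen + delta * mask,
                       attention_mask=attention_mask).logits
        loss = F.cross_entropy(logits, labels)
        (grad,) = torch.autograd.grad(loss, delta)
        step = alpha * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
        delta = (delta + step).detach().renorm(p=2, dim=0, maxnorm=epsilon)
        delta.requires_grad_(True)

    # Outer minimization: the caller backpropagates this loss so the model
    # learns to predict the correct label under the worst-case perturbation.
    logits = model(inputs_embeds=embeds + delta.detach() * mask,
                   attention_mask=attention_mask).logits
    return F.cross_entropy(logits, labels)
```

A training step would backpropagate this loss, possibly combined with the clean cross-entropy loss; how (or whether) the two terms are weighted is a design choice the abstract leaves open.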
Related papers
- The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks [90.52808174102157]
In safety-critical applications such as medical imaging and autonomous driving, it is imperative to maintain both high adversarial robustness against potential attacks and reliable uncertainty quantification.
A notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models.
This study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks.
arXiv Detail & Related papers (2024-05-14T18:05:19Z)
- Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Transfer Attacks and Defenses for Large Language Models on Coding Tasks [30.065641782962974]
We study the effect of adversarial perturbations on coding tasks with large language models (LLMs).
We propose prompt-based defenses that involve modifying the prompt to include examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations.
Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance.
arXiv Detail & Related papers (2023-11-22T15:11:35Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures
and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
Adversarial word-level perturbations are well-studied and effective attack strategies.
We propose AdvFooler, a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
arXiv Detail & Related papers (2023-10-02T06:57:25Z) - COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in
Language Models [4.776465250559034]
We propose a prompt-based adversarial attack on manual templates in black box scenarios.
First, we design character-level and word-level approaches to break manual templates separately.
We then present a greedy algorithm for the attack based on these destructive approaches.
arXiv Detail & Related papers (2023-06-09T03:53:42Z) - Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z) - Evaluating the Susceptibility of Pre-Trained Language Models via
Handcrafted Adversarial Examples [0.0]
We highlight a major security vulnerability in the public release of GPT-3 and investigate this vulnerability in other state-of-the-art PLMs.
We underscore token distance-minimized perturbations as an effective adversarial approach, bypassing both supervised and unsupervised quality measures.
arXiv Detail & Related papers (2022-09-05T20:29:17Z)