Semantic-Preserving Adversarial Code Comprehension
- URL: http://arxiv.org/abs/2209.05130v1
- Date: Mon, 12 Sep 2022 10:32:51 GMT
- Title: Semantic-Preserving Adversarial Code Comprehension
- Authors: Yiyang Li, Hongqiu Wu, Hai Zhao
- Abstract summary: We propose Semantic-Preserving Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving attacks.
Experiments and analysis demonstrate that SPACE can stay robust against state-of-the-art attacks while boosting the performance of PrLMs for code.
- Score: 75.76118224437974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Based on the tremendous success of pre-trained language models (PrLMs) for source code comprehension tasks, current literature studies either how to further improve the performance (generalization) of PrLMs or how to improve their robustness against adversarial attacks. However, these studies have to compromise on the trade-off between the two aspects, and none of them considers improving both sides in an effective and practical way. To fill this gap, we propose Semantic-Preserving Adversarial Code Embeddings (SPACE) to find the worst-case semantic-preserving attacks while forcing the model to predict the correct labels under these worst cases. Experiments and analysis demonstrate that SPACE can stay robust against state-of-the-art attacks while boosting the performance of PrLMs for code.
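The abstract describes SPACE as finding worst-case semantic-preserving perturbations in embedding space and then training the model to stay correct under them. Below is a minimal, illustrative PyTorch sketch of that general idea (gradient ascent on a perturbation restricted to identifier positions, whose renaming preserves code semantics, followed by a descent step on the perturbed loss). It assumes a HuggingFace-style classifier and a precomputed identifier mask; all helper names and hyperparameters are assumptions for illustration, not the paper's actual implementation.

    # Illustrative sketch of embedding-space adversarial training in the spirit of SPACE.
    # Assumed inputs: a HuggingFace-style model, token ids, an attention mask, labels,
    # and identifier_mask (B, L) marking positions whose perturbation preserves semantics.
    import torch

    def space_style_step(model, input_ids, attention_mask, identifier_mask, labels,
                         epsilon=1.0, alpha=0.3, ascent_steps=3):
        """One adversarial training step on a batch (sketch, not the official code)."""
        embeds = model.get_input_embeddings()(input_ids)       # (B, L, H) token embeddings
        delta = torch.zeros_like(embeds, requires_grad=True)   # perturbation, init at zero
        mask = identifier_mask.unsqueeze(-1).float()           # restrict to identifier tokens

        # Inner loop: gradient ascent on the loss w.r.t. the perturbation only.
        for _ in range(ascent_steps):
            out = model(inputs_embeds=embeds + delta,
                        attention_mask=attention_mask, labels=labels)
            grad, = torch.autograd.grad(out.loss, delta)
            with torch.no_grad():
                grad = grad * mask                              # keep non-identifier positions clean
                delta = delta + alpha * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
                delta = delta.clamp(-epsilon, epsilon)          # bound the perturbation
            delta.requires_grad_(True)

        # Outer step: minimize the loss under the worst-case perturbation found above,
        # combined with the clean loss so generalization is not sacrificed.
        adv_loss = model(inputs_embeds=embeds + delta.detach(),
                         attention_mask=attention_mask, labels=labels).loss
        clean_loss = model(input_ids=input_ids,
                           attention_mask=attention_mask, labels=labels).loss
        return clean_loss + adv_loss

The returned loss would be backpropagated and optimized as usual; the clean/adversarial weighting and the norm constraint are design choices left open in this sketch.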
Related papers
- Estimating Commonsense Plausibility through Semantic Shifts [66.06254418551737]
We propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts.
Evaluations on two types of fine-grained commonsense plausibility estimation tasks show that ComPaSS consistently outperforms baselines.
arXiv Detail & Related papers (2025-02-19T06:31:06Z)
- AdvAnchor: Enhancing Diffusion Model Unlearning with Adversarial Anchors [61.007590285263376]
Security concerns have driven researchers to unlearn inappropriate concepts through fine-tuning.
Recent fine-tuning methods exhibit a considerable performance trade-off between eliminating undesirable concepts and preserving other concepts.
We propose AdvAnchor, a novel approach that generates adversarial anchors to alleviate the trade-off issue.
arXiv Detail & Related papers (2024-12-28T04:44:07Z)
- Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing [107.97160023681184]
Aligned large language models (LLMs) are vulnerable to jailbreaking attacks.
We propose SEMANTICSMOOTH, a smoothing-based defense that aggregates predictions of semantically transformed copies of a given input prompt.
arXiv Detail & Related papers (2024-02-25T20:36:03Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Transfer Attacks and Defenses for Large Language Models on Coding Tasks [30.065641782962974]
We study the effect of adversarial perturbations on coding tasks with large language models (LLMs).
We propose prompt-based defenses that involve modifying the prompt to include examples of adversarially perturbed code and explicit instructions for reversing adversarial perturbations.
Our experiments show that adversarial examples obtained with a smaller code model are indeed transferable, weakening the LLMs' performance.
arXiv Detail & Related papers (2023-11-22T15:11:35Z)
- Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
Adversarial word-level perturbations are well-studied and effective attack strategies.
We propose AdvFooler, a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
arXiv Detail & Related papers (2023-10-02T06:57:25Z)
- Effective Targeted Attacks for Adversarial Self-Supervised Learning [58.14233572578723]
Unsupervised adversarial training (AT) has been highlighted as a means of achieving robustness in models without any label information.
We propose a novel positive mining for targeted adversarial attack to generate effective adversaries for adversarial SSL frameworks.
Our method demonstrates significant enhancements in robustness when applied to non-contrastive SSL frameworks, and less but consistent robustness improvements with contrastive SSL frameworks.
arXiv Detail & Related papers (2022-10-19T11:43:39Z)
- Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples [0.0]
We highlight a major security vulnerability in the public release of GPT-3 and investigate this vulnerability in other state-of-the-art PLMs.
We underscore token distance-minimized perturbations as an effective adversarial approach, bypassing both supervised and unsupervised quality measures.
arXiv Detail & Related papers (2022-09-05T20:29:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.