Probing Model Signal-Awareness via Prediction-Preserving Input
Minimization
- URL: http://arxiv.org/abs/2011.14934v2
- Date: Tue, 22 Jun 2021 21:44:44 GMT
- Title: Probing Model Signal-Awareness via Prediction-Preserving Input
Minimization
- Authors: Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro
Morari
- Abstract summary: We evaluate models' ability to capture the correct vulnerability signals to produce their predictions.
We measure the signal awareness of models using a new metric we propose: Signal-aware Recall (SAR).
The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric.
- Score: 67.62847721118142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the signal awareness of AI models for source code
understanding. Using a software vulnerability detection use case, we evaluate
the models' ability to capture the correct vulnerability signals to produce
their predictions. Our prediction-preserving input minimization (P2IM) approach
systematically reduces the original source code to a minimal snippet that the
model needs in order to maintain its prediction. The model's reliance on
incorrect signals is uncovered when the vulnerability present in the original
code is missing from the minimal snippet, yet the model predicts both the
original and the minimized code as vulnerable. We measure the signal awareness
of models using a new metric we propose: Signal-aware Recall (SAR). We apply
P2IM to three different neural
network architectures across multiple datasets. The results show a sharp drop
in the model's Recall from the high 90s to sub-60s with the new metric,
highlighting that the models are presumably picking up a lot of noise or
dataset nuances while learning their vulnerability detection logic. Although
the drop in model performance may be perceived as an adversarial attack, this
is not P2IM's objective. The idea is rather to uncover the signal-awareness
of a black-box model in a data-driven manner via controlled queries. SAR's
purpose is to measure the impact of task-agnostic model training, and not to
suggest a shortcoming in the Recall metric. The expectation, in fact, is for
SAR to match Recall in the ideal scenario where the model truly captures
task-specific signals.
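
The abstract describes P2IM and SAR only at a high level. As a rough illustration, the Python sketch below implements a greedy, delta-debugging-flavoured reduction loop together with one plausible reading of SAR; the names (`predict_vulnerable`, `p2im_minimize`, `signal_aware_recall`) and the exact SAR formula (a true positive counts only if the minimal snippet retains the known vulnerable lines, divided by all truly vulnerable samples) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a P2IM-style reduction loop, based only on the abstract's
# description. All names and the SAR formula are illustrative assumptions.

from typing import Callable, List, Sequence, Set, Tuple


def p2im_minimize(code_lines: List[str],
                  predict_vulnerable: Callable[[str], bool]) -> List[str]:
    """Greedily drop lines while the black-box model keeps predicting 'vulnerable'.

    The paper's reduction is more systematic (delta-debugging style); this
    one-line-at-a-time variant only conveys the idea of prediction-preserving
    input minimization.
    """
    current = list(code_lines)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the removal only if the prediction is preserved.
            if candidate and predict_vulnerable("\n".join(candidate)):
                current = candidate
                changed = True
                break
    return current


def signal_aware_recall(samples: Sequence[Tuple[List[str], bool]],
                        predict_vulnerable: Callable[[str], bool],
                        ground_truth_lines: Sequence[Set[str]]) -> float:
    """One plausible reading of SAR: count a true positive only if the minimal
    snippet still contains the known vulnerability-relevant line(s).

    `samples` holds (code_lines, is_vulnerable) pairs; `ground_truth_lines[i]`
    is the set of vulnerable lines of sample i (hypothetical labeling).
    """
    tp_signal_aware = 0
    positives = 0  # all truly vulnerable samples (TP + FN)
    for i, (code_lines, is_vulnerable) in enumerate(samples):
        if not is_vulnerable:
            continue
        positives += 1
        if predict_vulnerable("\n".join(code_lines)):
            minimal = p2im_minimize(code_lines, predict_vulnerable)
            # Count the true positive only if the real signal survived.
            if ground_truth_lines[i] & set(minimal):
                tp_signal_aware += 1
    return tp_signal_aware / positives if positives else 0.0
```

Under this reading, SAR can never exceed the standard Recall, which is consistent with the reported drop from the high 90s to sub-60s.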
Related papers
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods, which design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z) - One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks [28.502489028888608]
Unlearnable examples (ULEs) aim to protect data from unauthorized usage for training DNNs.
Under adversarial training, however, the unlearnability of error-minimizing noise degrades severely.
We propose a novel model-free method, named One-Pixel Shortcut, which perturbs only a single pixel of each image and makes the dataset unlearnable.
arXiv Detail & Related papers (2022-05-24T15:17:52Z) - Label-only Model Inversion Attack: The Attack that Requires the Least
Information [14.061083728194378]
In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model, using only the model's output.
We have found a model inversion method that can reconstruct the input data records based only on the output labels.
arXiv Detail & Related papers (2022-03-13T03:03:49Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - MEGEX: Data-Free Model Extraction Attack against Gradient-Based
Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack violates intellectual property and privacy: an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - Model Extraction Attacks against Recurrent Neural Networks [1.2891210250935146]
We study the threat of model extraction attacks against recurrent neural networks (RNNs).
We discuss whether a model with higher accuracy can be extracted from a long short-term memory (LSTM) network using a simple RNN.
We then show that a model with higher accuracy can be extracted efficiently, in particular by configuring the loss function and using a more complex architecture.
arXiv Detail & Related papers (2020-02-01T01:47:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.