Probing Model Signal-Awareness via Prediction-Preserving Input
Minimization
- URL: http://arxiv.org/abs/2011.14934v2
- Date: Tue, 22 Jun 2021 21:44:44 GMT
- Title: Probing Model Signal-Awareness via Prediction-Preserving Input
Minimization
- Authors: Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro
Morari
- Abstract summary: We evaluate models' ability to capture the correct vulnerability signals to produce their predictions.
We measure the signal awareness of models using a new metric we propose: Signal-aware Recall (SAR).
The results show a sharp drop in the model's Recall from the high 90s to sub-60s with the new metric.
- Score: 67.62847721118142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work explores the signal awareness of AI models for source code
understanding. Using a software vulnerability detection use case, we evaluate
the models' ability to capture the correct vulnerability signals to produce
their predictions. Our prediction-preserving input minimization (P2IM) approach
systematically reduces the original source code to a minimal snippet that the
model needs in order to maintain its prediction. The model's reliance on
incorrect signals is uncovered when the vulnerability present in the original
code is missing from the minimal snippet, yet the model predicts both the
original and the minimized code as vulnerable. We measure the signal awareness
of models using a new metric we propose: Signal-aware Recall (SAR). We apply
P2IM to three different neural
network architectures across multiple datasets. The results show a sharp drop
in the model's Recall from the high 90s to sub-60s with the new metric,
highlighting that the models are presumably picking up a lot of noise or
dataset nuances while learning their vulnerability detection logic. Although
the drop in model performance may be perceived as an adversarial attack, this
is not P2IM's objective. The idea is rather to uncover the signal-awareness
of a black-box model in a data-driven manner via controlled queries. SAR's
purpose is to measure the impact of task-agnostic model training, and not to
suggest a shortcoming in the Recall metric. The expectation, in fact, is for
SAR to match Recall in the ideal scenario where the model truly captures
task-specific signals.
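
The abstract describes P2IM and SAR only at a high level. As a rough illustration, the Python sketch below implements a greedy, delta-debugging-flavoured reduction loop together with one plausible reading of SAR; the names (`predict_vulnerable`, `p2im_minimize`, `signal_aware_recall`) and the exact SAR formula (a true positive counts only if the minimal snippet retains the known vulnerable lines, divided by all truly vulnerable samples) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a P2IM-style reduction loop, based only on the abstract's
# description. All names and the SAR formula are illustrative assumptions.

from typing import Callable, List, Sequence, Set, Tuple


def p2im_minimize(code_lines: List[str],
                  predict_vulnerable: Callable[[str], bool]) -> List[str]:
    """Greedily drop lines while the black-box model keeps predicting 'vulnerable'.

    The paper's reduction is more systematic (delta-debugging style); this
    one-line-at-a-time variant only conveys the idea of prediction-preserving
    input minimization.
    """
    current = list(code_lines)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the removal only if the prediction is preserved.
            if candidate and predict_vulnerable("\n".join(candidate)):
                current = candidate
                changed = True
                break
    return current


def signal_aware_recall(samples: Sequence[Tuple[List[str], bool]],
                        predict_vulnerable: Callable[[str], bool],
                        ground_truth_lines: Sequence[Set[str]]) -> float:
    """One plausible reading of SAR: count a true positive only if the minimal
    snippet still contains the known vulnerability-relevant line(s).

    `samples` holds (code_lines, is_vulnerable) pairs; `ground_truth_lines[i]`
    is the set of vulnerable lines of sample i (hypothetical labeling).
    """
    tp_signal_aware = 0
    positives = 0  # all truly vulnerable samples (TP + FN)
    for i, (code_lines, is_vulnerable) in enumerate(samples):
        if not is_vulnerable:
            continue
        positives += 1
        if predict_vulnerable("\n".join(code_lines)):
            minimal = p2im_minimize(code_lines, predict_vulnerable)
            # Count the true positive only if the real signal survived.
            if ground_truth_lines[i] & set(minimal):
                tp_signal_aware += 1
    return tp_signal_aware / positives if positives else 0.0
```

Under this reading, SAR can never exceed the standard Recall, which is consistent with the reported drop from the high 90s to sub-60s.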
Related papers
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods, which design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - Enhancing Multiple Reliability Measures via Nuisance-extended
Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z) - Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task.
Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero.
Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z) - One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks [28.502489028888608]
Unlearnable examples (ULEs) aim to protect data from unauthorized usage for training DNNs.
Under adversarial training, however, the unlearnability of error-minimizing noise degrades severely.
We propose a novel model-free method, named One-Pixel Shortcut, which perturbs only a single pixel of each image and makes the dataset unlearnable.
arXiv Detail & Related papers (2022-05-24T15:17:52Z) - Label-only Model Inversion Attack: The Attack that Requires the Least
Information [14.061083728194378]
In a model inversion attack, an adversary attempts to reconstruct the data records used to train a target model, using only the model's output.
We have found a model inversion method that can reconstruct the input data records based only on the output labels.
arXiv Detail & Related papers (2022-03-13T03:03:49Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - MEGEX: Data-Free Model Extraction Attack against Gradient-Based
Explainable AI [1.693045612956149]
Deep neural networks deployed in Machine Learning as a Service (MLaaS) face the threat of model extraction attacks.
A model extraction attack violates intellectual property and privacy: an adversary steals a trained model hosted in the cloud using only its predictions.
In this paper, we propose MEGEX, a data-free model extraction attack against a gradient-based explainable AI.
arXiv Detail & Related papers (2021-07-19T14:25:06Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the Importance-Guided Stochastic Gradient Descent (IGSGD) method to train models to perform inference on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - Model Extraction Attacks against Recurrent Neural Networks [1.2891210250935146]
We study the threat of model extraction attacks against recurrent neural networks (RNNs).
We discuss whether a model with higher accuracy can be extracted from a long short-term memory (LSTM) network using a simple RNN.
We then show that a model with higher accuracy can be extracted efficiently, in particular by configuring the loss function and using a more complex architecture.
arXiv Detail & Related papers (2020-02-01T01:47:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.