Selecting Models based on the Risk of Damage Caused by Adversarial Attacks
- URL: http://arxiv.org/abs/2301.12151v1
- Date: Sat, 28 Jan 2023 10:24:38 GMT
- Title: Selecting Models based on the Risk of Damage Caused by Adversarial Attacks
- Authors: Jona Klemenc, Holger Trittenbach
- Abstract summary: Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications.
One of the key concerns is that adversaries can cause harm by manipulating model predictions without being detected.
We propose a method to model and statistically estimate the probability of damage arising from adversarial attacks.
- Score: 2.969705152497174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regulation, legal liabilities, and societal concerns challenge the adoption
of AI in safety and security-critical applications. One of the key concerns is
that adversaries can cause harm by manipulating model predictions without being
detected. Regulation hence demands an assessment of the risk of damage caused
by adversaries. Yet, there is no method to translate this high-level demand
into actionable metrics that quantify the risk of damage.
In this article, we propose a method to model and statistically estimate the
probability of damage arising from adversarial attacks. We show that our
proposed estimator is statistically consistent and unbiased. In experiments, we
demonstrate that the estimation results of our method have a clear and
actionable interpretation and outperform conventional metrics. We then show how
operators can use the estimation results to reliably select the model with the
lowest risk.
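
The abstract does not spell out the estimator itself; as a rough illustration of what "statistically estimating the probability of damage" can look like, below is a minimal Monte Carlo sketch under strong assumptions: a toy linear classifier, a simple L-infinity perturbation standing in for the attack, and a placeholder notion of which prediction flips count as damage. It is not the authors' method.

```python
# Hypothetical sketch: estimate P(damage) as the fraction of points whose
# prediction is flipped by an attack AND for which a flip would be harmful.
# The attack, the damage notion, and the data are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def predict(w, X):
    """Toy linear classifier: sign of <w, x>."""
    return np.sign(X @ w)

def linf_attack(w, X, eps=0.3):
    """Simple L-infinity perturbation pushing each point against its current label."""
    return X - eps * np.sign(np.outer(predict(w, X), w))

def estimate_damage_probability(w, X, damage_mask, eps=0.3):
    """Monte Carlo estimate of the probability of damage on a held-out sample."""
    clean = predict(w, X)
    attacked = predict(w, linf_attack(w, X, eps))
    flipped = clean != attacked
    return np.mean(flipped & damage_mask)

# Toy data: 2-D Gaussian blob, a fixed "model", and a placeholder damage notion
# (here: a flip is damaging only for points currently predicted positive).
X = rng.normal(size=(2000, 2)) + np.array([1.0, 1.0])
w = np.array([1.0, 1.0])
damage_mask = predict(w, X) > 0

print(f"Estimated probability of damage: {estimate_damage_probability(w, X, damage_mask):.3f}")
```

Comparing such an estimate across candidate models would correspond, loosely, to the model-selection step described in the abstract.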
Related papers
- Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences [56.23412698865433]
We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a labeled similar experiment.
First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population.
We propose Deconfounded Empirical Risk Minimization (DERM), a simple new learning procedure that minimizes the risk over a fictitious target population.
arXiv Detail & Related papers (2025-02-10T10:52:17Z) - Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities [49.09703018511403]
Large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks.
Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system.
We propose evaluating LLMs with model tampering attacks which allow for modifications to latent activations or weights.
arXiv Detail & Related papers (2025-02-03T18:59:16Z) - Mitigating optimistic bias in entropic risk estimation and optimization with an application to insurance [5.407319151576265]
The entropic risk measure is widely used to account for tail risks associated with an uncertain loss.
To mitigate the bias in the empirical entropic risk estimator, we propose a strongly consistent bootstrapping procedure.
We show that our methods suggest a higher (and more accurate) premium to homeowners. (A brief numerical illustration of the plug-in estimator's optimistic bias appears after this list.)
arXiv Detail & Related papers (2024-09-30T04:02:52Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, which can lead to uncontrollable risks in real-world applications.
Our research identifies two critical latent factors affecting RAG's confidence in its predictions.
We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - Data-Adaptive Tradeoffs among Multiple Risks in Distribution-Free Prediction [55.77015419028725]
We develop methods that permit valid control of risk when threshold and tradeoff parameters are chosen adaptively.
Our methodology supports monotone and nearly-monotone risks, but otherwise makes no distributional assumptions.
arXiv Detail & Related papers (2024-03-28T17:28:06Z) - On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks [42.18575921329484]
We analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework.
We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs.
arXiv Detail & Related papers (2024-02-16T13:41:18Z) - Adversarial Attacks Against Uncertainty Quantification [10.655660123083607]
This work focuses on a different adversarial scenario in which the attacker is still interested in manipulating the uncertainty estimate.
In particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator.
arXiv Detail & Related papers (2023-09-19T12:54:09Z) - Balancing detectability and performance of attacks on the control
channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs).
This research is motivated by the recent interest of the research community for adversarial and poisoning attacks applied to MDPs, and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z) - Trust but Verify: Assigning Prediction Credibility by Counterfactual
Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z) - Identifying Causal-Effect Inference Failure with Uncertainty-Aware
Models [41.53326337725239]
We introduce a practical approach for integrating uncertainty estimation into a class of state-of-the-art neural network methods.
We show that our methods enable us to deal gracefully with situations of "no-overlap", common in high-dimensional data.
We show that correctly modeling uncertainty can keep us from giving overconfident and potentially harmful recommendations.
arXiv Detail & Related papers (2020-07-01T00:37:41Z)