You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks
- URL: http://arxiv.org/abs/2306.09951v1
- Date: Fri, 16 Jun 2023 16:32:27 GMT
- Title: You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks
- Authors: Edward Raff, Michel Benaroch, Andrew L. Farris
- Abstract summary: The ability to subvert a machine learning model into making errant predictions is startling.
Current mitigations come with a high cost and simultaneously reduce the model's accuracy.
The paper critically evaluates these risks with an eye toward how one would mitigate such attacks in practice, the risks for production deployment, and how those risks could be managed.
- Score: 31.111554739533663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The robustness of modern machine learning (ML) models has become an
increasing concern within the community. The ability to subvert a model into
making errant predictions using seemingly inconsequential changes to input is
startling, as is our lack of success in building models robust to this concern.
Existing research shows progress, but current mitigations come with a high cost
and simultaneously reduce the model's accuracy. However, such trade-offs may
not be necessary when other design choices could subvert the risk. In this
survey we review the current literature on attacks and their real-world
occurrences, or limited evidence thereof, to critically evaluate the real-world
risks of adversarial machine learning (AML) for the average entity. This is
done with an eye toward how one would then mitigate these attacks in practice,
the risks for production deployment, and how those risks could be managed. In
doing so we elucidate that many AML threats do not warrant the cost and
trade-offs of robustness due to a low likelihood of attack or availability of
superior non-ML mitigations. Our analysis also recommends cases where an actor
should be concerned about AML to the degree where robust ML models are
necessary for a complete deployment.
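For illustration, the "seemingly inconsequential changes to input" discussed above can be produced with a single gradient step. The sketch below uses the fast gradient sign method (FGSM), a standard white-box evasion attack from the AML literature; the PyTorch classifier, labels, epsilon budget, and [0, 1] input range are illustrative assumptions, not details drawn from this paper.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, epsilon=0.03):
        # Single-step FGSM: nudge each input feature by +/- epsilon in the
        # direction that most increases the classification loss.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        # Bounded perturbation: at most epsilon per feature (L-infinity ball),
        # clamped back to the assumed valid input range [0, 1].
        return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    # Hypothetical usage with any differentiable classifier `model`:
    #   x_adv = fgsm_perturb(model, images, labels)
    # The perturbed inputs typically look unchanged to a human yet can flip the
    # model's predictions, which is the risk the survey weighs against the cost
    # of robust training.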
Related papers
- Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the model's output.
We propose a method based on Mechanistic Interpretability (MI) techniques to guide the detection and understanding of such vulnerabilities.
arXiv Detail & Related papers (2024-07-29T09:55:34Z)
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
- Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications [0.0]
Large Language Models (LLMs) have revolutionized various applications by providing advanced natural language processing capabilities.
This paper explores the threat modeling and risk analysis specifically tailored for LLM-powered applications.
arXiv Detail & Related papers (2024-06-16T16:43:58Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs, which lack instruction tuning, are safe from such misuse.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs).
By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content.
Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction (ME) attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Identifying the Risks of LM Agents with an LM-Emulated Sandbox [68.26587052548287]
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks.
The high cost of testing these agents makes it increasingly difficult to find high-stakes, long-tailed risks.
We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show that this algorithm, MLAC, can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Machine Learning Security against Data Poisoning: Are We There Yet? [23.809841593870757]
This article reviews data poisoning attacks that compromise the training data used to learn machine learning models.
We discuss how to mitigate these attacks using basic security principles, or by deploying ML-oriented defensive mechanisms.
arXiv Detail & Related papers (2022-04-12T17:52:09Z)
- Adversarial Machine Learning Threat Analysis in Open Radio Access Networks [37.23982660941893]
The Open Radio Access Network (O-RAN) is a new, open, adaptive, and intelligent RAN architecture.
In this paper, we present a systematic adversarial machine learning threat analysis for the O-RAN.
arXiv Detail & Related papers (2022-01-16T17:01:38Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on ML-Doctor, a modular, reusable software tool that enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)