You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks
- URL: http://arxiv.org/abs/2306.09951v1
- Date: Fri, 16 Jun 2023 16:32:27 GMT
- Title: You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks
- Authors: Edward Raff, Michel Benaroch, Andrew L. Farris
- Abstract summary: The ability to subvert a machine learning model into making errant predictions is startling.
Current mitigations come with a high cost and simultaneously reduce the model's accuracy.
The survey reviews real-world adversarial attacks with an eye toward how one would mitigate them in practice, the risks for production deployment, and how those risks could be managed.
- Score: 31.111554739533663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The robustness of modern machine learning (ML) models has become an
increasing concern within the community. The ability to subvert a model into
making errant predictions using seemingly inconsequential changes to input is
startling, as is our lack of success in building models robust to this concern.
Existing research shows progress, but current mitigations come with a high cost
and simultaneously reduce the model's accuracy. However, such trade-offs may
not be necessary when other design choices could subvert the risk. In this
survey we review the current literature on attacks and their real-world
occurrences, or limited evidence thereof, to critically evaluate the real-world
risks of adversarial machine learning (AML) for the average entity. This is
done with an eye toward how one would then mitigate these attacks in practice,
the risks for production deployment, and how those risks could be managed. In
doing so we elucidate that many AML threats do not warrant the cost and
trade-offs of robustness due to a low likelihood of attack or availability of
superior non-ML mitigations. Our analysis also recommends cases where an actor
should be concerned about AML to the degree where robust ML models are
necessary for a complete deployment.
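To make the abstract's cost argument concrete, the sketch below works through a toy expected-loss comparison of the kind the survey argues for: an attack's likelihood and impact are weighed against a mitigation's direct cost and its accuracy penalty. This is an illustrative sketch only; every class name, figure, and threshold in it is a hypothetical assumption, not a value or method taken from the paper.

```python
# Illustrative sketch only: a toy expected-loss comparison for deciding whether
# adversarial robustness is worth its cost and accuracy trade-off. All names and
# numbers are hypothetical assumptions, not values from the paper.
from dataclasses import dataclass


@dataclass
class Mitigation:
    name: str
    annual_cost: float               # direct cost of deploying and maintaining it
    accuracy_drop_points: float      # clean-accuracy drop, in percentage points
    attack_success_reduction: float  # fraction by which attack success is reduced


def expected_annual_loss(likelihood: float, success_rate: float, impact: float) -> float:
    """Expected yearly loss from a threat: likelihood x success rate x impact."""
    return likelihood * success_rate * impact


def net_benefit(m: Mitigation, likelihood: float, success_rate: float,
                impact: float, value_per_accuracy_point: float) -> float:
    """Loss avoided by the mitigation, minus its direct cost and accuracy penalty."""
    baseline = expected_annual_loss(likelihood, success_rate, impact)
    residual = expected_annual_loss(
        likelihood, success_rate * (1.0 - m.attack_success_reduction), impact)
    accuracy_penalty = m.accuracy_drop_points * value_per_accuracy_point
    return (baseline - residual) - m.annual_cost - accuracy_penalty


if __name__ == "__main__":
    # Hypothetical deployment where targeted attacks are rare but costly.
    likelihood = 0.05       # chance of being targeted in a given year
    success_rate = 0.8      # chance an attempted attack succeeds
    impact = 200_000.0      # loss per successful attack
    point_value = 10_000.0  # business value of one point of clean accuracy

    options = [
        Mitigation("robust ML (adversarial training)", 50_000.0, 3.0, 0.7),
        Mitigation("non-ML control (rate limiting + review)", 20_000.0, 0.0, 0.5),
    ]

    print(f"baseline expected annual loss: ${expected_annual_loss(likelihood, success_rate, impact):,.0f}")
    for m in options:
        benefit = net_benefit(m, likelihood, success_rate, impact, point_value)
        print(f"{m.name}: net benefit ${benefit:,.0f}")
```

With these made-up numbers the expected annual loss is small, so neither mitigation pays for itself and the cheaper non-ML control beats adversarial training, mirroring the paper's point that low attack likelihood or adequate non-ML mitigations can make robust ML unnecessary.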
Related papers
- SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment.
Certain scenarios suffer 25 times higher attack rates.
Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z)
- Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation [0.7889270818022226]
We show how existing AI benchmarks can be used to facilitate the creation of risk estimates.
We describe the results of a pilot study in which experts use information from Cybench, an AI benchmark, to generate probability estimates.
arXiv Detail & Related papers (2025-03-06T10:39:47Z)
- MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks [109.53357276796655]
Multimodal large language models (MLLMs) are increasingly equipped with Retrieval Augmented Generation (RAG), which grounds their responses in query-relevant external knowledge.
This reliance poses a critical yet underexplored safety risk: knowledge poisoning attacks.
We propose MM-PoisonRAG, a novel knowledge poisoning attack framework with two attack strategies.
arXiv Detail & Related papers (2025-02-25T04:23:59Z)
- Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities [49.09703018511403]
Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks.
Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system.
We propose evaluating LLMs with model tampering attacks which allow for modifications to latent activations or weights.
arXiv Detail & Related papers (2025-02-03T18:59:16Z)
- On Evaluating the Durability of Safeguards for Open-Weight LLMs [80.36750298080275]
We discuss whether technical safeguards can impede the misuse of large language models (LLMs).
We show that even evaluating these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are.
We suggest future research carefully cabin claims to more constrained, well-defined, and rigorously examined threat models.
arXiv Detail & Related papers (2024-12-10T01:30:32Z)
- Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the model's output (a toy sketch of such a perturbation appears after this list).
We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z)
- A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z)
- Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications [0.0]
Large Language Models (LLMs) have revolutionized various applications by providing advanced natural language processing capabilities.
This paper explores the threat modeling and risk analysis specifically tailored for LLM-powered applications.
arXiv Detail & Related papers (2024-06-16T16:43:58Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the belief that base LLMs, lacking alignment tuning, pose little misuse risk.
By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Risk and Response in Large Language Models: Evaluating Key Threat Categories [6.436286493151731]
This paper explores the pressing issue of risk assessment in Large Language Models (LLMs).
By utilizing the Anthropic Red-team dataset, we analyze major risk categories, including Information Hazards, Malicious Uses, and Discrimination/Hateful content.
Our findings indicate that LLMs tend to consider Information Hazards less harmful, a finding confirmed by a specially developed regression model.
arXiv Detail & Related papers (2024-03-22T06:46:40Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction (ME) attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Identifying the Risks of LM Agents with an LM-Emulated Sandbox [68.26587052548287]
Language Model (LM) agents and tools enable a rich set of capabilities but also amplify potential risks.
The high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks.
We introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios.
arXiv Detail & Related papers (2023-09-25T17:08:02Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show this approach (MLAC) can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Machine Learning Security against Data Poisoning: Are We There Yet? [23.809841593870757]
This article reviews data poisoning attacks that compromise the training data used to learn machine learning models.
We discuss how to mitigate these attacks using basic security principles, or by deploying ML-oriented defensive mechanisms.
arXiv Detail & Related papers (2022-04-12T17:52:09Z)
- Adversarial Machine Learning Threat Analysis in Open Radio Access Networks [37.23982660941893]
The Open Radio Access Network (O-RAN) is a new, open, adaptive, and intelligent RAN architecture.
In this paper, we present a systematic adversarial machine learning threat analysis for the O-RAN.
arXiv Detail & Related papers (2022-01-16T17:01:38Z)
- ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on ML-Doctor, a modular, reusable software tool that enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
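As flagged in the mechanistic-interpretability entry above, the following is a toy sketch of the imperceptible-perturbation attacks that the abstract and several related papers describe: a single FGSM-style gradient step against a hand-rolled logistic-regression model. The weights, data, and perturbation budget are made-up assumptions for illustration; this is not code from this paper or any of the listed papers.

```python
# Toy FGSM-style perturbation against a hand-rolled logistic-regression model,
# showing how a small, targeted input change can flip a prediction.
# Everything here is a made-up illustration, not code from the listed papers.
import numpy as np

rng = np.random.default_rng(0)
n_features = 20

# Toy linear model: p(y=1 | x) = sigmoid(w.x + b), with arbitrary fixed weights.
w = rng.normal(size=n_features)
b = 0.1


def predict_proba(x: np.ndarray) -> float:
    """Probability the model assigns to class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))


# A clean input that the model places on the class-1 side of the boundary.
x = rng.normal(size=n_features)
if w @ x + b < 0:
    x = -x
margin = w @ x + b  # clean logit, i.e. the model's confidence toward class 1

# Per-feature budget just large enough to cross the decision boundary.
eps = 1.2 * margin / np.sum(np.abs(w))

# FGSM step for true label y=1: the loss gradient w.r.t. x is (p - 1) * w,
# so moving along sign(gradient) pushes the logit down as fast as possible.
grad = (predict_proba(x) - 1.0) * w
x_adv = x + eps * np.sign(grad)

print(f"clean prediction p(y=1):       {predict_proba(x):.3f}")
print(f"adversarial prediction p(y=1): {predict_proba(x_adv):.3f}")
print(f"per-feature change: {eps:.3f} (features are roughly unit scale)")
```

The same idea carries over to deep networks by computing the input gradient with backpropagation; the essential point is that the perturbation is derived from the model's own gradients rather than guessed at random.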
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content presented here (including all information) and is not responsible for any consequences of its use.