Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
- URL: http://arxiv.org/abs/2502.02260v1
- Date: Tue, 04 Feb 2025 12:17:08 GMT
- Title: Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
- Authors: Javier Rando, Jie Zhang, Nicholas Carlini, Florian Tramèr
- Abstract summary: In the past decade, considerable research effort has been devoted to securing machine learning (ML) models that operate in adversarial settings.
Yet, progress has been slow even for simple "toy" problems.
Today, adversarial ML research has shifted towards studying larger, general-purpose language models.
- Score: 62.306374598571516
- Abstract: In the past decade, considerable research effort has been devoted to securing machine learning (ML) models that operate in adversarial settings. Yet, progress has been slow even for simple "toy" problems (e.g., robustness to small adversarial perturbations) and is often hindered by non-rigorous evaluations. Today, adversarial ML research has shifted towards studying larger, general-purpose language models. In this position paper, we argue that the situation is now even worse: in the era of LLMs, the field of adversarial ML studies problems that are (1) less clearly defined, (2) harder to solve, and (3) even more challenging to evaluate. As a result, we caution that yet another decade of work on adversarial ML may fail to produce meaningful progress.
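For concreteness, the "toy" problem the abstract refers to (robustness to small adversarial perturbations) is commonly studied with gradient-based attacks such as FGSM. The minimal sketch below illustrates that setting; the model, loss, and perturbation budget are illustrative assumptions, not details taken from the paper.
```python
# Minimal FGSM sketch (Goodfellow et al., 2015) for the "small adversarial
# perturbation" toy problem. The classifier, labels, and epsilon are assumed
# for illustration only.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """Return an L-infinity-bounded adversarial example for input x with label y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clip to a valid image range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```
Rigorously measuring a model's robustness against perturbations like these is precisely the kind of evaluation the paper argues the field has struggled with, even before moving to LLMs.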
Related papers
- Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives [52.863024096759816]
Misaligned research objectives have hindered progress in adversarial robustness research over the past decade.
We argue that realigned objectives are necessary for meaningful progress in adversarial alignment.
arXiv Detail & Related papers (2025-02-17T15:28:40Z) - Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.
Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z) - Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation [65.92001420372007]
This paper systematically evaluates state-of-the-art Multimodal Large Language Models (MLLMs) across diverse benchmarks.
We show significant performance drops when negation arguments are introduced after initially correct responses.
arXiv Detail & Related papers (2025-01-31T10:37:48Z) - Large Language Models Think Too Fast To Explore Effectively [0.0]
The extent to which Large Language Models can effectively explore, particularly in open-ended tasks, remains unclear.
This study investigates whether LLMs can surpass humans in exploration during an open-ended task, using Little Alchemy 2 as a paradigm.
arXiv Detail & Related papers (2025-01-29T21:51:17Z) - ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection [60.297079601066784]
We introduce ErrorRadar, the first benchmark designed to assess MLLMs' capabilities in error detection.
ErrorRadar evaluates two sub-tasks: error step identification and error categorization.
It consists of 2,500 high-quality multimodal K-12 mathematical problems, collected from real-world student interactions.
Results indicate that significant challenges remain: even the best-performing model, GPT-4o, still lags around 10% behind human evaluation.
arXiv Detail & Related papers (2024-10-06T14:59:09Z) - Adversarial Math Word Problem Generation [6.92510069380188]
We propose a new paradigm for ensuring fair evaluation of large language models (LLMs).
We generate adversarial examples that preserve the structure and difficulty of the original questions intended for assessment, but are unsolvable by LLMs.
We conduct experiments on various open- and closed-source LLMs, quantitatively and qualitatively demonstrating that our method significantly degrades their math problem-solving ability.
arXiv Detail & Related papers (2024-02-27T22:07:52Z) - Trustworthy Large Models in Vision: A Survey [8.566163225282724]
Large Models (LMs) have revolutionized various fields of deep learning, from Natural Language Processing (NLP) to Computer Vision (CV).
LMs are increasingly challenged and criticized by academia and industry because, despite their powerful performance, they exhibit untrustworthy behavior.
In this survey, we summarize four concerns that obstruct the trustworthy use of LMs in vision: 1) human misuse, 2) vulnerability, 3) inherent issues, and 4) interpretability.
We hope this survey will facilitate readers' understanding of this field, promote alignment of LMs with human expectations, and enable trustworthy LMs to benefit rather than harm human society.
arXiv Detail & Related papers (2023-11-16T08:49:46Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal how comprehensively language models grasp language, particularly their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - "Real Attackers Don't Compute Gradients": Bridging the Gap Between Adversarial ML Research and Practice [10.814642396601139]
Motivated by the apparent gap between researchers and practitioners, this paper aims to bridge the two domains.
We first present three real-world case studies from which we can glean practical insights unknown or neglected in research.
Next, we analyze all adversarial ML papers recently published in top security conferences, highlighting positive trends and blind spots.
arXiv Detail & Related papers (2022-12-29T14:11:07Z) - Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges [0.8029049649310213]
We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges.
As young as the field is, it has roots going back over 200 years in regression modeling, and in rule-based machine learning starting in the 1960s.
Many new IML methods have recently been proposed, many of them model-agnostic, along with interpretation techniques specific to deep learning and tree-based ensembles.
arXiv Detail & Related papers (2020-10-19T09:20:03Z)