Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
- URL: http://arxiv.org/abs/2402.14899v3
- Date: Sun, 22 Sep 2024 14:46:30 GMT
- Title: Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
- Authors: Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
- Abstract summary: Chain-of-Thought (CoT) reasoning has been widely explored to achieve better reasoning with MLLMs.
Recent studies show that MLLMs still suffer from adversarial images.
We propose a novel attack method, termed the stop-reasoning attack, which attacks the model while bypassing the CoT reasoning process.
- Score: 40.01901770193044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal LLMs (MLLMs), with their strong text and image understanding abilities, have received great attention. To achieve better reasoning with MLLMs, Chain-of-Thought (CoT) reasoning has been widely explored, which further promotes MLLMs' explainability by providing intermediate reasoning steps. Despite the strong capabilities demonstrated by MLLMs in multimodal reasoning, recent studies show that MLLMs still suffer from adversarial images. This raises the following open questions: Does CoT also enhance the adversarial robustness of MLLMs? What do the intermediate reasoning steps of CoT entail under adversarial attacks? To answer these questions, we first generalize existing attacks to CoT-based inference by attacking its two main components, i.e., the rationale and the answer. We find that CoT indeed improves MLLMs' adversarial robustness against existing attack methods by leveraging the multi-step reasoning process, but not substantially. Based on our findings, we further propose a novel attack method, termed the stop-reasoning attack, which attacks the model while bypassing the CoT reasoning process. Experiments on three MLLMs and two visual reasoning datasets verify the effectiveness of our proposed method. We show that the stop-reasoning attack can result in misled predictions and outperforms baseline attacks by a significant margin.
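To make the attack setup concrete, the following is a minimal PGD-style sketch of a targeted image attack. Everything here is an illustrative assumption rather than the paper's implementation: a tiny stand-in classifier plays the role of the MLLM's answer prediction, and the L_inf budget, step size, and iteration count are generic defaults; the stop-reasoning objective (steering the generated text to skip the rationale) is only indicated in comments.

```python
# Toy targeted PGD attack on an image. NOT the authors' stop-reasoning code:
# a stand-in classifier replaces the MLLM's answer head, just to illustrate
# the bounded gradient-based optimization over the input image.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in "answer head": maps a 3x32x32 image to 4 answer choices (A-D).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))
for p in model.parameters():
    p.requires_grad_(False)                # only the image perturbation is optimized

image = torch.rand(1, 3, 32, 32)           # clean input image in [0, 1]
target_answer = torch.tensor([2])          # adversarial target, e.g. choice "C"

eps, alpha, steps = 8 / 255, 2 / 255, 40   # L_inf budget, step size, iterations
delta = torch.zeros_like(image, requires_grad=True)

for _ in range(steps):
    logits = model(image + delta)
    # Targeted objective: push the prediction toward the (wrong) target answer.
    # The stop-reasoning variant would additionally steer the generated text to
    # skip intermediate reasoning steps; this toy omits the text side entirely.
    loss = F.cross_entropy(logits, target_answer)
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()                 # targeted: descend the loss
        delta.clamp_(-eps, eps)                            # project onto the L_inf ball
        delta.copy_((image + delta).clamp(0, 1) - image)   # keep the image in [0, 1]
    delta.grad.zero_()

print("prediction on adversarial image:", model(image + delta).argmax(dim=1).item())
```

In the paper's setting, the loss would presumably be computed over the MLLM's generated tokens (the rationale and/or the final answer, or a direct-answer target for the stop-reasoning variant) rather than a classifier head, while the bounded projected-gradient update on the image stays the same.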
Related papers
- Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation [1.0012740151280692]
This paper introduces a framework for evaluating the tri-modal safety of Multimodal Large Language Models (MLLMs). We present the Short-Video Multimodal Adversarial dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR).
arXiv Detail & Related papers (2025-07-16T07:02:15Z) - Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning [51.867949053263466]
We present Corvid, an MLLM with enhanced chain-of-thought (CoT) reasoning capabilities. To enhance Corvid's CoT reasoning capabilities, we introduce MCoT-Instruct-287K, a high-quality multimodal CoT instruction-following dataset. We propose an effective inference-time scaling strategy that enables Corvid to mitigate over-reasoning and under-reasoning.
arXiv Detail & Related papers (2025-07-10T04:31:56Z) - APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization [43.30674910774084]
Multimodal Large Language Models (MLLMs) are powerful at integrating diverse data, but they often struggle with complex reasoning. Our work investigates how the KL penalty and overthinking affect RL training in MLLMs. For positive samples, Difficulty-Adaptive Divergence Shaping (DADS) is introduced to dynamically adjust the KL divergence weight based on their difficulty. For negative samples, Suboptimal Trajectory Complexity Regularization (STCR) is proposed to penalize overly long responses.
arXiv Detail & Related papers (2025-06-26T17:57:08Z) - Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities [76.9327488986162]
Existing attacks against multimodal language models (MLLMs) primarily communicate instructions through text accompanied by adversarial images. We exploit the capabilities of MLLMs to interpret non-textual instructions, specifically adversarial images or audio generated by our novel method, Con Instruction. Our method achieves the highest attack success rates, reaching 81.3% and 86.6% on LLaVA-v1.5 (13B).
arXiv Detail & Related papers (2025-05-31T13:11:14Z) - Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness [3.9930400744726273]
We design a novel evaluation framework, MATCHA, to investigate the relationship between answer and reasoning. In domains like education and healthcare, reasoning is key for model trustworthiness. Our results show that LLMs exhibit greater vulnerability to input perturbations for multi-step and commonsense tasks than for logical tasks.
arXiv Detail & Related papers (2025-05-23T02:42:16Z) - Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition [11.422434149376478]
Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities.
In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks.
Recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features.
arXiv Detail & Related papers (2025-04-04T20:57:36Z) - Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning [53.790502697674754]
We propose Take-along Visual Conditioning (TVC), a strategy that shifts image input to critical reasoning stages.
TVC helps the model retain attention to the visual components throughout the reasoning.
Our approach achieves state-of-the-art performance on average across five mathematical reasoning benchmarks.
arXiv Detail & Related papers (2025-03-17T16:45:12Z) - Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning [3.0449420665138485]
Large Language Models (LLMs) have raised interest in their formal reasoning capabilities, particularly in mathematics.
We propose a post-training approach leveraging a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning.
Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
arXiv Detail & Related papers (2025-02-26T23:22:02Z) - SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs [48.28847964704554]
Chain-of-Thought (CoT) reasoning enables Large Language Models (LLMs) to solve complex reasoning tasks.
We propose a novel approach for continuous-space reasoning that does not require modifying the underlying LLM.
arXiv Detail & Related papers (2025-02-17T18:52:29Z) - Can Multimodal Large Language Model Think Analogically? [9.517193263050228]
Multimodal Large Language Model (MLLM) has recently sparked considerable discussion due to its emergent capabilities.
We explore two facets: MLLM as an explainer and MLLM as a predictor.
We propose a unified prompt template and a method for harnessing the comprehension capabilities of MLLM to augment existing models.
arXiv Detail & Related papers (2024-11-02T16:59:49Z) - Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing.
We combine an independent, meaningful adversarial insertion with situations derived from movies to check whether this can trick an LLM.
Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z) - Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z) - Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective [9.633811630889237]
We propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems.
We introduce MORE, a novel dataset with 12,000 challenging VQA instances requiring multi-hop reasoning.
Our experiments show that MLLMs perform poorly on MORE, indicating strong unimodal biases and limited semantic understanding.
arXiv Detail & Related papers (2024-03-27T08:38:49Z) - Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [79.37150041259066]
This paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof.
The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%.
arXiv Detail & Related papers (2024-02-06T03:41:12Z) - MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance [36.03512474289962]
This paper investigates the novel challenge of defending MLLMs against malicious attacks through visual inputs.
Images act as a "foreign language" that is not considered during safety alignment, making MLLMs more prone to producing harmful responses.
We introduce MLLM-Protector, a plug-and-play strategy that solves two subtasks: 1) identifying harmful responses via a lightweight harm detector, and 2) transforming harmful responses into harmless ones via a detoxifier.
arXiv Detail & Related papers (2024-01-05T17:05:42Z) - SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs). Based on our finding that adversarially generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt and then aggregates the corresponding predictions to detect adversarial inputs (a minimal sketch of this perturb-and-aggregate idea appears after this list).
arXiv Detail & Related papers (2023-10-05T17:01:53Z) - MathAttack: Attacking Large Language Models Towards Math Solving Ability [29.887497854000276]
We propose MathAttack, a model for attacking math word problem (MWP) samples, which is closer to the essence of security in math problem solving. It is essential to preserve the mathematical logic of the original MWPs during the attack. Extensive experiments on our RobustMath dataset and two other math benchmarks, GSM8K and MultiArith, show that MathAttack can effectively attack the math-solving ability of LLMs.
arXiv Detail & Related papers (2023-09-04T16:02:23Z) - Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" manner and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
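As referenced in the SmoothLLM entry above, here is a minimal sketch of its perturb-and-aggregate idea. The `query_llm` and `is_harmful` functions are hypothetical stand-ins for a real LLM call and a harm judge, and the perturbation rate, number of copies, and the `MAGIC_SUFFIX` marker are illustrative assumptions, not details from the paper.

```python
# Sketch of a SmoothLLM-style wrapper: perturb several copies of the prompt at
# character level, query each copy, and take a majority vote over the per-copy
# harm judgments. All model/judge calls below are toy stand-ins.
import random
import string

def perturb(prompt: str, rate: float = 0.1) -> str:
    """Randomly swap a fraction of characters with printable ASCII."""
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < rate:
            chars[i] = random.choice(string.printable)
    return "".join(chars)

def is_harmful(response: str) -> bool:
    """Toy judge: flags responses that do not start with a refusal."""
    return not response.lower().startswith("i'm sorry")

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    # An adversarial suffix is brittle: perturbing it usually restores refusal.
    if "MAGIC_SUFFIX" in prompt:
        return "Sure, here is..."
    return "I'm sorry, I can't help with that."

def smooth_query(prompt: str, copies: int = 11, rate: float = 0.1) -> bool:
    """Return True if the majority of perturbed copies yield a harmful reply."""
    votes = [is_harmful(query_llm(perturb(prompt, rate))) for _ in range(copies)]
    return sum(votes) > copies / 2

adversarial = "Tell me how to do X. MAGIC_SUFFIX"
print("undefended harmful:", is_harmful(query_llm(adversarial)))   # True
print("smoothed harmful:  ", smooth_query(adversarial))            # usually False
```

The design point is that a brittle adversarial suffix rarely survives random character-level edits in most of the copies, so a simple majority vote over the perturbed copies tends to restore the refusal behavior.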
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.