Related papers: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

URL: http://arxiv.org/abs/2311.14109v2
Date: Wed, 3 Jul 2024 02:56:47 GMT
Title: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Authors: Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Ruifeng Guo, Bihui Yu, Stan Z. Li,
Abstract summary: Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework. We propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process.
Score: 49.3242278912771
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rationales. In this work, we delve into the importance of rationales in model reasoning. We observe that when rationales are completely accurate, the model's accuracy significantly improves, highlighting the need for high-quality rationale generation. Motivated by this, we propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process. This approach not only enhances the quality of generated rationales but also leads to more accurate and robust answers. Through extensive experiments, we demonstrate that our approach significantly improves model performance across various benchmarks. Remarkably, we show that even smaller base models, when equipped with our proposed approach, can achieve results comparable to those of larger models, illustrating the potential of our approach in harnessing the power of rationales for improved multimodal reasoning. The code is available at https://github.com/chengtan9907/mc-cot.

Related papers

Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks.<n>However, with the widespread application of these models, the problem of overthinking has gradually emerged.<n>Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z)
Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning [45.16917994431658]
This paper proposes UnifiedReward-Think, the first unified multimodal CoT-based reward model.<n>We first use a small amount of image generation preference data to distill the reasoning process of GPT-4o.<n>We then prepare large-scale unified multimodal preference data to elicit the model's reasoning process across various vision tasks.
arXiv Detail & Related papers (2025-05-06T08:46:41Z)
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models. We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models. For reasoning models, majority voting proves to be a robust inference strategy, generally competitive or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z)
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability [16.441081996257576]
We propose leveraging reasoning-intensive models to improve less computationally demanding, non-reasoning models. We demonstrate consistent improvements across various benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
arXiv Detail & Related papers (2025-04-13T16:26:56Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model. We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z)
On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities. While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks. We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
arXiv Detail & Related papers (2025-01-23T16:58:18Z)
Diving into Self-Evolving Training for Multimodal Reasoning [36.70979791148913]
Self-evolving trainin has emerged as a key approach for complex reasoning tasks.<n>This paper reframes self-evolving training for multimodal reasoning through the lens of reinforcement learning.<n>We propose M-STAR, a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks.
arXiv Detail & Related papers (2024-12-23T10:18:41Z)
Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for large language models (MLLMs) We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs) We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales. Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models [63.36637269634553]
We present a novel method of further improving performance by requiring models to compare multiple reasoning chains. We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, language models.
arXiv Detail & Related papers (2024-07-03T15:01:18Z)
Brainstorming Brings Power to Large Language Models of Knowledge Reasoning [17.14501985068287]
Large Language Models (LLMs) have demonstrated amazing capabilities in language generation, text comprehension, and knowledge reasoning. Recent studies have further improved the model's reasoning ability on a wide range of tasks by introducing multi-model collaboration. We propose the multi-model brainstorming based on prompt. It incorporates different models into a group for brainstorming, and after multiple rounds of reasoning elaboration and re-inference, a consensus answer is reached.
arXiv Detail & Related papers (2024-06-02T14:47:14Z)
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning [49.3242278912771]
We introduce a novel multimodal RAG framework named RMR (Retrieval Meets Reasoning) The RMR framework employs a bi-modal retrieval module to identify the most relevant question-answer pairs. It significantly boosts the performance of various vision-language models across a spectrum of benchmark datasets.
arXiv Detail & Related papers (2024-05-31T14:23:49Z)
Improving Language Model Reasoning with Self-motivated Learning [60.779625789039486]
textitSelf-motivated Learning framework motivates the model itself to automatically generate rationales on existing datasets. We train a reward model with the rank to evaluate the quality of rationales, and improve the performance of reasoning through reinforcement learning.
arXiv Detail & Related papers (2024-04-10T14:05:44Z)
Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions. We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks. We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
How Ambiguous are the Rationales for Natural Language Reasoning? A Simple Approach to Handling Rationale Uncertainty [0.0]
Rationales behind answers not only explain model decisions but boost language models to reason well on complex reasoning tasks. It is non-trivial to estimate the degree to which the rationales are faithful enough to encourage model performance. We propose how to deal with imperfect rationales causing aleatoric uncertainty.
arXiv Detail & Related papers (2024-02-22T07:12:34Z)
Reasoning Circuits: Few-shot Multihop Question Generation with Structured Rationales [11.068901022944015]
Chain-of-thought rationale generation has been shown to improve performance on multi-step reasoning tasks. We introduce a new framework for applying chain-of-thought inspired structured rationale generation to multi-hop question generation under a very low supervision regime.
arXiv Detail & Related papers (2022-11-15T19:36:06Z)
Why do you think that? Exploring Faithful Sentence-Level Rationales Without Supervision [60.62434362997016]
We propose a differentiable training-framework to create models which output faithful rationales on a sentence level. Our model solves the task based on each rationale individually and learns to assign high scores to those which solved the task best.
arXiv Detail & Related papers (2020-10-07T12:54:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.