Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to Boost Foundation Models
- URL: http://arxiv.org/abs/2308.06207v1
- Date: Fri, 11 Aug 2023 16:13:04 GMT
- Title: Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to Boost Foundation Models
- Authors: Fanglong Yao, Changyuan Tian, Jintao Liu, Zequn Zhang, Qing Liu, Li
Jin, Shuchao Li, Xiaoyu Li, Xian Sun
- Abstract summary: The Chain-of-Thought (CoT) technique is widely regarded as an effective method for enhancing the reasoning ability of foundation models.
This paper proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm, which equips foundation models with the expert-level ability of high-order multi-hop reasoning.
- Score: 15.372421458422489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reasoning ability is one of the most crucial capabilities of a foundation model, signifying its capacity to address complex reasoning tasks. The Chain-of-Thought (CoT) technique is widely regarded as an effective method for enhancing the reasoning ability of foundation models and has garnered significant attention. However, CoT reasoning is linear and step-by-step, akin to an individual's logical reasoning, and is suitable for general and moderately complex problems. In contrast, an expert's thinking pattern exhibits two prominent characteristics that CoT cannot handle appropriately: high-order multi-hop reasoning and multimodal comparative judgement. The core motivation of this paper is therefore to transcend CoT and construct a reasoning paradigm that thinks like an expert. Since a hyperedge of a hypergraph can connect multiple vertices, hypergraphs are naturally suited to modelling high-order relationships. Inspired by this, the paper proposes a multimodal Hypergraph-of-Thought (HoT) reasoning paradigm that equips foundation models with the expert-level abilities of high-order multi-hop reasoning and multimodal comparative judgement. Specifically, a textual hypergraph-of-thought is constructed using triples as the primary thought units to model high-order relationships, and hyperedges-of-thought are generated through multi-hop walking paths to achieve multi-hop inference. Furthermore, a visual hypergraph-of-thought interacts with the textual hypergraph-of-thought via Cross-modal Co-Attention Graph Learning for multimodal comparative verification. Experiments on the ScienceQA benchmark demonstrate that the proposed HoT-based T5 outperforms CoT-based GPT-3.5 and ChatGPT, and is on par with CoT-based GPT-4 despite a much smaller model size.
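The abstract names three concrete mechanisms: triples as primary thought units, hyperedges-of-thought grown from multi-hop walks, and cross-modal co-attention between the textual and visual hypergraphs. The following is a minimal illustrative sketch of the first two in Python; all class, method, and variable names here are our own assumptions, not the paper's released code.

```python
from collections import defaultdict

# A "thought" is a (subject, relation, object) triple extracted from the
# question/context. Each triple is a vertex of the hypergraph; a hyperedge
# groups all triples visited along one multi-hop walk, modelling the
# high-order relationship among them.
Triple = tuple  # (subject, relation, object)

class HypergraphOfThought:
    def __init__(self):
        self.vertices: list[Triple] = []       # triples = primary thought units
        self.hyperedges: list[set[int]] = []   # each hyperedge = a set of vertex ids
        self._by_entity = defaultdict(set)     # entity -> ids of triples mentioning it

    def add_triple(self, t: Triple) -> int:
        vid = len(self.vertices)
        self.vertices.append(t)
        subj, _, obj = t
        self._by_entity[subj].add(vid)
        self._by_entity[obj].add(vid)
        return vid

    def neighbours(self, vid: int) -> set[int]:
        # Two triples are adjacent if they share an entity.
        subj, _, obj = self.vertices[vid]
        return (self._by_entity[subj] | self._by_entity[obj]) - {vid}

    def hyperedge_from_walk(self, start: int, hops: int) -> set[int]:
        # Collect every triple reachable within `hops` steps of `start`
        # and bind them into a single hyperedge-of-thought.
        frontier, visited = {start}, {start}
        for _ in range(hops):
            frontier = {n for v in frontier for n in self.neighbours(v)} - visited
            visited |= frontier
        self.hyperedges.append(visited)
        return visited

hot = HypergraphOfThought()
a = hot.add_triple(("magnet", "attracts", "iron"))
hot.add_triple(("iron", "is_a", "metal"))
hot.add_triple(("metal", "conducts", "electricity"))
print(hot.hyperedge_from_walk(a, hops=2))  # {0, 1, 2}: a 2-hop chain bound into one hyperedge
```

For the multimodal side, one plausible reading of the Cross-modal Co-Attention step (the paper's exact formulation may differ) is that textual and visual node features attend to each other and are residually fused:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(T: np.ndarray, V: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # T: (n, d) textual node features; V: (m, d) visual node features.
    d = T.shape[1]
    affinity = T @ V.T / np.sqrt(d)               # (n, m) cross-modal affinity
    T_out = T + softmax(affinity, axis=1) @ V     # text nodes attend to vision
    V_out = V + softmax(affinity.T, axis=1) @ T   # vision nodes attend to text
    return T_out, V_out

rng = np.random.default_rng(0)
T_new, V_new = co_attention(rng.standard_normal((4, 8)), rng.standard_normal((6, 8)))
print(T_new.shape, V_new.shape)  # (4, 8) (6, 8)
```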
Related papers
- AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning [70.95645743670062]
AtomThink is a framework for constructing long chains of thought (CoT) in a step-by-step manner, guiding MLLMs to perform complex reasoning.
AtomMATH is a large-scale multimodal dataset of long CoTs, and an atomic capability evaluation metric for mathematical tasks.
AtomThink significantly improves the performance of baseline MLLMs, achieving approximately 50% relative accuracy gains on MathVista and 120% on MathVerse.
arXiv Detail & Related papers (2024-11-18T11:54:58Z)
- Markov Chain of Thought for Efficient Mathematical Reasoning [10.678633785012691]
Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and task-specific actions.
We conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT).
arXiv Detail & Related papers (2024-10-23T07:53:29Z)
- Supervised Chain of Thought [5.389461633686935]
Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
The one-prompt-for-all approach poses significant challenges for models to generate the correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
arXiv Detail & Related papers (2024-10-18T06:25:27Z)
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z)
- Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training [49.3242278912771]
Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions.
Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework.
We propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process.
arXiv Detail & Related papers (2023-11-23T17:09:48Z)
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models [74.40196814292426]
We propose Graph-of-Thought (GoT) reasoning, which models human thought processes not only as a chain but also as a graph.
GoT captures the non-sequential nature of human thinking and allows for a more realistic modeling of thought processes.
We evaluate GoT's performance on a text-only reasoning task and a multimodal reasoning task.
arXiv Detail & Related papers (2023-05-26T02:15:09Z)
- Visual Chain of Thought: Bridging Logical Gaps with Multimodal Infillings [61.04460792203266]
We introduce VCoT, a novel method that leverages chain-of-thought prompting with vision-language grounding to bridge the logical gaps within sequential data.
Our method uses visual guidance to generate synthetic multimodal infillings that add consistent and novel information to reduce the logical gaps for downstream tasks.
arXiv Detail & Related papers (2023-05-03T17:58:29Z)
- Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z)
- Multimodal Analogical Reasoning over Knowledge Graphs [43.76819868795101]
We introduce the new task of multimodal analogical reasoning over knowledge graphs.
Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG.
We propose a novel model-agnostic Multimodal analogical reasoning framework with Transformer (MarT) motivated by the structure mapping theory.
arXiv Detail & Related papers (2022-10-01T16:24:15Z)
- Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango [11.344587937052697]
This work initiates the preliminary steps towards a deeper understanding of reasoning mechanisms in large language models.
Our work centers around querying the model while controlling for all but one of the components in a prompt: symbols, patterns, and text.
We posit that text imbues patterns with commonsense knowledge and meaning.
arXiv Detail & Related papers (2022-09-16T02:54:00Z)