A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
- URL: http://arxiv.org/abs/2403.14972v1
- Date: Fri, 22 Mar 2024 06:03:07 GMT
- Title: A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
- Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li,
- Abstract summary: The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images.
To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG)
In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts.
- Score: 53.35861580821777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, achieving state-of-the-art results in Science QA and MMBench with significant improvements over previous methods.
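The abstract's two mechanisms (confining debate to a blueprint graph, and storing evidence on entity branches) can be sketched in a minimal, purely illustrative way. The paper does not specify its data structures or agent prompting, so the `BlueprintGraph` class, the `debate_round` function, and the example entities below are all hypothetical stand-ins for the LLM-driven components:

```python
class BlueprintGraph:
    """A toy blueprint graph: entities from the question form the branches,
    and debate evidence may only attach to those branches."""

    def __init__(self, question_entities):
        # Root branches are fixed up front (top-down/deductive scope).
        self.nodes = {entity: [] for entity in question_entities}

    def add_evidence(self, entity, evidence):
        # Evidence is stored on its entity's branch, so frequent but
        # irrelevant concepts cannot bleed into unrelated branches.
        if entity not in self.nodes:
            raise KeyError(f"{entity!r} is outside the blueprint")
        self.nodes[entity].append(evidence)


def debate_round(graph, proposals):
    """Accept only proposals that stay inside the blueprint graph;
    off-blueprint (distractor) proposals are silently rejected."""
    accepted = []
    for entity, evidence in proposals:
        if entity in graph.nodes:
            graph.add_evidence(entity, evidence)
            accepted.append((entity, evidence))
    return accepted


# Example: one debate round with a distractor concept from the image.
g = BlueprintGraph(["magnet", "iron filings"])
accepted = debate_round(g, [
    ("magnet", "produces a magnetic field"),
    ("wooden table", "is brown"),  # distractor: rejected, off-blueprint
])
```

Here the rejection of the "wooden table" proposal illustrates how scoping debate to a pre-built graph can filter distractor concepts; the real method's graph construction and agent interaction are, of course, far richer than this.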
Related papers
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z) - Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models [48.43678591317425]
Boosting of Thoughts (BoT) is an automated prompting framework for problem solving with Large Language Models.
We show that BoT consistently achieves higher or comparable problem-solving rates than other advanced prompting approaches.
arXiv Detail & Related papers (2024-02-17T00:13:36Z) - Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources [33.62800469391487]
Given a controversial target such as "nuclear energy", argument mining aims to identify the argumentative text from heterogeneous sources.
Current approaches focus on exploring better ways of integrating the target-associated semantic information with the argumentative text.
We propose a novel explainable topic-enhanced argument mining approach.
arXiv Detail & Related papers (2023-07-22T17:26:55Z) - DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs [0.0]
We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs.
We significantly improve upon DebateSum by introducing 53180 new examples.
We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases.
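The constrained shortest-path traversal mentioned above can be sketched schematically. DebateKG's actual graph construction, edge semantics, and constraints are not given here, so the breadth-first search, the similarity-threshold constraint, and the example argument nodes below are all illustrative assumptions:

```python
from collections import deque


def constrained_shortest_path(edges, start, goal, allowed):
    """BFS shortest path (fewest hops) over a directed graph, traversing
    only edges whose weight passes the `allowed` constraint."""
    adjacency = {}
    for u, v, weight in edges:
        adjacency.setdefault(u, []).append((v, weight))

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for v, weight in adjacency.get(path[-1], []):
            if v not in seen and allowed(weight):
                seen.add(v)
                queue.append(path + [v])
    return None  # goal unreachable under the constraint


# Toy argument graph: edge weights stand in for semantic relatedness.
edges = [
    ("claim", "warrant", 0.9),
    ("claim", "tangent", 0.2),
    ("warrant", "impact", 0.8),
    ("tangent", "impact", 0.1),
]
path = constrained_shortest_path(edges, "claim", "impact",
                                 allowed=lambda w: w >= 0.5)
```

The constraint prunes the weakly related "tangent" route, so the traversal reaches the impact only through the well-supported warrant; in DebateKG the constraint would reflect its own graph semantics rather than this threshold.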
arXiv Detail & Related papers (2023-07-09T04:19:19Z) - Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z) - Pearl Causal Hierarchy on Image Data: Intricacies & Challenges [17.103787431518683]
Many researchers have voiced support for Pearl's counterfactual theory of causation as a stepping stone toward AI/ML research's ultimate goal of intelligent systems.
This work exemplifies how the Pearl Causal Hierarchy (PCH) can be understood on image data by providing insights on several intricacies.
arXiv Detail & Related papers (2022-12-23T19:59:28Z) - Explaining Image Classification with Visual Debates [26.76139301708958]
We propose a novel debate framework for understanding and explaining a continuous image classifier's reasoning for making a particular prediction.
Our framework encourages players to put forward diverse arguments during the debates, picking up the reasoning trails missed by their opponents.
We demonstrate and evaluate a practical realization of our Visual Debates on the geometric SHAPE and MNIST datasets.
arXiv Detail & Related papers (2022-10-17T12:35:52Z) - CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z) - Deep Image Deblurring: A Survey [165.32391279761006]
Deblurring is a classic problem in low-level computer vision, which aims to recover a sharp image from a blurred input image.
Recent advances in deep learning have led to significant progress in solving this problem.
arXiv Detail & Related papers (2022-01-26T01:31:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.