A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
- URL: http://arxiv.org/abs/2403.14972v1
- Date: Fri, 22 Mar 2024 06:03:07 GMT
- Title: A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
- Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li,
- Abstract summary: The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images.
To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG)
In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts.
- Score: 53.35861580821777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the issue, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph to prevent opinion trivialization through world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, achieving state-of-the-art results in Science QA and MMBench with significant improvements over previous methods.
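The abstract's two mechanisms (confining debate to a blueprint graph, and storing evidence on entity branches) can be sketched in a minimal, purely illustrative way. The paper does not specify its data structures or agent prompting, so the `BlueprintGraph` class, the `debate_round` function, and the example entities below are all hypothetical stand-ins for the LLM-driven components:

```python
class BlueprintGraph:
    """A toy blueprint graph: entities from the question form the branches,
    and debate evidence may only attach to those branches."""

    def __init__(self, question_entities):
        # Root branches are fixed up front (top-down/deductive scope).
        self.nodes = {entity: [] for entity in question_entities}

    def add_evidence(self, entity, evidence):
        # Evidence is stored on its entity's branch, so frequent but
        # irrelevant concepts cannot bleed into unrelated branches.
        if entity not in self.nodes:
            raise KeyError(f"{entity!r} is outside the blueprint")
        self.nodes[entity].append(evidence)


def debate_round(graph, proposals):
    """Accept only proposals that stay inside the blueprint graph;
    off-blueprint (distractor) proposals are silently rejected."""
    accepted = []
    for entity, evidence in proposals:
        if entity in graph.nodes:
            graph.add_evidence(entity, evidence)
            accepted.append((entity, evidence))
    return accepted


# Example: one debate round with a distractor concept from the image.
g = BlueprintGraph(["magnet", "iron filings"])
accepted = debate_round(g, [
    ("magnet", "produces a magnetic field"),
    ("wooden table", "is brown"),  # distractor: rejected, off-blueprint
])
```

Here the rejection of the "wooden table" proposal illustrates how scoping debate to a pre-built graph can filter distractor concepts; the real method's graph construction and agent interaction are, of course, far richer than this.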
Related papers
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z) - Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models [48.43678591317425]
Boosting of Thoughts (BoT) is an automated prompting framework for problem solving with Large Language Models.
We show that BoT consistently achieves higher or comparable problem-solving rates than other advanced prompting approaches.
arXiv Detail & Related papers (2024-02-17T00:13:36Z) - Explainable Topic-Enhanced Argument Mining from Heterogeneous Sources [33.62800469391487]
Given a controversial target such as "nuclear energy", argument mining aims to identify the argumentative text from heterogeneous sources.
Current approaches focus on exploring better ways of integrating the target-associated semantic information with the argumentative text.
We propose a novel explainable topic-enhanced argument mining approach.
arXiv Detail & Related papers (2023-07-22T17:26:55Z) - DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs [0.0]
We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs.
We significantly improve upon DebateSum by introducing 53180 new examples.
We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases.
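The constrained shortest-path traversal mentioned above can be sketched schematically. DebateKG's actual graph construction, edge semantics, and constraints are not given here, so the breadth-first search, the similarity-threshold constraint, and the example argument nodes below are all illustrative assumptions:

```python
from collections import deque


def constrained_shortest_path(edges, start, goal, allowed):
    """BFS shortest path (fewest hops) over a directed graph, traversing
    only edges whose weight passes the `allowed` constraint."""
    adjacency = {}
    for u, v, weight in edges:
        adjacency.setdefault(u, []).append((v, weight))

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for v, weight in adjacency.get(path[-1], []):
            if v not in seen and allowed(weight):
                seen.add(v)
                queue.append(path + [v])
    return None  # goal unreachable under the constraint


# Toy argument graph: edge weights stand in for semantic relatedness.
edges = [
    ("claim", "warrant", 0.9),
    ("claim", "tangent", 0.2),
    ("warrant", "impact", 0.8),
    ("tangent", "impact", 0.1),
]
path = constrained_shortest_path(edges, "claim", "impact",
                                 allowed=lambda w: w >= 0.5)
```

The constraint prunes the weakly related "tangent" route, so the traversal reaches the impact only through the well-supported warrant; in DebateKG the constraint would reflect its own graph semantics rather than this threshold.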
arXiv Detail & Related papers (2023-07-09T04:19:19Z) - Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes.
We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z) - Pearl Causal Hierarchy on Image Data: Intricacies & Challenges [17.103787431518683]
Many researchers have voiced support for Pearl's counterfactual theory of causation as a stepping stone toward AI/ML research's ultimate goal of intelligent systems.
This work exemplifies how the Pearl Causal Hierarchy (PCH) can be understood on image data by providing insights on several intricacies.
arXiv Detail & Related papers (2022-12-23T19:59:28Z) - Explaining Image Classification with Visual Debates [26.76139301708958]
We propose a novel debate framework for understanding and explaining a continuous image classifier's reasoning for making a particular prediction.
Our framework encourages players to put forward diverse arguments during the debates, picking up the reasoning trails missed by their opponents.
We demonstrate and evaluate a practical realization of our Visual Debates on the geometric SHAPE and MNIST datasets.
arXiv Detail & Related papers (2022-10-17T12:35:52Z) - CLEAR: Generative Counterfactual Explanations on Graphs [60.30009215290265]
We study the problem of counterfactual explanation generation on graphs.
A few studies have explored counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed.
We propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models.
arXiv Detail & Related papers (2022-10-16T04:35:32Z) - Deep Image Deblurring: A Survey [165.32391279761006]
Deblurring is a classic problem in low-level computer vision, which aims to recover a sharp image from a blurred input image.
Recent advances in deep learning have led to significant progress in solving this problem.
arXiv Detail & Related papers (2022-01-26T01:31:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.