KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
- URL: http://arxiv.org/abs/2401.12863v1
- Date: Tue, 23 Jan 2024 15:56:11 GMT
- Title: KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
- Authors: Debjyoti Mondal, Suraj Modi, Subhadarshi Panda, Rituraj Singh,
Godawari Sudhakar Rao
- Abstract summary: We propose a framework that integrates CoT reasoning, Knowledge Graphs, and multiple modalities for a comprehensive understanding of multimodal tasks.
KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers.
We achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by 18.70 percentage points and GPT-4 (83.99%) by 9.88 percentage points.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated impressive performance in
natural language processing tasks by leveraging chain-of-thought (CoT) prompting,
which enables step-by-step thinking. Extending LLMs with multimodal capabilities
has attracted recent interest, but it incurs computational cost and requires
substantial hardware resources. To address these challenges, we propose KAM-CoT, a framework
that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities
for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a
two-stage training process with KG grounding to generate effective rationales
and answers. By incorporating external knowledge from KGs during reasoning, the
model gains a deeper contextual understanding, reducing hallucinations and
enhancing the quality of its answers. This knowledge-augmented CoT reasoning
empowers the model to handle questions requiring external context, providing
more informed answers. Experimental findings show that KAM-CoT outperforms
state-of-the-art methods. On the ScienceQA dataset, we achieve an average
accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by 18.70 percentage points and
GPT-4 (83.99%) by 9.88 percentage points. Remarkably, KAM-CoT achieves these
results with only 280M trainable
parameters at a time, demonstrating its cost-efficiency and effectiveness.
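To make the two-stage design concrete, the sketch below shows how a KG-grounded rationale-then-answer pipeline could be wired. Every component here (the LanguageModel stub, the toy KG retriever, the image placeholder) is a hypothetical stand-in for illustration, not KAM-CoT's published implementation.

```python
# Minimal sketch of a two-stage, KG-grounded CoT pipeline.
# All components are illustrative stand-ins, not KAM-CoT's code.

from dataclasses import dataclass


@dataclass
class Example:
    question: str
    image_path: str


def encode_image(path: str) -> str:
    # Placeholder: a real system would produce vision features.
    return f"<image:{path}>"


def retrieve_kg_triples(question: str) -> list[tuple[str, str, str]]:
    # Placeholder KG lookup against a toy graph.
    kg = [("water", "boils_at", "100 C"), ("water", "is_a", "liquid")]
    return [t for t in kg if t[0] in question.lower()]


class LanguageModel:
    def generate(self, prompt: str) -> str:
        # Stand-in for a small (~280M-parameter) trainable model.
        return f"(output for: {prompt[:40]}...)"


def kam_cot_answer(model: LanguageModel, ex: Example) -> str:
    image = encode_image(ex.image_path)
    kg = "; ".join(f"{h} {r} {t}" for h, r, t in
                   retrieve_kg_triples(ex.question))
    # Stage 1: generate a rationale grounded in image and KG context.
    rationale = model.generate(
        f"{ex.question}\n{image}\nKG: {kg}\nRationale:")
    # Stage 2: generate the answer conditioned on that rationale.
    return model.generate(f"{ex.question}\nRationale: {rationale}\nAnswer:")


print(kam_cot_answer(LanguageModel(),
                     Example("At what temperature does water boil?",
                             "kettle.png")))
```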
Related papers
- MuCo-KGC: Multi-Context-Aware Knowledge Graph Completion
Multi-Context-Aware Knowledge Graph Completion (MuCo-KGC) is a novel model that utilizes contextual information from linked entities and relations within the graph to predict tail entities.
MuCo-KGC eliminates the need for entity descriptions and negative triplet sampling, significantly reducing computational complexity while enhancing performance.
Our experiments on standard datasets, including FB15k-237, WN18RR, CoDEx-S, and CoDEx-M, demonstrate that MuCo-KGC outperforms state-of-the-art methods on three datasets.
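As a toy illustration of the idea, ranking every candidate tail entity by the head's neighborhood context, with no negative-triplet sampling, might look like the sketch below; the graph and scoring heuristic are hypothetical, not MuCo-KGC's model.

```python
# Toy context-aware tail prediction: rank all candidate entities for
# (head, relation, ?) using the head's neighborhood context instead
# of negative-triplet sampling. The scoring heuristic is a stand-in.

graph = {
    ("paris", "capital_of"): ["france"],
    ("paris", "located_in"): ["europe"],
    ("france", "located_in"): ["europe"],
}
entities = ["france", "europe", "paris"]


def neighbor_context(entity: str) -> set[str]:
    # Relations and entities directly linked to the given entity.
    ctx = set()
    for (head, rel), tails in graph.items():
        if head == entity:
            ctx.add(rel)
            ctx.update(tails)
    return ctx


def score(head: str, relation: str, tail: str) -> int:
    # Naive score: context overlap plus a bonus for a known edge.
    overlap = len(neighbor_context(head) & neighbor_context(tail))
    bonus = 1 if tail in graph.get((head, relation), []) else 0
    return overlap + bonus


head, relation = "paris", "capital_of"
ranked = sorted((e for e in entities if e != head),
                key=lambda t: score(head, relation, t), reverse=True)
print(ranked[0])  # france
```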
arXiv Detail & Related papers (2025-03-05T01:18:11Z)
- TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action
We present TACO, a family of multi-modal large action models designed to improve performance on complex, multi-step, and multi-modal tasks.
During inference, TACO produces chains-of-thought-and-action (CoTA) and executes intermediate steps by invoking external tools such as OCR, depth estimation, and a calculator.
Its synthetic CoTA training data enables TACO to learn complex reasoning and action paths, surpassing existing models trained on instruction-tuning data containing only direct answers.
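A minimal sketch of such a chain-of-thought-and-action loop appears below; the Action/Answer protocol, the scripted policy, and the single calculator tool are assumptions for illustration, not TACO's actual interface.

```python
# Minimal chain-of-thought-and-action (CoTA) loop: the model emits
# either a tool-calling action or a final answer; tool observations
# are fed back into the context. The tool set and scripted "model"
# are illustrative stand-ins, not TACO's components.

def calculator(expression: str) -> str:
    # Restricted eval: arithmetic characters only.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("non-arithmetic input")
    return str(eval(expression))


TOOLS = {"calculator": calculator}


def scripted_model(context: str) -> str:
    # Stand-in policy: call the calculator once, then answer.
    if "Observation:" not in context:
        return "Action: calculator[(17 + 25) * 3]"
    return "Answer: " + context.rsplit("Observation: ", 1)[1]


def cota_loop(question: str, max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_model(context)
        if step.startswith("Answer:"):
            return step.removeprefix("Answer:").strip()
        # Parse "Action: tool[args]" and execute the tool.
        name, args = step.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = TOOLS[name](args)
        context += f"\n{step}\nObservation: {observation}"
    return "no answer"


print(cota_loop("What is (17 + 25) * 3?"))  # 126
```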
arXiv Detail & Related papers (2024-12-07T00:42:04Z)
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- GIVE: Structured Reasoning with Knowledge Graph Inspired Veracity Extrapolation
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning framework that integrates parametric and non-parametric memories.
The method facilitates a more logical, step-wise reasoning approach akin to an expert's problem-solving, rather than simple retrieval of gold answers.
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
- EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs
EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection.
Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness.
We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying.
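A schematic of such a plan, explore, and reflect loop might look as follows; the toy graph, fixed planner, and bail-out reflection step are illustrative assumptions, not EffiQA's multi-model collaboration scheme.

```python
# Schematic plan / explore / reflect loop over a knowledge graph.
# The planner, explorer, and reflection step are toy stand-ins.

graph = {"einstein": {"born_in": "ulm"}, "ulm": {"country": "germany"}}


def plan(question: str) -> list[str]:
    # Global planning: decompose the question into relation hops.
    return ["born_in", "country"]


def explore(entity: str, relation: str) -> str | None:
    # Efficient KG exploration: follow a single edge if present.
    return graph.get(entity, {}).get(relation)


def answer(question: str, start: str) -> str:
    entity = start
    for relation in plan(question):
        nxt = explore(entity, relation)
        if nxt is None:
            # Self-reflection: the plan failed; a real system would
            # revise the plan here rather than stop.
            return f"re-plan needed at {entity} -[{relation}]->"
        entity = nxt
    return entity


print(answer("In which country was Einstein born?", "einstein"))  # germany
```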
arXiv Detail & Related papers (2024-06-03T11:56:07Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
MindStar is a purely inference-time search method for large language models.
It formulates reasoning tasks as search problems and proposes two search strategies to identify optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
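Formulating reasoning as search can be sketched as a simple beam search over partial chains of thought; the step proposer and reward below are stand-ins, not MindStar's actual components.

```python
# Toy inference-time search over reasoning paths: expand partial
# chains step by step and keep the best-scoring ones. The step
# proposer and reward are stand-ins, not MindStar's components.

import heapq


def propose_steps(path: tuple[str, ...]) -> list[str]:
    # Stand-in for sampling candidate next steps from an LLM.
    return [f"step{len(path)}a", f"step{len(path)}b"]


def reward(path: tuple[str, ...]) -> float:
    # Stand-in for a process reward model; prefers "a" branches.
    return sum(1.0 if s.endswith("a") else 0.5 for s in path)


def search(beam_width: int = 2, depth: int = 3) -> tuple[str, ...]:
    beam: list[tuple[str, ...]] = [()]
    for _ in range(depth):
        expanded = [p + (s,) for p in beam for s in propose_steps(p)]
        beam = heapq.nlargest(beam_width, expanded, key=reward)
    return beam[0]


print(search())  # ('step0a', 'step1a', 'step2a')
```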
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
Key-Point-Driven Data Synthesis (KPDDS) is a novel framework for synthesizing question-answer pairs.
KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability.
We present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs.
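A toy version of key-point-driven synthesis could look like the following; the single template, parameter sampling, and de-duplication check are hypothetical simplifications of the idea, not KPDDS's pipeline.

```python
# Toy key-point-driven synthesis: instantiate question-answer pairs
# from a key point's template with sampled parameters, keeping only
# novel pairs. The template and checks are illustrative stand-ins.

import random

KEY_POINTS = {
    "linear equations": lambda a, b: (f"Solve {a}x + {b} = 0 for x.",
                                      f"x = -{b}/{a}"),
}


def synthesize(n: int, seed: int = 0) -> list[tuple[str, str]]:
    rng = random.Random(seed)
    pairs: list[tuple[str, str]] = []
    while len(pairs) < n:
        topic = rng.choice(sorted(KEY_POINTS))
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        pair = KEY_POINTS[topic](a, b)
        if pair not in pairs:  # quality control: de-duplicate
            pairs.append(pair)
    return pairs


for question, answer in synthesize(3):
    print(question, "->", answer)
```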
arXiv Detail & Related papers (2024-03-04T18:58:30Z)
- Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs
We evaluate the conversational reasoning capabilities of the current state-of-the-art large language model (GPT-4) on knowledge graphs (KGs).
We introduce LLM-ARK, a grounded KG reasoning agent designed to deliver precise and adaptable predictions on KG paths.
LLaMA-2-7B-ARK outperforms the previous state-of-the-art model by 5.28 percentage points, reaching 36.39% on the target@1 evaluation metric.
arXiv Detail & Related papers (2023-12-18T15:23:06Z)
- MCC-KD: Multi-CoT Consistent Knowledge Distillation
We propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities of large language models.
In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions.
We investigate the effectiveness of MCC-KD with different model architectures and various model scales.
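One common way to enforce such consistency is to penalize divergence among the answer distributions produced from different rationales; the sketch below illustrates that idea with a mean-centered KL term, which is an assumption, not MCC-KD's exact loss.

```python
# Toy consistency term over multiple rationales: given the answer
# distributions predicted from different rationales for the same
# question, penalize their divergence from the mean prediction.
# This illustrates the idea of consistency, not MCC-KD's exact loss.

import math


def kl(p: list[float], q: list[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(predictions: list[list[float]]) -> float:
    n = len(predictions)
    mean = [sum(p[i] for p in predictions) / n
            for i in range(len(predictions[0]))]
    return sum(kl(p, mean) for p in predictions) / n


# Two rationales that mostly agree and one that disagrees.
print(round(consistency_loss([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8]]), 4))
```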
arXiv Detail & Related papers (2023-10-23T09:32:53Z)
- Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
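A schematic of verifying and correcting CoT steps against retrieved knowledge is sketched below; the fact store, substring matching, and correction marker are toy assumptions, not KD-CoT's retriever or verifier.

```python
# Schematic verify-and-correct pass over CoT steps: check each claim
# against retrieved knowledge and flag conflicts. The fact store and
# matching are toy stand-ins, not KD-CoT's components.

facts = {"capital of australia": "canberra"}


def retrieve(claim: str) -> str | None:
    # Stand-in retriever: substring match against a tiny fact store.
    for key, value in facts.items():
        if key in claim.lower():
            return value
    return None


def verify_and_fix(steps: list[str]) -> list[str]:
    fixed = []
    for step in steps:
        evidence = retrieve(step)
        if evidence is not None and evidence not in step.lower():
            # Conflict with retrieved knowledge: mark the correction.
            step = f"{step} [corrected: {evidence}]"
        fixed.append(step)
    return fixed


print(verify_and_fix(["The capital of Australia is Sydney.",
                      "So the answer is Sydney."]))
```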
arXiv Detail & Related papers (2023-08-25T09:23:55Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset encompassing an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
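Generic single-head cross-modal attention, in which text-token queries attend over image-region features, can be sketched as follows; this shows only the basic mechanism with made-up inputs, not the paper's multi-hop variant.

```python
# Minimal single-head cross-modal attention: text-token queries
# attend over image-region keys/values. A generic illustration of
# the mechanism, not the paper's multi-hop variant.

import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_queries: list[list[float]],
                    image_feats: list[list[float]]) -> list[list[float]]:
    d = len(image_feats[0])
    attended = []
    for q in text_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in image_feats]
        weights = softmax(scores)
        attended.append([sum(w * feat[i] for w, feat in
                             zip(weights, image_feats))
                         for i in range(d)])
    return attended


# One text token attending over two image regions.
print(cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```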
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering
Large Language Models (LLMs) have demonstrated exceptional performance in various Natural Language Processing (NLP) tasks.
We propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals.
Our approach achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%.
arXiv Detail & Related papers (2023-05-05T11:56:30Z)
- Feeding What You Need by Understanding What You Learned
Machine Reading Comprehension (MRC) is the ability to understand a given text passage and answer questions based on it.
Existing research in MRC relies heavily on large models and corpora to improve performance as evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.