Related papers: AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning

AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning

URL: http://arxiv.org/abs/2411.16495v1
Date: Mon, 25 Nov 2024 15:35:51 GMT
Title: AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
Authors: Amy Xin, Jinxin Liu, Zijun Yao, Zhicheng Li, Shulin Cao, Lei Hou, Juanzi Li,
Abstract summary: AtomR is a novel heterogeneous knowledge reasoning framework. It decomposes complex questions into combinations of three atomic knowledge operators. AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks.
Score: 38.736190591684
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in large language models (LLMs) have led to significant improvements in various natural language processing tasks, but it is still challenging for LLMs to perform knowledge-intensive complex question answering due to LLMs' inefficacy in reasoning planning and the hallucination problem. A typical solution is to employ retrieval-augmented generation (RAG) coupled with chain-of-thought (CoT) reasoning, which decomposes complex questions into chain-like sub-questions and applies iterative RAG at each sub-question. However, prior works exhibit sub-optimal reasoning planning and overlook dynamic knowledge retrieval from heterogeneous sources. In this paper, we propose AtomR, a novel heterogeneous knowledge reasoning framework that conducts multi-source reasoning at the atomic level. Drawing inspiration from the graph modeling of knowledge, AtomR leverages large language models (LLMs) to decompose complex questions into combinations of three atomic knowledge operators, significantly enhancing the reasoning process at both the planning and execution stages. We also introduce BlendQA, a novel evaluation benchmark tailored to assess complex heterogeneous knowledge reasoning. Experiments show that AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks, with notable performance gains of 9.4% on 2WikiMultihop and 9.5% on BlendQA.

Related papers

DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router.<n>Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z)
GenKI: Enhancing Open-Domain Question Answering with Knowledge Integration and Controllable Generation in Large Language Models [75.25348392263676]
Open-domain question answering (OpenQA) represents a cornerstone in natural language processing (NLP)<n>We propose a novel framework named GenKI, which aims to improve the OpenQA performance by exploring Knowledge Integration and controllable Generation.
arXiv Detail & Related papers (2025-05-26T08:18:33Z)
System-2 Mathematical Reasoning via Enriched Instruction Tuning [13.672967091915181]
Enriched Instruction Tuning (EIT) is a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback. EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods.
arXiv Detail & Related papers (2024-12-22T10:49:27Z)
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs) We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
Atomic Fact Decomposition Helps Attributed Question Answering [30.75332718824254]
Attributed Question Answering (AQA) aims to provide both a trustworthy answer and a reliable attribution report for a question. This paper proposes an Atomic fact decomposition-based Retrieval and Editing framework. It decomposes the generated long-form answers into molecular clauses and atomic facts by the instruction-tuned LLMs.
arXiv Detail & Related papers (2024-10-22T05:25:54Z)
Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed. In this work, we construct systems, which we call Networks of Networks (NoNs) organized around the distinction between generating a proposed answer and verifying its correctness. We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering [29.442468366125986]
We propose BeamAggR, a reasoning framework for knowledge-intensive multi-hop QA. We parse complex questions into trees, which include atom and composite questions, followed by bottom-up reasoning. For atomic questions, the LLM conducts reasoning on multi-source knowledge to get answer candidates. For composite questions, the LLM combines beam candidates, explores multiple reasoning paths through probabilistic aggregation, and prioritizes the most promising trajectory.
arXiv Detail & Related papers (2024-06-28T10:53:48Z)
Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization [30.349165483935682]
How large language models (LLMs) use their knowledge for reasoning is not yet well understood. We develop the DepthQA dataset, deconstructing questions into three depths: (i) recalling conceptual knowledge, (ii) applying procedural knowledge, and (iii) analyzing strategic knowledge. Distinct patterns of discrepancies are observed across model capacity and possibility of training data memorization.
arXiv Detail & Related papers (2024-06-27T19:29:36Z)
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency. This approach fails in scenarios where the correct answers are in the minority. We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a complex reasoning schema over KG upon large language models (LLMs) We augment the arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs. Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods.
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks [40.7766635942194]
We propose a probing framework to investigate whether the atomic skill can spontaneously generalize to complex reasoning tasks. We then introduce a hierarchical curriculum learning training strategy to achieve better skill generalization. By leveraging hierarchical curriculum learning, we successfully induce generalization, significantly improve the performance of open-source LMs on complex reasoning tasks.
arXiv Detail & Related papers (2024-03-14T15:20:54Z)
Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation [3.948068081583197]
This paper proposes a methodology that handle the out-of-domain scenario in Textbook question answering (TQA) Through supervised fine-tuning of the LLM model Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on validation set and 9.84% on test set for non-diagram multiple-choice questions.
arXiv Detail & Related papers (2024-02-05T11:58:56Z)
Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering [83.74210749046551]
We propose to leverage question decomposing for heterogeneous knowledge integration. We propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT) Experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly.
arXiv Detail & Related papers (2023-05-24T11:45:59Z)
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering [59.63860993280275]
Large Language Models (LLMs) have demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. We propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals. Our approach achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%.
arXiv Detail & Related papers (2023-05-05T11:56:30Z)
ArT: All-round Thinker for Unsupervised Commonsense Question-Answering [54.068032948300655]
We propose an approach of All-round Thinker (ArT) by fully taking association during knowledge generating. We evaluate it on three commonsense QA benchmarks: COPA, SocialIQA and SCT.
arXiv Detail & Related papers (2021-12-26T18:06:44Z)
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT) All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.