AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
- URL: http://arxiv.org/abs/2411.16495v1
- Date: Mon, 25 Nov 2024 15:35:51 GMT
- Title: AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
- Authors: Amy Xin, Jinxin Liu, Zijun Yao, Zhicheng Li, Shulin Cao, Lei Hou, Juanzi Li
- Abstract summary: AtomR is a novel heterogeneous knowledge reasoning framework.
It decomposes complex questions into combinations of three atomic knowledge operators.
AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks.
- Score: 38.736190591684
- Abstract: Recent advancements in large language models (LLMs) have led to significant improvements in various natural language processing tasks, but it is still challenging for LLMs to perform knowledge-intensive complex question answering due to LLMs' inefficacy in reasoning planning and the hallucination problem. A typical solution is to employ retrieval-augmented generation (RAG) coupled with chain-of-thought (CoT) reasoning, which decomposes complex questions into chain-like sub-questions and applies iterative RAG at each sub-question. However, prior works exhibit sub-optimal reasoning planning and overlook dynamic knowledge retrieval from heterogeneous sources. In this paper, we propose AtomR, a novel heterogeneous knowledge reasoning framework that conducts multi-source reasoning at the atomic level. Drawing inspiration from the graph modeling of knowledge, AtomR leverages large language models (LLMs) to decompose complex questions into combinations of three atomic knowledge operators, significantly enhancing the reasoning process at both the planning and execution stages. We also introduce BlendQA, a novel evaluation benchmark tailored to assess complex heterogeneous knowledge reasoning. Experiments show that AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks, with notable performance gains of 9.4% on 2WikiMultihop and 9.5% on BlendQA.
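As a rough illustration of the atomic-level planning and execution the abstract describes, the sketch below decomposes a question into a small plan of atomic operator steps and executes it bottom-up. The operator names, the plan structure, and the stubbed `retrieve` / `llm_answer` callables are illustrative assumptions, not AtomR's actual operators or interface.

```python
"""Minimal sketch of atomic-operator planning and bottom-up execution, in the
spirit of AtomR.  Operator names and the stubbed retrieval / LLM calls are
illustrative assumptions, not the paper's actual interface."""

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AtomicStep:
    """One node of a reasoning plan: an atomic operator applied to arguments.

    Children are resolved first, and their answers are spliced into the
    parent's arguments via "#1", "#2", ... placeholders."""
    operator: str                       # e.g. "SEARCH", "RELATE", "FILTER" (assumed names)
    arguments: str                      # natural-language arguments for the operator
    children: List["AtomicStep"] = field(default_factory=list)


def execute(step: AtomicStep,
            retrieve: Callable[[str], str],
            llm_answer: Callable[[str, str], str]) -> str:
    """Execute a plan tree bottom-up: children first, then this atomic step."""
    args = step.arguments
    for i, child in enumerate(step.children, start=1):
        args = args.replace(f"#{i}", execute(child, retrieve, llm_answer))

    # Retrieve evidence for this atomic sub-question; a full system would pick
    # dynamically among heterogeneous sources (local text, web, knowledge graph).
    evidence = retrieve(args)
    return llm_answer(f"{step.operator}: {args}", evidence)


if __name__ == "__main__":
    # Toy stand-ins for retrieval and the LLM so the sketch runs end to end.
    kb = {"director of Inception": "Christopher Nolan",
          "birth year of Christopher Nolan": "1970"}
    retrieve = lambda query: kb.get(query, "")
    llm_answer = lambda question, evidence: evidence or "unknown"

    plan = AtomicStep("SEARCH", "birth year of #1",
                      children=[AtomicStep("SEARCH", "director of Inception")])
    print(execute(plan, retrieve, llm_answer))  # -> 1970
```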
Related papers
- System-2 Mathematical Reasoning via Enriched Instruction Tuning [13.672967091915181]
Enriched Instruction Tuning (EIT) is a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback.
EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods.
arXiv Detail & Related papers (2024-12-22T10:49:27Z)
- Atomic Fact Decomposition Helps Attributed Question Answering [30.75332718824254]
Attributed Question Answering (AQA) aims to provide both a trustworthy answer and a reliable attribution report for a question.
This paper proposes an Atomic fact decomposition-based Retrieval and Editing framework.
It decomposes generated long-form answers into molecular clauses and atomic facts using instruction-tuned LLMs.
arXiv Detail & Related papers (2024-10-22T05:25:54Z)
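A minimal sketch of the fact-decomposition idea from the entry above: a (stubbed) LLM call splits a long-form answer into atomic facts, and each fact is paired with retrieved evidence. The prompt wording and the `llm` / `search` callables are assumptions for illustration, not the paper's instruction-tuned models or retriever.

```python
"""Rough sketch of atomic-fact decomposition for attributed QA.  The prompt
wording and the stubbed `llm` / `search` callables are assumptions for
illustration, not the framework's instruction-tuned models."""

from typing import Callable, List, Tuple

DECOMPOSE_PROMPT = ("Split the following answer into short, self-contained "
                    "atomic facts, one per line:\n{answer}")


def attribute_answer(answer: str,
                     llm: Callable[[str], str],
                     search: Callable[[str], List[str]]) -> List[Tuple[str, List[str]]]:
    """Decompose an answer into atomic facts and attach retrieved evidence to each."""
    # 1. LLM-driven decomposition of the long-form answer into atomic facts.
    facts = [line.strip()
             for line in llm(DECOMPOSE_PROMPT.format(answer=answer)).splitlines()
             if line.strip()]
    # 2. Retrieve candidate evidence for every fact; facts with no supporting
    #    passage would be sent back for revision in the full framework.
    return [(fact, search(fact)) for fact in facts]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any external service.
    fake_llm = lambda prompt: ("Marie Curie won two Nobel Prizes.\n"
                               "She was born in Warsaw.")
    corpus = ["Marie Curie received Nobel Prizes in Physics and Chemistry.",
              "Marie Curie was born in Warsaw in 1867."]
    fake_search = lambda fact: [p for p in corpus if fact.split()[0] in p]

    for fact, evidence in attribute_answer("Marie Curie ...", fake_llm, fake_search):
        print(fact, "->", evidence or "no support found")
```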
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems we call Networks of Networks (NoNs), organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
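The verifier-based judge described above can be sketched as a plain best-of-K loop: K generator calls propose answers, a verifier filters them, and a vote picks the survivor. The `generate` and `verify` callables below are toy stand-ins, not the paper's compound-system implementation.

```python
"""Small sketch of a "best-of-K", judge-based compound system: K generator
calls propose answers and a verifier picks among them.  The callables are toy
placeholders, not the paper's implementation."""

from collections import Counter
from typing import Callable, List


def best_of_k(question: str,
              generate: Callable[[str], str],
              verify: Callable[[str, str], bool],
              k: int = 5) -> str:
    """Sample K proposals, keep those the verifier accepts, majority-vote the rest."""
    proposals: List[str] = [generate(question) for _ in range(k)]
    accepted = [p for p in proposals if verify(question, p)]
    pool = accepted or proposals          # fall back to all proposals if none pass
    return Counter(pool).most_common(1)[0][0]


if __name__ == "__main__":
    # Canned generations and a trivial verifier keep the example deterministic.
    canned = iter(["5", "4", "4", "5", "4"])
    generate = lambda q: next(canned)     # stands in for a generator LLM call
    verify = lambda q, a: a == "4"        # stands in for a verifier LLM call
    print(best_of_k("What is 2 + 2?", generate, verify, k=5))  # -> "4"
```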
- BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering [29.442468366125986]
We propose BeamAggR, a reasoning framework for knowledge-intensive multi-hop QA.
We parse complex questions into trees of atomic and composite questions, followed by bottom-up reasoning.
For atomic questions, the LLM conducts reasoning on multi-source knowledge to get answer candidates.
For composite questions, the LLM combines beam candidates, explores multiple reasoning paths through probabilistic aggregation, and prioritizes the most promising trajectory.
arXiv Detail & Related papers (2024-06-28T10:53:48Z)
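A simplified sketch of the beam-aggregation step for one atomic question: answer distributions from several knowledge sources are normalized, merged, and pruned to a beam. The scoring and normalization here are assumptions, not BeamAggR's exact formulation.

```python
"""Hand-wavy sketch of beam aggregation over multi-source answer candidates,
loosely following the bottom-up idea in BeamAggR.  Scoring and normalization
details are simplified assumptions, not the paper's exact formulation."""

from collections import defaultdict
from typing import Dict, List


def aggregate_sources(candidates_per_source: List[Dict[str, float]],
                      beam_size: int = 3) -> Dict[str, float]:
    """Merge answer distributions from several knowledge sources and keep the beam."""
    merged: Dict[str, float] = defaultdict(float)
    for dist in candidates_per_source:
        total = sum(dist.values()) or 1.0
        for answer, score in dist.items():
            merged[answer] += score / total      # sum of per-source normalized scores
    # Keep only the top-`beam_size` candidates, renormalized into a distribution.
    beam = dict(sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:beam_size])
    z = sum(beam.values())
    return {answer: score / z for answer, score in beam.items()}


if __name__ == "__main__":
    # Candidate answers for one atomic question from three sources (text, KG, web).
    text_rag = {"Paris": 0.7, "Lyon": 0.3}
    kg = {"Paris": 1.0}
    web = {"Paris": 0.5, "Marseille": 0.5}
    print(aggregate_sources([text_rag, kg, web]))   # Paris dominates the beam
```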
- Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a complex reasoning schema over knowledge graphs (KGs) built upon large language models (LLMs).
We augment arbitrary first-order logical queries via binary tree decomposition to stimulate the reasoning capability of LLMs.
Experiments across widely used datasets demonstrate that LACT yields substantial improvements (an average gain of +5.5% MRR) over advanced methods.
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
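The binary tree decomposition mentioned above can be illustrated with a small sketch that rewrites an n-ary logical query into nested binary sub-queries. The query representation and the `binarize` helper are assumptions for demonstration, not LACT's data format.

```python
"""Illustrative sketch of binary-tree decomposition for a first-order logic
query, in the spirit of logic-aware curriculum tuning.  The query
representation is an assumption for demonstration, not the paper's format."""

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Query:
    """Either an atomic query (`relation` set) or an n-ary AND/OR over children."""
    op: str = "ATOM"                      # "ATOM", "AND", or "OR"
    relation: Optional[str] = None
    children: List["Query"] = field(default_factory=list)


def binarize(q: Query) -> Query:
    """Rewrite n-ary AND/OR nodes into a right-leaning binary tree so that each
    node combines at most two simpler sub-answers."""
    if q.op == "ATOM":
        return q
    kids = [binarize(child) for child in q.children]
    tree = kids[-1]
    for left in reversed(kids[:-1]):
        tree = Query(op=q.op, children=[left, tree])
    return tree


def show(q: Query) -> str:
    """Render a (binarized) query as a nested logical expression."""
    if q.op == "ATOM":
        return q.relation or "?"
    return f"({show(q.children[0])} {q.op} {show(q.children[1])})"


if __name__ == "__main__":
    # "European universities founded before 1500 that teach law"
    query = Query("AND", children=[Query(relation="located_in(x, Europe)"),
                                   Query(relation="founded_before(x, 1500)"),
                                   Query(relation="teaches(x, law)")])
    print(show(binarize(query)))
    # (located_in(x, Europe) AND (founded_before(x, 1500) AND teaches(x, law)))
```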
- Laying the Foundation First? Investigating the Generalization from Atomic Skills to Complex Reasoning Tasks [40.7766635942194]
We propose a probing framework to investigate whether atomic skills can spontaneously generalize to complex reasoning tasks.
We then introduce a hierarchical curriculum learning training strategy to achieve better skill generalization.
By leveraging hierarchical curriculum learning, we successfully induce generalization, significantly improving the performance of open-source LMs on complex reasoning tasks.
arXiv Detail & Related papers (2024-03-14T15:20:54Z)
- Reasoning over Hierarchical Question Decomposition Tree for Explainable Question Answering [83.74210749046551]
We propose to leverage question decomposition for heterogeneous knowledge integration.
We propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT).
Experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly.
arXiv Detail & Related papers (2023-05-24T11:45:59Z)
- ArT: All-round Thinker for Unsupervised Commonsense Question-Answering [54.068032948300655]
We propose an All-round Thinker (ArT) approach that fully exploits association during knowledge generation.
We evaluate it on three commonsense QA benchmarks: COPA, SocialIQA and SCT.
arXiv Detail & Related papers (2021-12-26T18:06:44Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
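A bare-bones sketch of the KILT baseline described above: a single dense passage index shared across tasks, coupled with a retrieve-then-generate reader. The bag-of-words `encode` and echo-style `generate` callables are toy placeholders for real encoder and seq2seq models.

```python
"""Bare-bones sketch of a KILT-style baseline: a shared dense passage index
plus a sequence-to-sequence reader.  The `encode` and `generate` callables are
placeholders for real models, not the benchmark's reference implementation."""

from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


class DenseIndex:
    """One vector index shared across tasks: every passage is embedded once."""

    def __init__(self, passages: List[str], encode: Callable[[str], Vector]):
        self.passages = passages
        self.encode = encode
        self.vectors = [encode(p) for p in passages]

    def search(self, query: str, k: int = 2) -> List[str]:
        q = self.encode(query)
        ranked = sorted(range(len(self.passages)),
                        key=lambda i: dot(q, self.vectors[i]), reverse=True)
        return [self.passages[i] for i in ranked[:k]]


def answer(question: str, index: DenseIndex,
           generate: Callable[[str], str]) -> str:
    """Retrieve-then-generate: condition the seq2seq reader on top passages."""
    context = " ".join(index.search(question))
    return generate(f"question: {question} context: {context}")


if __name__ == "__main__":
    # Toy bag-of-words "encoder" and echo "reader" so the sketch runs on its own.
    vocab = ["capital", "france", "paris", "cheese"]
    encode = lambda text: [float(word in text.lower()) for word in vocab]
    generate = lambda prompt: prompt.split("context:")[1].strip().split(".")[0]

    index = DenseIndex(["Paris is the capital of France.",
                        "France is famous for cheese."], encode)
    print(answer("What is the capital of France?", index, generate))
```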