Related papers: RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs

URL: http://arxiv.org/abs/2410.13987v1
Date: Thu, 17 Oct 2024 19:33:37 GMT
Title: RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs
Authors: Jiatan Huang, Mingchen Li, Zonghai Yao, Zhichao Yang, Yongkang Xiao, Feiyun Ouyang, Xiaohan Li, Shuo Han, Hong Yu
Abstract summary: We develop a dataset for LLMs Complex Reasoning over Textual Knowledge Graphs (RiTeK) with a broad topological structure coverage. We synthesize realistic user queries that integrate diverse topological structures, annotated information, and complex textual descriptions. We introduce an enhanced Monte Carlo Tree Search (MCTS) method, which automatically extracts relational path information from textual graphs for specific queries.
Score: 12.846097618151951
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Answering complex real-world questions often requires accurate retrieval from textual knowledge graphs (TKGs). The scarcity of annotated data, along with intricate topological structures, makes this task particularly challenging. Because relational path information can enhance the inference ability of Large Language Models (LLMs), efficiently retrieving more complex relational path information from TKGs presents another key challenge. To tackle these challenges, we first develop a Dataset for LLMs Complex Reasoning over Textual Knowledge Graphs (RiTeK) with a broad topological structure coverage. We synthesize realistic user queries that integrate diverse topological structures, relational information, and complex textual descriptions. We conduct rigorous expert evaluation to validate the quality of our synthesized queries. We then introduce an enhanced Monte Carlo Tree Search (MCTS) method, Relational MCTS, to automatically extract relational path information from textual graphs for specific queries. Our dataset mainly covers the medical domain, as its relation types and entities are complex and publicly available. Experimental results indicate that RiTeK poses significant challenges for current retrieval and LLM systems, while the proposed Relational MCTS method enhances LLM inference ability and achieves state-of-the-art performance on RiTeK.

Related papers

Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM [8.941718961724984]
Pseudo-Knowledge Graph (PKG) framework integrates Meta-path Retrieval, In-graph Text and Vector Retrieval into Large Language Models. PKG offers a richer knowledge representation and improves accuracy in information retrieval.
arXiv Detail & Related papers (2025-03-01T02:39:37Z)
Holistically Guided Monte Carlo Tree Search for Intricate Information Seeking [118.3983437282541]
We introduce an LLM-based search assistant that adopts a new information seeking paradigm with holistically guided Monte Carlo tree search (HG-MCTS). We reformulate the task as a progressive information collection process with a knowledge memory and unite an adaptive checklist with multi-perspective reward modeling in MCTS. Our multi-perspective reward modeling offers both exploration and retrieval rewards, along with progress feedback that tracks completed and remaining sub-goals.
arXiv Detail & Related papers (2025-02-07T08:36:39Z)
ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation [26.4086456393314]
Long-form text generation requires coherent, comprehensive responses that address complex queries with both breadth and depth. Existing iterative retrieval-augmented generation approaches often struggle to delve deeply into each facet of complex queries. This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach.
arXiv Detail & Related papers (2024-10-20T21:17:05Z)
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA). We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity. We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
Meta Operator for Complex Query Answering on Knowledge Graphs [58.340159346749964]
We argue that different logical operator types, rather than the different complex query types, are the key to improving generalizability. We propose a meta-learning algorithm to learn the meta-operators with limited data and adapt them to different instances of operators under various complex queries. Empirical results show that learning meta-operators is more effective than learning original CQA or meta-CQA models.
arXiv Detail & Related papers (2024-03-15T08:54:25Z)
Type-based Neural Link Prediction Adapter for Complex Query Answering [2.1098688291287475]
We propose the TypE-based Neural Link Prediction Adapter (TENLPA), a novel model that constructs type-based entity-relation graphs. In order to effectively combine type information with complex logical queries, an adaptive learning mechanism is introduced. Experiments on 3 standard datasets show that the TENLPA model achieves state-of-the-art performance on complex query answering.
arXiv Detail & Related papers (2024-01-29T10:54:28Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions. We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
Complex Logical Reasoning over Knowledge Graphs using Large Language Models [13.594992599230277]
Reasoning over knowledge graphs (KGs) is a challenging task that requires a deep understanding of the relationships between entities. Current approaches rely on learning geometries to embed entities in vector space for logical query operations. We propose a novel decoupled approach, Language-guided Abstract Reasoning over Knowledge graphs (LARK), that formulates complex KG reasoning as a combination of contextual KG search and logical query reasoning.
arXiv Detail & Related papers (2023-05-02T02:21:49Z)
Query-Specific Knowledge Graphs for Complex Finance Topics [6.599344783327053]
We focus on the CODEC dataset, where domain experts create challenging questions. We show that state-of-the-art ranking systems have headroom for improvement. We demonstrate that entity and document relevance are positively correlated.
arXiv Detail & Related papers (2022-11-08T10:21:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.