Related papers: STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

URL: http://arxiv.org/abs/2404.13207v3
Date: Sun, 20 Oct 2024 18:59:02 GMT
Title: STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Authors: Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec,
Abstract summary: We develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
Score: 93.96463520716759
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, many previous works studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. STARK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STARK presents significant challenges to the current retrieval and LLM systems, highlighting the need for more capable semi-structured retrieval systems. The benchmark data and code are available on https://github.com/snap-stanford/STaRK.

Related papers

Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation.<n>We compiled a large-scale dataset of real-world complex queries from a major search engine.<n> Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z)
MSRS: Evaluating Multi-Source Retrieval-Augmented Generation [51.717139132190574]
Many real-world applications demand the ability to integrate and summarize information scattered across multiple sources.<n>We present a scalable framework for constructing evaluation benchmarks that challenge RAG systems to integrate information across distinct sources.
arXiv Detail & Related papers (2025-08-28T14:59:55Z)
Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG)<n>RAG requires source-aware, multi-hop reasoning over diverse, sparsed, but related sources.<n>We build it using a synthetic data pipeline that simulates business across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z)
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
SRSA: A Cost-Efficient Strategy-Router Search Agent for Real-world Human-Machine Interactions [3.5725872564627785]
In real-world situations, users often input contextual and highly personalized queries to chatbots. Previous research has not focused specifically on authentic human-machine dialogue scenarios. To address these gaps, we propose a Strategy-based Search Agent (SRSA) SRSA routing different queries to appropriate search strategies and enabling fine-grained serial searches to obtain high-quality results at a relatively low cost.
arXiv Detail & Related papers (2024-11-21T20:41:55Z)
ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation [26.4086456393314]
Long-form text generation requires coherent, comprehensive responses that address complex queries with both breadth and depth. Existing iterative retrieval-augmented generation approaches often struggle to delve deeply into each facet of complex queries. This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach.
arXiv Detail & Related papers (2024-10-20T21:17:05Z)
RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs [12.846097618151951]
We develop a dataset for LLMs Complex Reasoning over Textual Knowledge Graphs (RiTeK) with a broad topological structure coverage. We synthesize realistic user queries that integrate diverse topological structures, annotated information, and complex textual descriptions. We introduce an enhanced Monte Carlo Tree Search (CTS) method, which automatically extracts relational path information from textual graphs for specific queries.
arXiv Detail & Related papers (2024-10-17T19:33:37Z)
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from knowledge graph (KG) We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR)
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding. We show that incorporating explicit reasoning about the query improves retrieval performance by up to 12.2 points.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models [56.691926887209895]
We present textbfname (textbfInformation textbfRetrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. name comprises textbften meticulously curated code datasets, spanning textbfeight distinctive retrieval tasks across textbfseven diverse domains. We evaluate nine widely used retrieval models using name, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems.
arXiv Detail & Related papers (2024-07-03T07:58:20Z)
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM) All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.