Related papers: A Survey of Generative Information Retrieval

URL: http://arxiv.org/abs/2406.01197v2
Date: Tue, 4 Jun 2024 04:12:39 GMT
Title: A Survey of Generative Information Retrieval
Authors: Tzu-Lin Kuo, Tzu-Wei Chiu, Tzung-Sheng Lin, Sheng-Yang Wu, Chao-Wei Huang, Yun-Nung Chen
Abstract summary: Generative Retrieval (GR) is an emerging paradigm in information retrieval that leverages generative models to map queries to relevant document identifiers (DocIDs) without the need for traditional query processing or document reranking. This survey provides a comprehensive overview of GR, highlighting key developments, indexing and retrieval strategies, and challenges.
Score: 25.1249210843116
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Generative Retrieval (GR) is an emerging paradigm in information retrieval that leverages generative models to directly map queries to relevant document identifiers (DocIDs) without the need for traditional query processing or document reranking. This survey provides a comprehensive overview of GR, highlighting key developments, indexing and retrieval strategies, and challenges. We discuss various document identifier strategies, including numerical and string-based identifiers, and explore different document representation methods. Our primary contribution lies in outlining future research directions that could profoundly impact the field: improving the quality of query generation, exploring learnable document identifiers, enhancing scalability, and integrating GR with multi-task learning frameworks. By examining state-of-the-art GR techniques and their applications, this survey aims to provide a foundational understanding of GR and inspire further innovations in this transformative approach to information retrieval. We also make complementary materials, such as a paper collection, publicly available at https://github.com/MiuLab/GenIR-Survey/

Related papers

VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents [30.012487475552575]
We introduce a new RAG framework, VDocRAG, which can directly understand varied documents and modalities in a unified image format. We also introduce OpenDocVQA, the first unified collection of open-domain document visual question answering datasets.
arXiv Detail & Related papers (2025-04-14T01:50:33Z)
A Survey on Knowledge-Oriented Retrieval-Augmented Generation [45.65542434522205]
Retrieval-Augmented Generation (RAG) has gained significant attention in recent years. RAG combines large-scale retrieval systems with generative models. We discuss the key characteristics of RAG, such as its ability to augment generative models with dynamic external knowledge.
arXiv Detail & Related papers (2025-03-11T01:59:35Z)
DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval [10.770281363775148]
We propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR). It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations.
arXiv Detail & Related papers (2025-02-11T03:25:42Z)
Generative Retrieval for Book search [106.67655212825025]
We propose an effective Generative retrieval framework for Book Search (GBS). It features two main components: data augmentation and outline-oriented book encoding. Experiments on a proprietary Baidu dataset demonstrate that GBS outperforms strong baselines.
arXiv Detail & Related papers (2025-01-19T12:57:13Z)
VISA: Retrieval Augmented Generation with Visual Source Attribution [100.78278689901593]
Existing approaches in RAG primarily link generated content to document-level references. We propose Retrieval-Augmented Generation with Visual Source Attribution (VISA), a novel approach that combines answer generation with visual source attribution. To evaluate its effectiveness, we curated two datasets: Wiki-VISA, based on crawled Wikipedia webpage screenshots, and Paper-VISA, derived from PubLayNet and tailored to the medical domain.
arXiv Detail & Related papers (2024-12-19T02:17:35Z)
G-RAG: Knowledge Expansion in Material Science [0.0]
Graph RAG integrates graph databases to enhance the retrieval process. We implement an agent-based parsing technique to achieve a more detailed representation of the documents.
arXiv Detail & Related papers (2024-11-21T21:22:58Z)
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents [66.42579289213941]
Retrieval-augmented generation (RAG) is an effective technique that enables large language models to utilize external knowledge sources for generation. In this paper, we introduce VisRAG, which establishes a vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.
arXiv Detail & Related papers (2024-10-14T15:04:18Z)
Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation [72.70046559930555]
We propose a generic RAG approach called Adaptive Note-Enhanced RAG (Adaptive-Note) for complex QA tasks. Specifically, Adaptive-Note introduces an overarching view of knowledge growth, iteratively gathering new information in the form of notes. In addition, we employ an adaptive, note-based stop-exploration strategy to decide "what to retrieve and when to stop" to encourage sufficient knowledge exploration.
arXiv Detail & Related papers (2024-10-11T14:03:29Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$^2$). GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$^2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
A Survey of Generative Search and Recommendation in the Era of Large Language Models [125.26354486027408]
Generative search (retrieval) and recommendation aims to address the matching problem in a generative manner. Superintelligent generative large language models have sparked a new paradigm in search and recommendation.
arXiv Detail & Related papers (2024-04-25T17:58:17Z)
From Matching to Generation: A Survey on Generative Information Retrieval [21.56093567336119]
Generative information retrieval (GenIR) has emerged as a novel paradigm, gaining increasing attention in recent years. This paper aims to systematically review the latest research progress in GenIR.
arXiv Detail & Related papers (2024-04-23T09:05:37Z)
A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements. This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
arXiv Detail & Related papers (2024-04-17T01:27:42Z)
Retrieval-Augmented Generation for Large Language Models: A Survey [17.82361213043507]
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases.
arXiv Detail & Related papers (2023-12-18T07:47:33Z)
Evaluating Generative Ad Hoc Information Retrieval [58.800799175084286]
Generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval.
arXiv Detail & Related papers (2023-11-08T14:05:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.