HAGRID: A Human-LLM Collaborative Dataset for Generative
Information-Seeking with Attribution
- URL: http://arxiv.org/abs/2307.16883v1
- Date: Mon, 31 Jul 2023 17:49:18 GMT
- Title: HAGRID: A Human-LLM Collaborative Dataset for Generative
Information-Seeking with Attribution
- Authors: Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin
- Abstract summary: We introduce HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking dataset) for building end-to-end generative information-seeking models.
Unlike recent efforts that focus on black-box proprietary search engines, we built our dataset atop the English subset of MIRACL, a publicly available information retrieval dataset.
- Score: 46.41448772928026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of large language models (LLMs) has had a transformative impact on
search, ushering in a new era of search engines that are capable of generating
search results in natural language text, imbued with citations for supporting
sources. Building generative information-seeking models demands openly
accessible datasets, which currently remain lacking. In this paper, we
introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative
Retrieval for Information-seeking Dataset) for building end-to-end generative
information-seeking models that are capable of retrieving candidate quotes and
generating attributed explanations. Unlike recent efforts that focus on human
evaluation of black-box proprietary search engines, we built our dataset atop
the English subset of MIRACL, a publicly available information retrieval
dataset. HAGRID is constructed based on human and LLM collaboration. We first
automatically collect attributed explanations that follow an in-context
citation style using an LLM, i.e. GPT-3.5. Next, we ask human annotators to
evaluate the LLM explanations based on two criteria: informativeness and
attributability. HAGRID serves as a catalyst for the development of
information-seeking models with better attribution capabilities.
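As a concrete illustration of the pipeline the abstract describes, here is a minimal sketch of the two stages, where call_llm and human_review are hypothetical stand-ins for the GPT-3.5 API and the annotation interface (the prompt wording is illustrative, not the authors'):

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-3.5 API call."""
    return "MIRACL is a multilingual retrieval dataset [1], whose English subset draws on Wikipedia [2]."

def draft_attributed_answer(query: str, quotes: list[str]) -> str:
    """Stage 1: the LLM drafts an answer with in-context citations like [1], [2]."""
    numbered = "\n".join(f"[{i + 1}] {q}" for i, q in enumerate(quotes))
    prompt = (f"Passages:\n{numbered}\n\nQuestion: {query}\n"
              "Answer using only the passages, citing them as [1], [2], ...")
    return call_llm(prompt)

def human_review(answer: str) -> dict:
    """Stage 2: annotators judge the draft on the paper's two criteria."""
    return {"informative": True, "attributable": True}

quotes = ["MIRACL is a multilingual information retrieval dataset.",
          "Its English subset contains Wikipedia passages."]
answer = draft_attributed_answer("What is MIRACL?", quotes)
print(answer, human_review(answer))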
Related papers
- Synthetic Data Generation with Large Language Models for Personalized Community Question Answering [47.300506002171275]
We build Sy-SE-PQA based on an existing dataset, SE-PQA, which consists of questions and answers posted on popular StackExchange communities.
Our findings suggest that LLMs have high potential in generating data tailored to users' needs.
The synthetic data can replace human-written training data, even though the generated data may contain incorrect information.
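A minimal sketch of that data-generation recipe, with call_llm as a hypothetical stand-in for the generator (the prompt wording and field names are illustrative):

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call."""
    return "A synthetic answer written in the community's style."

def synthesize_answer(question: str, community: str) -> dict:
    """Generate one synthetic training pair for a community question."""
    prompt = (f"You are an expert user of the {community} StackExchange community.\n"
              f"Question: {question}\nWrite a helpful answer in that community's style.")
    return {"question": question, "answer": call_llm(prompt), "source": "synthetic"}

print(synthesize_answer("How should I split data for cross-validation?", "Cross Validated"))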
arXiv Detail & Related papers (2024-10-29T16:19:08Z)
- Beyond Retrieval: Generating Narratives in Conversational Recommender Systems [4.912663905306209]
We introduce a new dataset (REGEN) for natural language generation tasks in conversational recommendations.
We establish benchmarks using well-known generative metrics, and perform an automated evaluation of the new dataset using a rater LLM.
To the best of our knowledge, this represents the first attempt to analyze the capabilities of LLMs in understanding recommender signals and generating rich narratives.
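A minimal sketch of the rater-LLM step, assuming a hypothetical call_llm helper and a 1-5 rubric (the actual REGEN rubric may differ):

import re

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the rater model; returns a rating string."""
    return "4"

def rate_narrative(context: str, narrative: str) -> int:
    """Ask the rater LLM for a 1-5 score and parse the first digit it emits."""
    prompt = ("Rate the narrative from 1 (poor) to 5 (excellent) for fluency and "
              f"grounding in the context.\nContext: {context}\nNarrative: {narrative}\nRating:")
    match = re.search(r"[1-5]", call_llm(prompt))
    return int(match.group()) if match else 0

print(rate_narrative("User liked two jazz albums.", "Since you enjoy jazz, you may like this trio record."))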
arXiv Detail & Related papers (2024-10-22T07:53:41Z)
- Leveraging Large Language Models for Web Scraping [0.0]
This research investigates a general-purpose, accurate data-scraping recipe for RAG models designed for language generation.
To capture knowledge in a more modular and interpretable way, we use pre-trained language models with a latent knowledge retriever.
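A minimal retrieve-then-generate sketch over scraped page text; plain token overlap stands in for the learned latent retriever, and the scraping step is assumed to have already produced clean text:

def chunk(text: str, size: int = 40) -> list[str]:
    """Split scraped page text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str]) -> str:
    """Pick the chunk with the largest word overlap with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

page_text = ("The museum is open Tuesday through Sunday from 9 to 17. "
             "Tickets cost 12 euros and children enter free.")
context = retrieve("When is the museum open?", chunk(page_text, size=10))
prompt = f"Context: {context}\nQuestion: When is the museum open?\nAnswer:"
print(prompt)  # this prompt would then be passed to the generator model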
arXiv Detail & Related papers (2024-06-12T14:15:15Z)
- Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration [60.535793237063885]
The proliferation of Large Language Models (LLMs) has led to an influx of AI-generated content (AIGC) on the internet.
The impact of this surge in AIGC on Information Retrieval systems remains an open question.
We introduce Cocktail, a benchmark tailored for evaluating IR models in this mixed-sourced data landscape.
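One signal such a benchmark can expose is source bias in rankings; a minimal sketch (the 'source' field and document schema are illustrative, not Cocktail's actual format):

def ai_share_at_k(ranking: list[dict], k: int = 10) -> float:
    """Fraction of the top-k results that are LLM-generated documents."""
    top = ranking[:k]
    return sum(d["source"] == "llm" for d in top) / len(top)

ranking = [{"docid": "d1", "source": "llm"},
           {"docid": "d2", "source": "human"},
           {"docid": "d3", "source": "llm"}]
print(ai_share_at_k(ranking, k=3))  # 0.666...: two of the top three results are LLM-generated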
arXiv Detail & Related papers (2024-05-26T12:30:20Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
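A minimal sketch of grounding an answer in heterogeneous evidence, with KB triples and text passages packed into one prompt (call_llm is a hypothetical stand-in; this is not the DIVKNOWQA protocol itself):

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the answering model."""
    return "London."

def grounded_answer(question: str, triples: list[tuple], passages: list[str]) -> str:
    """Condition generation on retrieved KB facts plus text passages."""
    kb = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    txt = "\n".join(passages)
    prompt = (f"KB facts:\n{kb}\n\nPassages:\n{txt}\n\n"
              f"Question: {question}\nAnswer using only the evidence above.")
    return call_llm(prompt)

print(grounded_answer("Where was Alan Turing born?",
                      [("Alan Turing", "born_in", "London")],
                      ["Alan Turing was a British mathematician and computer scientist."]))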
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Enabling Large Language Models to Generate Text with Citations [37.64884969997378]
Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
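A crude structural proxy for citation quality: every sentence should carry at least one citation, and every cited index must point at a real passage. (ALCE's own citation metrics rely on NLI-based entailment checks; this simpler check is only illustrative.)

import re

def check_citations(answer: str, num_passages: int) -> dict:
    """Check per-sentence citation coverage and citation index validity."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    cited = [re.findall(r"\[(\d+)\]", s) for s in sentences]
    coverage = sum(bool(c) for c in cited) / len(sentences)
    valid = all(1 <= int(i) <= num_passages for c in cited for i in c)
    return {"sentence_citation_coverage": coverage, "all_indices_valid": valid}

print(check_citations("LLMs can hallucinate facts [1]. Retrieval mitigates this [2].", num_passages=2))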
arXiv Detail & Related papers (2023-05-24T01:53:49Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
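A minimal sketch of the task setup: linearize the table, then prompt a model to summarize only what the query asks (call_llm is a hypothetical stand-in, not a QTSumm baseline):

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the summarization model."""
    return "Norway leads the medal table with 16 golds."

def summarize_table(query: str, header: list[str], rows: list[list[str]]) -> str:
    """Linearize the table row by row and ask for a query-focused summary."""
    lines = [" | ".join(header)] + [" | ".join(r) for r in rows]
    prompt = ("Table:\n" + "\n".join(lines) +
              f"\n\nQuery: {query}\nWrite a short summary that answers the query.")
    return call_llm(prompt)

print(summarize_table("Which country won the most gold medals?",
                      ["Country", "Gold"], [["Norway", "16"], ["Germany", "12"]]))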
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
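A minimal sketch of the explain-then-annotate pattern: first elicit an explanation for a gold-labeled example, then reuse it as a few-shot demonstration when labeling new items (call_llm and the prompt wording are hypothetical stand-ins):

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in: a real model would return free-form text or a label."""
    return "relevant"

def explain(example: str, gold_label: str) -> str:
    """Step 1: ask the LLM why the gold label holds for a seed example."""
    return call_llm(f"Item: {example}\nExplain why the correct label is '{gold_label}'.")

def annotate(item: str, demo: str, demo_label: str, explanation: str) -> str:
    """Step 2: label a new item using the explained demonstration."""
    prompt = (f"Item: {demo}\nLabel: {demo_label}\nReason: {explanation}\n\n"
              f"Item: {item}\nLabel:")
    return call_llm(prompt)

demo = "query: jaguar speed; doc: top speeds of big cats"
reason = explain(demo, "relevant")
print(annotate("query: python gc; doc: garbage collection in CPython", demo, "relevant", reason))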
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.