A Question Answering Framework for Decontextualizing User-facing
Snippets from Scientific Documents
- URL: http://arxiv.org/abs/2305.14772v3
- Date: Fri, 1 Dec 2023 00:11:04 GMT
- Title: A Question Answering Framework for Decontextualizing User-facing
Snippets from Scientific Documents
- Authors: Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo
- Abstract summary: We use language models to rewrite snippets from scientific documents to be read on their own.
We propose a framework that decomposes the task into three stages: question generation, question answering, and rewriting.
- Score: 47.39561727838956
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-world applications (e.g., note taking, search) require extracting a
sentence or paragraph from a document and showing that snippet to a human
outside of the source document. Yet, users may find snippets difficult to
understand as they lack context from the original document. In this work, we
use language models to rewrite snippets from scientific documents to be read on
their own. First, we define the requirements and challenges for this
user-facing decontextualization task, such as clarifying where edits occur and
handling references to other documents. Second, we propose a framework that
decomposes the task into three stages: question generation, question answering,
and rewriting. Using this framework, we collect gold decontextualizations from
experienced scientific article readers. We then conduct a range of experiments
across state-of-the-art commercial and open-source language models to identify
how to best provide missing-but-relevant information to models for our task.
Finally, we develop QaDecontext, a simple prompting strategy inspired by our
framework that improves over end-to-end prompting. We conclude with analysis
that finds, while rewriting is easy, question generation and answering remain
challenging for today's models.
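As a concrete illustration of this three-stage decomposition, here is a minimal sketch in Python. It assumes a generic `llm(prompt)` completion function standing in for any commercial or open-source model; the prompts are illustrative placeholders, not the exact QaDecontext prompts.
```python
# Sketch of the three-stage pipeline: question generation ->
# question answering -> rewriting. `llm` is a hypothetical completion
# function; prompts are illustrative, not the paper's exact ones.
from typing import Callable, List

def decontextualize(snippet: str, document: str, llm: Callable[[str], str]) -> str:
    # Stage 1: generate the questions a reader of the lone snippet
    # would need answered (unresolved pronouns, acronyms, references).
    questions = llm(
        "List, one per line, the questions a reader would need answered "
        f"to understand this snippet on its own:\n{snippet}"
    ).splitlines()

    # Stage 2: answer each question from the source document.
    qa_pairs: List[str] = [
        f"Q: {q}\nA: " + llm(f"Document:\n{document}\nAnswer briefly: {q}")
        for q in questions if q.strip()
    ]

    # Stage 3: rewrite the snippet so it stands alone, marking
    # insertions (e.g., in brackets) so readers can see where edits occur.
    return llm(
        f"Rewrite this snippet to stand alone, marking insertions in brackets.\n"
        f"Snippet: {snippet}\nBackground:\n" + "\n".join(qa_pairs)
    )
```
The bracket-marking in the final stage reflects the task requirement above of clarifying where edits occur.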
Related papers
- Contri(e)ve: Context + Retrieve for Scholarly Question Answering [0.0]
We present a two-step solution using the open-source Large Language Model (LLM) Llama3.1 for the Scholarly-QALD dataset.
Firstly, we extract the context pertaining to the question from different structured and unstructured data sources.
Secondly, we implement prompt engineering to improve the information retrieval performance of the LLM.
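A rough sketch of this two-step setup follows; the retrievers and model wrapper are hypothetical stand-ins, with `llm` playing the role of Llama3.1.
```python
# Two-step sketch: (1) gather context from structured and unstructured
# sources, (2) answer with an engineered prompt. All callables are
# hypothetical stand-ins; `llm` plays the role of Llama3.1.
from typing import Callable, List

def scholarly_qa(question: str,
                 retrieve_kg: Callable[[str], List[str]],    # structured source
                 retrieve_text: Callable[[str], List[str]],  # unstructured source
                 llm: Callable[[str], str]) -> str:
    # Step 1: pull question-relevant context from both source types.
    context = retrieve_kg(question) + retrieve_text(question)
    # Step 2: prompt engineering -- constrain the model to the context.
    prompt = ("Answer the scholarly question using only the context below.\n"
              "Context:\n" + "\n".join(context) +
              f"\nQuestion: {question}\nAnswer:")
    return llm(prompt)
```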
arXiv Detail & Related papers (2024-09-13T17:38:47Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with users' mental model of these richly structured documents.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
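A minimal sketch of retrieval by structure or content, assuming the document has been parsed into typed elements; the `Element` type and its fields are hypothetical, not PDFTriage's actual API.
```python
# Sketch of retrieval over a structure-aware document representation;
# the Element type and its fields are hypothetical, not PDFTriage's API.
from dataclasses import dataclass
from typing import List

@dataclass
class Element:
    kind: str   # e.g., "section", "table", "page"
    label: str  # e.g., a section title, table caption, or page number
    text: str

def triage(elements: List[Element],
           by_structure: str = "",
           by_content: str = "") -> List[Element]:
    # Retrieve by structural handle (e.g., label "Table 3") when given,
    # otherwise fall back to a plain content match.
    if by_structure:
        return [e for e in elements if e.label == by_structure]
    return [e for e in elements if by_content and by_content.lower() in e.text.lower()]
```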
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot
Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge.
Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge.
We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
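A compact sketch of this three-stage pipeline, with hypothetical stand-ins for the PDF extractor, evidence retriever, and reader model:
```python
# Sketch of the three stages: PDF text extraction, evidence retrieval,
# and QA. The extractor, retriever, and reader are hypothetical stand-ins.
from typing import Callable, List

def document_qa(pdf_path: str,
                question: str,
                extract_text: Callable[[str], List[str]],          # PDF -> passages
                retrieve: Callable[[str, List[str]], List[str]],   # rank evidence
                answer: Callable[[str, str], str]) -> str:         # reader model
    passages = extract_text(pdf_path)                    # Stage 1: extraction
    evidence = retrieve(question, passages)[:5]          # Stage 2: retrieval
    return answer(question, "\n".join(evidence))         # Stage 3: QA
```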
arXiv Detail & Related papers (2022-10-04T23:33:52Z)
- Generate rather than Retrieve: Large Language Models are Strong Context
Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
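A minimal generate-then-read sketch under these assumptions (`llm` is a generic completion function, not the paper's exact setup):
```python
# Generate-then-read sketch: generate contextual documents, then read
# them to answer. `llm` is a generic completion function (hypothetical).
from typing import Callable

def genread(question: str, llm: Callable[[str], str], n_docs: int = 3) -> str:
    # Generate: sample several background documents for the question.
    docs = [llm(f"Write a short background document that helps answer: {question}")
            for _ in range(n_docs)]
    # Read: answer conditioned on the generated documents.
    return llm("Documents:\n" + "\n---\n".join(docs) +
               f"\nQuestion: {question}\nAnswer:")
```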
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Design Challenges for a Multi-Perspective Search Engine [44.48345943046946]
We study a new perspective-oriented document retrieval paradigm.
We discuss and assess the natural language understanding challenges inherent in achieving this goal.
We use a prototype system to conduct a user survey to assess the utility of our paradigm.
arXiv Detail & Related papers (2021-12-15T18:59:57Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)