Drilling Down into the Discourse Structure with LLMs for Long Document
Question Answering
- URL: http://arxiv.org/abs/2311.13565v1
- Date: Wed, 22 Nov 2023 18:22:56 GMT
- Title: Drilling Down into the Discourse Structure with LLMs for Long Document
Question Answering
- Authors: Inderjeet Nair, Shwetha Somasundaram, Apoorv Saxena, Koustava Goswami
- Abstract summary: We propose a suite of techniques that exploit the discourse structure commonly found in documents.
We show how our approach can be combined with a \textit{self-ask} reasoning agent to achieve the best zero-shot performance on complex multi-hop question answering.
- Score: 5.022057415488129
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We address the task of evidence retrieval for long document question
answering, which involves locating relevant paragraphs within a document to
answer a question. We aim to assess the applicability of large language models
(LLMs) to zero-shot long document evidence retrieval, owing to their
unprecedented performance across various NLP tasks. However, LLMs can currently
consume only limited context lengths as input, so providing document chunks as
inputs may overlook the global context and miss inter-segment dependencies.
Moreover, directly feeding large input sets can incur significant computational
costs, particularly when processing the entire document (and potentially
incurring monetary expenses with enterprise APIs such as OpenAI's GPT
variants). To address these challenges,
we propose a suite of techniques that exploit the discourse structure commonly
found in documents. By utilizing this structure, we create a condensed
representation of the document, enabling a more comprehensive understanding and
analysis of the relationships between its different parts. We retain $99.6\%$
of the best zero-shot approach's performance while processing only $26\%$ of
the total tokens used by that approach in the information-seeking evidence
retrieval setup. We also show how our approach can be combined with a
\textit{self-ask} reasoning agent to achieve the best zero-shot performance on
complex multi-hop question answering, just $\approx 4\%$ short of the zero-shot
performance obtained using gold evidence.
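To make the mechanism concrete, here is a minimal sketch of discourse-structure-guided retrieval under stated assumptions: the `Section` tree, the one-line-per-section condensation (first sentence as a stand-in for a proper summary), and the `llm` callable are all illustrative placeholders, not the paper's exact pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Section:
    heading: str
    paragraphs: list[str]
    children: list["Section"] = field(default_factory=list)


def walk(sec: Section) -> Iterator[Section]:
    yield sec
    for child in sec.children:
        yield from walk(child)


def condense(sec: Section, depth: int = 0) -> str:
    # One outline line per section: heading plus the first sentence as a
    # cheap stand-in for a proper summary.
    gist = sec.paragraphs[0].split(". ")[0] if sec.paragraphs else ""
    lines = [f"{'  ' * depth}- {sec.heading}: {gist}"]
    lines += [condense(c, depth + 1) for c in sec.children]
    return "\n".join(lines)


def retrieve_evidence(root: Section, question: str,
                      llm: Callable[[str], str]) -> list[str]:
    # Step 1: show the model only the condensed outline, so the document's
    # global structure fits in one context window at a fraction of the
    # full-text token cost.
    outline = condense(root)
    picked = llm(
        f"Outline:\n{outline}\n\nQuestion: {question}\n"
        "Reply with the headings of the sections most likely to contain "
        "the answer."
    )
    # Step 2: drill down, sending the full paragraphs of only the selected
    # sections. Naive substring matching on headings; a real system would
    # parse the model's reply properly.
    chosen = [s for s in walk(root) if s.heading in picked]
    return [p for s in chosen for p in s.paragraphs]
```

A self-ask style agent can then call `retrieve_evidence` once per intermediate sub-question, which is the combination the abstract refers to for multi-hop QA.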
Related papers
- Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities [30.1331670544648]
Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks.
We propose \textit{Refiner}, an end-to-end extract-and-restructure paradigm that operates in the post-retrieval process of RAG.
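As a loose illustration of what an extract-and-restructure step could look like (not Refiner's actual method), one might filter retrieved passages down to query-relevant sentences and regroup them under their source headings before handing them to the reader model:

```python
def restructure(passages: list[tuple[str, str]], query_terms: set[str]) -> str:
    """passages: (section_heading, text) pairs from the retriever."""
    by_section: dict[str, list[str]] = {}
    for heading, text in passages:
        for sentence in text.split(". "):
            # Naive keyword filter; a real system would score relevance
            # semantically, e.g. with the model itself.
            if query_terms & set(sentence.lower().split()):
                by_section.setdefault(heading, []).append(sentence)
    # Regroup surviving sentences under their source headings so the reader
    # sees the extracted content in its original discourse context.
    return "\n".join(f"## {heading}\n" + ". ".join(sents)
                     for heading, sents in by_section.items())
```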
arXiv Detail & Related papers (2024-06-17T09:25:10Z)
- Focus Anywhere for Fine-grained Multi-page Document Understanding [24.76897786595502]
This paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy that enables LVLMs to focus anywhere on single- or multi-page documents.
We employ multiple vision vocabularies to extract visual hybrid knowledge for interleaved document pages.
We render cross-vocabulary vision data as the foreground so that the multiple visual vocabularies are fully engaged, enabling in-document figure understanding.
arXiv Detail & Related papers (2024-05-23T08:15:49Z)
- Can't Remember Details in Long Documents? You Need Some R&R [4.465645631325957]
We introduce \textit{R&R}, a combination of two novel prompt-based methods: reprompting and in-context retrieval (ICR).
In reprompting, we repeat the prompt instructions periodically throughout the context document.
In ICR, rather than instructing the LLM to answer the question directly, we instruct it to retrieve the top $k$ passage numbers.
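A rough sketch of how the two prompt constructions could be assembled follows; the reminder interval, prompt wording, and two-step answer flow are illustrative guesses rather than the paper's exact recipe.

```python
def reprompt(instructions: str, passages: list[str], every: int = 5) -> str:
    """Repeat the task instructions periodically through a long context so
    they stay fresh regardless of where the evidence sits."""
    parts = []
    for i, passage in enumerate(passages, start=1):
        if i % every == 1:
            parts.append(f"[Reminder] {instructions}")
        parts.append(f"Passage {i}: {passage}")
    return "\n\n".join(parts)


def icr_prompt(question: str, passages: list[str], k: int = 3) -> str:
    """In-context retrieval: ask for the top-k passage numbers rather than
    a direct answer; the answer is then generated in a second call using
    only the retrieved passages."""
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(passages, 1))
    return (f"{numbered}\n\nQuestion: {question}\n"
            f"Return the numbers of the {k} passages most relevant to the "
            "question, and nothing else.")
```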
arXiv Detail & Related papers (2024-03-08T03:03:20Z)
- CFRet-DVQA: Coarse-to-Fine Retrieval and Efficient Tuning for Document Visual Question Answering [3.8065968624597324]
Document Visual Question Answering (DVQA) is a task that involves responding to queries based on the content of images.
Existing work is limited to locating information within a single page and does not facilitate cross-page question-and-answer interaction.
We introduce CFRet-DVQA, which focuses on retrieval and efficient tuning to address this critical issue effectively.
arXiv Detail & Related papers (2024-02-26T01:17:50Z)
- In-context Pretraining: Language Modeling Beyond Document Boundaries [137.53145699439898]
In-Context Pretraining is a new approach where language models are pretrained on a sequence of related documents.
We introduce approximate algorithms for finding related documents with efficient nearest neighbor search.
We see notable improvements in tasks that require more complex contextual reasoning.
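A toy version of the grouping step might chain each document to its nearest unused neighbor in embedding space, so that each pretraining sequence reads as a run of related documents; the paper's efficient approximate nearest-neighbor algorithms are more elaborate than this greedy sketch.

```python
import numpy as np


def chain_documents(embeddings: np.ndarray) -> list[int]:
    """Greedily order documents so each is followed by its most similar
    unused neighbor; contiguous runs then form pretraining sequences."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, used = [0], {0}
    while len(order) < len(unit):
        sims = unit @ unit[order[-1]]   # cosine similarity to the last doc
        sims[list(used)] = -np.inf      # never revisit a document
        nxt = int(np.argmax(sims))
        order.append(nxt)
        used.add(nxt)
    return order
```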
arXiv Detail & Related papers (2023-10-16T17:57:12Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these richly structured documents.
We propose PDFTriage, which enables models to retrieve context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
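The gist, sketched below with a hypothetical `doc` object and hypothetical tool names (not PDFTriage's exact interface), is to show the model the document's structural metadata and let it fetch exactly the parts it needs:

```python
from typing import Callable


def triage_answer(metadata: str, question: str, doc,
                  llm: Callable[[str], str]) -> str:
    # Hypothetical fetch-style tools over a structured document object.
    tools = {
        "fetch_section": lambda arg: doc.get_section(arg),
        "fetch_pages": lambda arg: doc.get_pages(arg),
    }
    # Step 1: the model sees structure (headings, pages, tables), not the
    # full text, and decides what to retrieve.
    call = llm(f"Document structure:\n{metadata}\n\nQuestion: {question}\n"
               "Reply as tool:argument, using fetch_section or fetch_pages.")
    name, _, arg = call.partition(":")
    context = tools.get(name.strip(), lambda a: "")(arg.strip())
    # Step 2: answer from the retrieved context only.
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```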
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents [78.27865456183397]
We propose PEARL, a prompting framework to improve reasoning over long documents.
Each stage of PEARL is implemented via zero-shot or few-shot prompting with minimal human input.
We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts.
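In spirit, the framework reduces to a plan-then-execute loop like the sketch below; PEARL's actual stages (including mining a task-specific action set) are richer, and the prompts here are placeholders.

```python
from typing import Callable


def plan_and_execute(document: str, question: str,
                     llm: Callable[[str], str]) -> str:
    # Stage 1: decompose the question into an ordered plan of steps.
    plan = llm(f"Question: {question}\n"
               "List, one per line, the steps needed to answer this "
               "question about a long document.")
    # Stage 2: execute each step over the document, threading the output
    # of earlier steps into later ones.
    scratchpad = ""
    for step in filter(None, (s.strip() for s in plan.splitlines())):
        scratchpad = llm(f"Document:\n{document}\n\n"
                         f"Results so far:\n{scratchpad}\n\n"
                         f"Execute this step: {step}")
    return scratchpad
```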
arXiv Detail & Related papers (2023-05-23T23:06:04Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task \emph{Document-Aware Passage Retrieval} (DAPR).
Analyzing the errors of state-of-the-art (SoTA) passage retrievers, we find that the majority of errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- Information Extraction from Documents: Question Answering vs Token Classification in real-world setups [0.0]
We compare the Question Answering approach with the classical token classification approach for document key information extraction.
Our research showed that when dealing with clean and relatively short entities, it is still best to use a token classification-based approach.
arXiv Detail & Related papers (2023-04-21T14:43:42Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
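One way to picture the idea (a hedged sketch; the paper's generation and selection procedure is more involved) is to have the model author its own passage/QA demonstrations and then prepend them to the real question:

```python
from typing import Callable


def self_prompt_answer(question: str, llm: Callable[[str], str],
                       n_demos: int = 4) -> str:
    demos = []
    for _ in range(n_demos):
        # Step 1: the model writes its own short passage plus a QA pair,
        # drawing on knowledge stored in its parameters.
        demos.append(llm(
            "Write a short factual passage, then a question it answers and "
            "the answer, formatted as:\n"
            "Passage: ...\nQuestion: ...\nAnswer: ..."))
    # Step 2: the generated pairs serve as in-context demonstrations for
    # the real query, with no external retrieval or training data.
    prompt = "\n\n".join(demos) + f"\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```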
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for KIE require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)