PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
- URL: http://arxiv.org/abs/2305.14564v1
- Date: Tue, 23 May 2023 23:06:04 GMT
- Title: PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
- Authors: Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer
- Abstract summary: We propose PEARL, a prompting framework to improve reasoning over long documents.
Each stage of PEARL is implemented via zero-shot or few-shot prompting with minimal human input.
We evaluate PEARL on a challenging subset of the QuALITY dataset, which contains questions that require complex reasoning over long narrative texts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Strategies such as chain-of-thought prompting improve the performance of
large language models (LLMs) on complex reasoning tasks by decomposing input
examples into intermediate steps. However, it remains unclear how to apply such
methods to reason over long input documents, in which both the decomposition
and the output of each intermediate step are non-trivial to obtain. In this
work, we propose PEARL, a prompting framework to improve reasoning over long
documents, which consists of three stages: action mining, plan formulation, and
plan execution. More specifically, given a question about a long document,
PEARL decomposes the question into a sequence of actions (e.g., SUMMARIZE,
FIND_EVENT, FIND_RELATION) and then executes them over the document to obtain
the answer. Each stage of PEARL is implemented via zero-shot or few-shot
prompting of LLMs (in our work, GPT-4) with minimal human input. We evaluate
PEARL on a challenging subset of the QuALITY dataset, which contains questions
that require complex reasoning over long narrative texts. PEARL outperforms
zero-shot and chain-of-thought prompting on this dataset, and ablation
experiments show that each stage of PEARL is critical to its performance.
Overall, PEARL is a first step towards leveraging LLMs to reason over long
documents.
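The three-stage pipeline is easy to picture in code. Below is a minimal Python sketch, assuming a generic `llm(prompt) -> str` completion helper (the paper prompts GPT-4); the prompts and helper names are illustrative, not the paper's actual ones.

```python
# Minimal sketch of PEARL's three stages. The prompts below are
# illustrative placeholders, not the paper's actual prompts.

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (the paper uses GPT-4)."""
    raise NotImplementedError

# Stage 1: action mining -- elicit a reusable action set from example questions.
def mine_actions(example_questions: list[str]) -> str:
    prompt = (
        "Propose reusable actions (NAME(args): description) for answering "
        "questions over long documents. Example questions:\n"
        + "\n".join(example_questions)
    )
    return llm(prompt)

# Stage 2: plan formulation -- decompose a question into a sequence of actions.
def formulate_plan(question: str, action_set: str) -> list[str]:
    prompt = (
        f"Available actions:\n{action_set}\n\n"
        f"Question: {question}\n"
        "Write a plan, one action per line, e.g. FIND_EVENT(...), "
        "FIND_RELATION(...), SUMMARIZE(...)."
    )
    return [line for line in llm(prompt).splitlines() if line.strip()]

# Stage 3: plan execution -- run each action over the document, passing
# earlier outputs forward so later steps can build on them.
def execute_plan(plan: list[str], document: str) -> str:
    results: dict[str, str] = {}
    answer = ""
    for step in plan:
        prompt = (
            f"Document:\n{document}\n\n"
            f"Intermediate results:\n{results}\n\n"
            f"Execute this action and return only its output: {step}"
        )
        answer = llm(prompt)
        results[step] = answer
    return answer  # the final action's output answers the question
```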
Related papers
- Graph-DPEP: Decomposed Plug and Ensemble Play for Few-Shot Document Relation Extraction with Graph-of-Thoughts Reasoning
The Graph-DPEP framework is grounded in reasoning over triplet "explanation thoughts" expressed in natural language.
We develop "ensemble-play", which reapplies generation to the entire relation-type list by leveraging the reasoning thoughts embedded in a sub-graph.
arXiv Detail & Related papers (2024-11-05T07:12:36Z)
- LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding.
The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output (sketched below).
arXiv Detail & Related papers (2024-10-12T03:13:44Z)
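The chunk-then-aggregate loop amounts to a map and a reduce over LLM calls. A rough sketch under the same assumptions (character-based chunking and the prompt wording are placeholders, not the framework's actual design):

```python
# Rough sketch of the divide-and-conquer idea behind LLM x MapReduce.

def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in an LLM API call

def map_reduce_qa(document: str, question: str, chunk_size: int = 8000) -> str:
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    # Map: answer the question against each chunk independently.
    partials = [
        llm(f"Context:\n{c}\n\nQuestion: {question}\n"
            "Answer using only this context:")
        for c in chunks
    ]
    # Reduce: aggregate the per-chunk answers into one final answer.
    bullet_list = "\n".join(f"- {p}" for p in partials)
    return llm(f"Question: {question}\n"
               f"Partial answers from different parts of the document:\n"
               f"{bullet_list}\nFinal answer:")
```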
arXiv Detail & Related papers (2024-10-12T03:13:44Z) - Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
arXiv Detail & Related papers (2024-10-08T17:02:40Z) - Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization [0.27624021966289597]
This paper introduces EYEGLAXS, a framework that leverages Large Language Models (LLMs) for extractive summarization.
EYEGLAXS focuses on extractive summarization to ensure factual and grammatical integrity.
The system sets new performance benchmarks on well-known datasets such as PubMed and arXiv (see the sketch below).
arXiv Detail & Related papers (2024-08-28T13:52:19Z)
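One plausible way to keep an LLM-based summarizer strictly extractive is to ask only for sentence indices and copy the chosen sentences verbatim. The sketch below illustrates that idea under assumed prompts; it is not EYEGLAXS's actual pipeline.

```python
import re

def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in an LLM API call

def extractive_summary(document: str, k: int = 5) -> str:
    # Naive sentence splitting; a real system would use a proper tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", document)
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    reply = llm(f"Select the {k} most important sentences. "
                f"Reply with their indices, comma-separated:\n{numbered}")
    indices = sorted({int(tok) for tok in re.findall(r"\d+", reply)})[:k]
    # Copying source sentences verbatim is what preserves factual and
    # grammatical integrity in an extractive setting.
    return " ".join(sentences[i] for i in indices if i < len(sentences))
```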
arXiv Detail & Related papers (2024-08-28T13:52:19Z) - Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-long context windows.
We propose Loong, a novel long-context benchmark that aligns with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z)
- Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization
We investigate whether combining the queries for the same input context into a single prompt, to minimize repeated calls, can be used successfully in meeting summarization.
We observe that 100% reliability in generating the response in the expected format is usually limited to certain closed-source LLMs (see the sketch below).
arXiv Detail & Related papers (2024-02-29T19:00:47Z)
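The idea is to bundle all queries about one transcript into a single call and parse one structured response. A sketch, where the JSON response format and the parsing are assumptions rather than the paper's protocol:

```python
import json

def llm(prompt: str) -> str:
    raise NotImplementedError  # plug in an LLM API call

def multi_query(transcript: str, queries: list[str]) -> dict[str, str]:
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(queries))
    reply = llm(
        f"Transcript:\n{transcript}\n\n"
        "Answer every numbered query. Return a JSON object mapping each "
        f"query number to its answer.\n{numbered}"
    )
    try:
        parsed = json.loads(reply)
        return {queries[int(k) - 1]: str(v) for k, v in parsed.items()}
    except (json.JSONDecodeError, ValueError, KeyError, IndexError):
        # As the paper observes, not every model reliably honors the
        # requested format; a real system needs a fallback (e.g., retry,
        # or one call per query).
        return {}
```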
- Benchmarking LLMs on the Semantic Overlap Summarization Task
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z)
- ADaPT: As-Needed Decomposition and Planning with Language Models
We introduce As-Needed Decomposition and Planning for complex Tasks (ADaPT).
ADaPT explicitly plans and decomposes complex sub-tasks as needed, whenever the large language model is unable to execute them.
Our results demonstrate that ADaPT substantially outperforms established strong baselines (the recursion is sketched below).
arXiv Detail & Related papers (2023-11-08T17:59:15Z)
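The as-needed recursion fits in a few lines. In this sketch, `try_execute` and `plan_subtasks` are hypothetical LLM-backed helpers, and the "all sub-tasks must succeed" rule stands in for ADaPT's logical composition of sub-task outcomes.

```python
# Sketch of as-needed decomposition: attempt a task directly, and only on
# failure ask a planner LLM to split it into sub-tasks and recurse.

def try_execute(task: str) -> bool:
    """Hypothetical: run the task with an LLM executor, report success."""
    raise NotImplementedError

def plan_subtasks(task: str) -> list[str]:
    """Hypothetical: ask an LLM planner to decompose the task."""
    raise NotImplementedError

def adapt(task: str, depth: int = 0, max_depth: int = 3) -> bool:
    if try_execute(task):   # no decomposition while execution succeeds
        return True
    if depth >= max_depth:  # give up rather than recurse forever
        return False
    # Decompose only when the executor fails, then solve each sub-task.
    return all(adapt(sub, depth + 1, max_depth) for sub in plan_subtasks(task))
```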
- Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
Summ^N is a simple, flexible, and effective multi-stage framework for input texts longer than the maximum context length of typical pretrained LMs.
It can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed.
Our experiments demonstrate that Summ^N significantly outperforms previous state-of-the-art methods (see the sketch below).
arXiv Detail & Related papers (2021-10-16T06:19:54Z)
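The arbitrary-length property comes from iterating a split-and-summarize stage until the text fits the model's context. A runnable toy sketch, where the truncating `summarize` is only a stand-in for a pretrained summarization LM:

```python
# Toy sketch of the multi-stage loop: split, summarize each chunk,
# concatenate, and repeat until the text fits a fixed context size.

def summarize(chunk: str) -> str:
    """Stand-in for a pretrained summarization LM; truncation keeps the
    sketch runnable and guarantees each stage shrinks the text."""
    return chunk[:400]

def summ_n(text: str, context_chars: int = 4000) -> str:
    while len(text) > context_chars:  # one coarse stage per iteration
        chunks = [text[i:i + context_chars]
                  for i in range(0, len(text), context_chars)]
        text = " ".join(summarize(c) for c in chunks)
    return summarize(text)            # final fine-grained summary
```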
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems
This paper proposes Comprehensive Instruction (CINS), which exploits pre-trained language models (PLMs) with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data (the instruction schema is sketched below).
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
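The (definition, constraint, prompt) schema can be read as a fill-in template. A small illustrative sketch, where the field contents are invented and not CINS's actual wording:

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    definition: str  # what the task is
    constraint: str  # task-specific restrictions on the answer
    prompt: str      # cue that precedes the model's answer

    def render(self, utterance: str) -> str:
        return f"{self.definition} {self.constraint}\n{utterance}\n{self.prompt}"

# Example realization for an intent-detection task (contents invented).
intent = Instruction(
    definition="Detect the intent of the user's utterance.",
    constraint="Choose one of: book_flight, book_hotel, find_restaurant.",
    prompt="Intent:",
)
print(intent.render("I need a table for two tonight."))
```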
This list is automatically generated from the titles and abstracts of the papers on this site.