NarrativeXL: A Large-scale Dataset For Long-Term Memory Models
- URL: http://arxiv.org/abs/2305.13877v2
- Date: Fri, 8 Dec 2023 01:45:05 GMT
- Title: NarrativeXL: A Large-scale Dataset For Long-Term Memory Models
- Authors: Arseny Moskvichev and Ky-Vinh Mai
- Abstract summary: Using GPT 3.5, we summarized each scene in 1,500 hand-curated fiction books from Project Gutenberg.
With 990,595 total questions, our dataset is an order of magnitude larger than the closest alternatives.
Most questions have a known ``retention demand'', indicating how long-term of a memory is needed to answer them.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a new large-scale (nearly a million questions) ultra-long-context
(more than 50,000 words average document length) reading comprehension dataset.
Using GPT 3.5, we summarized each scene in 1,500 hand-curated fiction books
from Project Gutenberg, which resulted in approximately 150 scene-level
summaries per book. After that, we created a number of reading comprehension
questions based on these summaries, including three types of multiple-choice
scene recognition questions, as well as free-form narrative reconstruction
questions. With 990,595 total questions, our dataset is an order of magnitude
larger than the closest alternatives. Crucially, most questions have a known
``retention demand'', indicating how long-term of a memory is needed to answer
them, which should aid long-term memory performance evaluation. We validate our
data in four small-scale experiments: one with human labelers, and three with
existing language models. We show that our questions (1) adequately represent
the source material, (2) can be used to diagnose a model's memory capacity, and
(3) are not trivial for modern language models even when the memory demand does
not exceed those models' context lengths. Lastly, we provide our code, which
can be used to further expand the dataset with minimal human labor.
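The question-construction idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function name, the distractor-sampling scheme, and the output fields are hypothetical. It shows how a ``retention demand'' label falls out naturally as the distance between a question's target scene and the reader's current position in the book.

```python
import random

def make_recognition_questions(scene_summaries, num_distractors=3, seed=0):
    """Build multiple-choice scene-recognition questions from ordered
    scene summaries, labeling each with a 'retention demand': how many
    scenes back the queried scene occurred (hypothetical sketch)."""
    rng = random.Random(seed)
    questions = []
    n = len(scene_summaries)
    for read_pos in range(1, n):
        # Pick a scene the reader has already seen as the correct answer.
        target_idx = rng.randrange(read_pos)
        # For simplicity, distractors are other scenes from the same book;
        # the real dataset can draw them from elsewhere.
        distractors = rng.sample(
            [s for i, s in enumerate(scene_summaries) if i != target_idx],
            k=min(num_distractors, n - 1),
        )
        options = distractors + [scene_summaries[target_idx]]
        rng.shuffle(options)
        questions.append({
            "read_position": read_pos,
            "options": options,
            "answer": scene_summaries[target_idx],
            # Retention demand: scenes elapsed since the target scene.
            "retention_demand": read_pos - target_idx,
        })
    return questions
```

Because every question records where in the book it is asked and which scene it targets, filtering by `retention_demand` lets one probe a model's memory at a chosen horizon.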
Related papers
- NewsQs: Multi-Source Question Generation for the Inquiring Mind [59.79288644158271]
We present NewsQs, a dataset that provides question-answer pairs for multiple news documents.
To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles.
arXiv Detail & Related papers (2024-02-28T16:59:35Z)
- BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models [141.21603469555225]
Large language models (LLMs) have achieved dramatic proficiency on NLP tasks of normal length.
We propose BAMBOO, a multi-task long context benchmark.
It consists of 10 datasets from 5 different long text understanding tasks.
arXiv Detail & Related papers (2023-09-23T11:36:15Z)
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose to generate summaries/memory with LLMs to enhance their long-term memory ability.
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
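A toy sketch of a triplet-based write-read memory in the spirit of the approach above. The class and method names are illustrative assumptions; RET-LLM itself extracts triplets with a learned model rather than storing hand-written ones.

```python
class TripletMemory:
    """Minimal write-read memory storing knowledge as
    (subject, relation, object) triplets (illustrative sketch)."""

    def __init__(self):
        self.triplets = []

    def write(self, subject, relation, obj):
        """Save one piece of knowledge as a triplet."""
        self.triplets.append((subject, relation, obj))

    def read(self, subject=None, relation=None):
        """Return all triplets matching the provided fields;
        a field left as None matches anything."""
        return [
            t for t in self.triplets
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
        ]
```

For example, after `mem.write("Alice", "works_at", "Acme")`, a later query `mem.read(subject="Alice")` retrieves the stored fact regardless of how much conversation has passed, which is the property such a memory unit is meant to provide.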
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
- Large Language Models Struggle to Learn Long-Tail Knowledge [39.01608375863687]
We study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web.
In particular, we show that a language model's ability to answer a fact-based question relates to how many documents associated with that question were seen during pre-training.
arXiv Detail & Related papers (2022-11-15T18:49:27Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- QuALITY: Question Answering with Long Input Texts, Yes! [27.700792723226524]
We introduce QuALITY, a dataset with context passages in English that have an average length of about 5,000 tokens.
Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage.
Only half of the questions are answerable by annotators working under tight time constraints.
arXiv Detail & Related papers (2021-12-16T04:14:38Z)
- ListReader: Extracting List-form Answers for Opinion Questions [18.50111430378249]
ListReader is a neural extractive QA model for list-form answers.
In addition to learning the alignment between the question and content, we introduce a heterogeneous graph neural network.
Our model adopts a co-extraction setting that can extract either span- or sentence-level answers.
arXiv Detail & Related papers (2021-10-22T10:33:08Z)
- ReadTwice: Reading Very Large Documents with Memories [19.45538971299312]
We propose ReadTwice, a technique that combines several strengths of prior approaches to model long-range dependencies with Transformers.
The main idea is to read text in small segments, in parallel, summarizing each segment into a memory table to be used in a second read of the text.
We show that the method outperforms models of comparable size on several question answering (QA) datasets and sets a new state of the art on the challenging NarrativeQA task.
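The two-pass scheme can be sketched as follows. Here `summarize` and `answer_with_memory` are hypothetical stand-ins for the model components; the actual method builds a neural memory table inside a Transformer rather than passing arbitrary Python values.

```python
def read_twice(segments, summarize, answer_with_memory):
    """Two-pass reading sketch: first summarize each segment into a
    memory table, then reread each segment with that table available,
    giving every segment access to document-wide context."""
    # First pass: each segment is summarized independently (parallelizable).
    memory_table = [summarize(seg) for seg in segments]
    # Second pass: each segment is processed again with the full table.
    return [answer_with_memory(seg, memory_table) for seg in segments]
```

The key property is that the second pass over any one segment can condition on summaries of all other segments, which is how long-range dependencies reach beyond a single segment's length.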
arXiv Detail & Related papers (2021-05-10T10:13:09Z)
- Inquisitive Question Generation for High Level Text Comprehension [60.21497846332531]
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.