Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- URL: http://arxiv.org/abs/2310.05029v1
- Date: Sun, 8 Oct 2023 06:18:14 GMT
- Title: Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- Authors: Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz
- Abstract summary: We introduce MemWalker, a method that processes the long context into a tree of summary nodes. Upon receiving a query, the model navigates this tree in search of relevant information, and responds once it gathers sufficient information.
We show that, beyond effective reading, MemWalker enhances explainability by highlighting the reasoning steps as it interactively reads the text, pinpointing the relevant text segments related to the query.
- Score: 63.93888816206071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have advanced in large strides due to the
effectiveness of the self-attention mechanism that processes and compares all
tokens at once. However, this mechanism comes with a fundamental issue -- the
predetermined context window is bound to be limited. Despite attempts to extend
the context window through methods like extrapolating the positional embedding,
using recurrence, or selectively retrieving essential parts of the long
sequence, long-text understanding continues to be a challenge. We propose an
alternative approach which instead treats the LLM as an interactive agent,
allowing it to decide how to read the text via iterative prompting. We
introduce MemWalker, a method that first processes the long context into a tree
of summary nodes. Upon receiving a query, the model navigates this tree in
search of relevant information, and responds once it gathers sufficient
information. On long-text question answering tasks, our method outperforms
baseline approaches that use long context windows, recurrence, and retrieval.
We show that, beyond effective reading, MemWalker enhances explainability by
highlighting the reasoning steps as it interactively reads the text,
pinpointing the relevant text segments related to the query.
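In outline, MemWalker has two stages: memory-tree construction (summarize text segments bottom-up into a tree of summary nodes) and navigation (the model walks down the tree, choosing branches by their summaries until it can answer from a leaf's raw text). The sketch below is a minimal illustration of that two-stage structure under those assumptions, not the authors' implementation; llm_summarize, llm_navigate, and llm_answer are hypothetical stand-ins for the paper's iterative prompting steps.

```python
# Minimal sketch of MemWalker's two stages. The three llm_* functions are
# hypothetical stand-ins for prompted LLM calls, not a real API.
from dataclasses import dataclass, field
from typing import Optional

def llm_summarize(texts: list[str]) -> str:
    """Condense one or more texts into a single summary (prompted LLM call)."""
    ...

def llm_navigate(query: str, options: list[str]) -> int:
    """Given child summaries, return the index of the branch to follow."""
    ...

def llm_answer(query: str, segment: str) -> str:
    """Answer the query from the raw text of a leaf segment."""
    ...

@dataclass
class Node:
    summary: str                      # shown to the model while navigating
    segment: Optional[str] = None     # raw text, present only at leaves
    children: list["Node"] = field(default_factory=list)

def build_memory_tree(segments: list[str], fanout: int = 4) -> Node:
    """Stage 1: summarize segments bottom-up into a tree of summary nodes."""
    nodes = [Node(summary=llm_summarize([s]), segment=s) for s in segments]
    while len(nodes) > 1:
        groups = [nodes[i:i + fanout] for i in range(0, len(nodes), fanout)]
        nodes = [Node(summary=llm_summarize([c.summary for c in g]), children=g)
                 for g in groups]
    return nodes[0]

def navigate(root: Node, query: str) -> str:
    """Stage 2: walk down the tree until a leaf holds enough information.
    (The paper also lets the model back out of a wrong branch; omitted here.)"""
    node = root
    while node.children:
        choice = llm_navigate(query, [c.summary for c in node.children])
        node = node.children[choice]
    return llm_answer(query, node.segment)
```

The point of this structure is that navigation cost grows with tree depth rather than with total text length, which is what lets the model read beyond its context window.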
Related papers
- QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism [46.441032033076034]
Memory mechanisms offer a flexible solution for managing long contexts.
We introduce a novel strategy, Question then Reflection Memory Mechanism (QRMeM), incorporating a dual-structured memory pool.
Our evaluation across multiple-choice question (MCQ) and multi-document question answering (Multi-doc QA) benchmarks shows that QRMeM enhances performance compared to existing approaches.
arXiv Detail & Related papers (2024-06-19T02:46:18Z)
- FragRel: Exploiting Fragment-level Relations in the External Memory of Large Language Models [54.13671100638092]
We propose a fragment-connected Hierarchical Memory for Large Language Models (LLMs).
We formulate the fragment-level relations in external memory and present several instantiations for different text types.
We validate the benefits of involving these relations on long story understanding, repository-level code generation, and long-term chatting.
arXiv Detail & Related papers (2024-06-05T09:31:37Z)
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory [8.085414868117917]
Until recently, most work on RAG has focused on information retrieval from large databases of texts, like Wikipedia.
We argue that effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval.
We generate a new dataset of ambiguous and time-based questions that build upon a recent dataset of long-form, simulated conversations.
arXiv Detail & Related papers (2024-05-29T18:19:46Z)
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts [35.68159165639245]
We propose ReadAgent, an agent system that increases effective context length up to 20x in our experiments.
Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system.
We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories.
arXiv Detail & Related papers (2024-02-15T05:40:21Z)
- Tree-Based Hard Attention with Self-Motivation for Large Language Models [7.2677650379517775]
Large language models (LLMs) excel at understanding and generating plain text.
However, they are not specifically tailored to handle hierarchical text structures.
We propose a novel framework called Tree-Based Hard Attention with Self-Motivation for Large Language Models.
arXiv Detail & Related papers (2024-02-14T00:40:51Z)
- Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [75.98775135321355]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses.
We propose to generate summaries/memory using large language models (LLMs) to enhance their long-term memory ability (a minimal sketch of this recursive-summarization loop appears after this list).
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
- SSP: Self-Supervised Post-training for Conversational Search [63.28684982954115]
We propose Self-Supervised Post-training (SSP), a new post-training paradigm with three self-supervised tasks to efficiently initialize the conversational search model.
To verify the effectiveness of our proposed method, we apply the conversational encoder post-trained with SSP to the conversational search task using two benchmark datasets: CAsT-19 and CAsT-20.
arXiv Detail & Related papers (2023-07-02T13:36:36Z)
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning [54.55643652781891]
Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation.
We propose a method to directly predict answers with a phrase retrieval scheme for a sequence of words.
arXiv Detail & Related papers (2023-06-07T09:46:38Z)
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thought (Cue-CoT) to provide more personalized and engaging responses.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate that our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
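As a rough companion to the "Recursively Summarizing Enables Long-Term Dialogue Memory" entry above, the sketch below shows the general recursive-summarization loop: fold each batch of new dialogue turns into a running memory summary and prepend that memory when responding. summarize_with_llm and DialogueMemory are hypothetical names for illustration, not the paper's code.

```python
# Rough sketch of recursive dialogue summarization. summarize_with_llm is a
# hypothetical prompted-LLM call; this is not the paper's exact procedure.

def summarize_with_llm(previous_memory: str, new_turns: list[str]) -> str:
    """Fold the old memory and the new turns into an updated summary."""
    ...

class DialogueMemory:
    def __init__(self) -> None:
        self.memory = ""  # running summary of the conversation so far

    def update(self, new_turns: list[str]) -> None:
        # Recursive step: old memory + new turns -> new, condensed memory.
        self.memory = summarize_with_llm(self.memory, new_turns)

    def context_for_response(self, user_message: str) -> str:
        # Prepend the condensed memory so a bounded-context model can still
        # recall information from much earlier sessions.
        return f"Memory: {self.memory}\nUser: {user_message}"
```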