diff History for Neural Language Agents
- URL: http://arxiv.org/abs/2312.07540v3
- Date: Tue, 11 Jun 2024 17:57:15 GMT
- Title: diff History for Neural Language Agents
- Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
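- Abstract summary: We introduce diff history, a simple and highly effective solution to the long, verbose textual prompts that arise when LM-based controllers convert environment observations and interaction history to text.
By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment.
On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples than prior work.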
- Score: 33.13471417703669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as minimal needs for interaction history or instruction tuning. In this paper, we introduce diff history, a simple and highly effective solution to these issues. By applying the Unix diff command on consecutive text observations in the interaction histories used to prompt LM policies, we can both abstract away redundant information and focus the content of textual inputs on the salient changes in the environment. On NetHack, an unsolved video game that requires long-horizon reasoning for decision-making, LMs tuned with diff history match state-of-the-art performance for neural agents while needing 1800x fewer training examples compared to prior work. Even on the simpler BabyAI-Text environment with concise text observations, we find that although diff history increases the length of prompts, the representation it provides offers a 25% improvement in the efficiency of low-sample instruction tuning. Further, we show that diff history scales favorably across different tuning dataset sizes. We open-source our code and data at https://diffhistory.github.io.
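The core operation described above, replacing each observation in the interaction history with its diff against the previous observation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: it uses the standard library's difflib.unified_diff as a stand-in for the Unix diff command, and the helper names text_diff and diff_history_prompt, as well as the <obs>/<action>/<diff> prompt tags, are hypothetical.

```python
import difflib
from typing import List


def text_diff(prev_obs: str, curr_obs: str) -> str:
    """Unified diff (zero context lines) between two consecutive text observations.

    Assumption: difflib.unified_diff stands in for the Unix `diff` command;
    the exact diff format used in the paper may differ.
    """
    diff_lines = list(
        difflib.unified_diff(
            prev_obs.splitlines(),
            curr_obs.splitlines(),
            lineterm="",
            n=0,  # keep only changed lines, no surrounding context
        )
    )
    # unified_diff yields nothing for identical inputs; otherwise the first two
    # lines are the '--- ' / '+++ ' file headers, which carry no information here.
    return "\n".join(diff_lines[2:])


def diff_history_prompt(observations: List[str], actions: List[str]) -> str:
    """Build a prompt: first observation in full, later observations as diffs.

    Assumes observations[t + 1] is the observation seen after taking actions[t].
    The '<obs>' / '<action>' / '<diff>' tags are hypothetical, not the paper's format.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected exactly one more observation than actions")
    parts = [f"<obs>\n{observations[0]}"]
    for t, action in enumerate(actions):
        parts.append(f"<action> {action}")
        parts.append(f"<diff>\n{text_diff(observations[t], observations[t + 1])}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy, made-up NetHack-like observations: only the message line changes.
    obs = [
        "HP:12(12) Dlvl:1\nYou see a closed door.",
        "HP:12(12) Dlvl:1\nThe door opens.",
        "HP:12(12) Dlvl:1\nThe door opens.",
    ]
    print(diff_history_prompt(obs, ["open door", "wait"]))
```

Setting n=0 drops unchanged context lines, so each diff carries only what changed between steps, and an unchanged step produces an empty diff. With very short observations, the hunk header and +/- markers can make a diff about as long as the observation itself, which is one plausible way diff history could lengthen prompts on a concise environment such as BabyAI-Text.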
Related papers
- Controllable Abstraction in Summary Generation for Large Language Models via Prompt Engineering [9.192759263055942]
This study presents a controllable abstract summary generation method for large language models based on prompt engineering.
It generates summaries with varying levels of abstraction by performing semantic analysis, topic modeling, and noise control on the input text.
The experiment uses the CNN/Daily Mail dataset and provides a detailed analysis of different prompt lengths, data noise, and text types.
arXiv (2025-10-17T08:50:55Z)
- Few-Shot Connectivity-Aware Text Line Segmentation in Historical Documents [1.4065611645922207]
In this work, we demonstrate that small and simple architectures, coupled with a topology-aware loss function, are more accurate and data-efficient than more complex alternatives.
Our methodology significantly improves upon the current state-of-the-art on the U-DIADS-TL dataset, with a 200% increase in Recognition Accuracy and a 75% increase in Line Intersection over Union.
arXiv (2025-08-26T16:11:32Z)
- END: Early Noise Dropping for Efficient and Effective Context Denoising [60.24648712022382]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks.
They are often distracted by irrelevant or noisy context in input sequences that degrades output quality.
We introduce Early Noise Dropping (END), a novel approach to mitigate this issue without requiring fine-tuning of the LLMs.
arXiv (2025-02-26T08:07:17Z)
- Towards Text-Image Interleaved Retrieval [49.96332254241075]
We introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences.
We construct a TIIR benchmark based on naturally interleaved wikiHow tutorials, where a specific pipeline is designed to generate interleaved queries.
We propose a novel Matryoshka Multimodal Embedder (MME), which compresses the number of visual tokens at different granularities.
arXiv (2025-02-18T12:00:47Z)
- Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens [21.61634020256455]
Transformer-based large language models (LLMs) suffer performance degradation when modeling long-term contexts.
We propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks.
arXiv (2024-06-16T15:50:10Z)
- Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction [36.915250638481986]
We introduce LiveSum, a new benchmark dataset for generating summary tables of competitions based on real-time commentary texts.
We evaluate the performance of state-of-the-art Large Language Models on this task in both fine-tuning and zero-shot settings.
We additionally propose a novel pipeline called $T^3$ (Text-Tuple-Table) to improve their performance.
arXiv (2024-04-22T14:31:28Z)
- Measuring Distributional Shifts in Text: The Advantage of Language Model-Based Embeddings [11.393822909537796]
An essential part of monitoring machine learning models in production is measuring input and output data drift.
Recent advancements in large language models (LLMs) indicate their effectiveness in capturing semantic relationships.
We propose a clustering-based algorithm for measuring distributional shifts in text data by exploiting such embeddings.
arXiv (2023-12-04T20:46:48Z)
- Fast and Accurate Factual Inconsistency Detection Over Long Documents [19.86348214462828]
We introduce SCALE, a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.
This approach achieves state-of-the-art performance in factual inconsistency detection for diverse tasks and long inputs.
We have released our code and data publicly on GitHub.
arXiv (2023-10-19T22:55:39Z)
- RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting [11.306772273707253]
Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation.
We develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks.
OpenRewriteEval, a novel benchmark, covers a wide variety of rewriting types expressed through natural language instructions.
arXiv (2023-05-25T03:26:26Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv (2023-05-17T15:53:31Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost transfer learning (TL) method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv (2023-01-22T18:22:55Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv (2022-01-10T18:47:15Z)
- Summ^N: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents [13.755637074366813]
SummN is a simple, flexible, and effective multi-stage framework for input texts longer than the maximum context lengths of typical pretrained LMs.
It can process input text of arbitrary length by adjusting the number of stages while keeping the LM context size fixed.
Our experiments demonstrate that SummN significantly outperforms previous state-of-the-art methods.
arXiv (2021-10-16T06:19:54Z)
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv (2021-06-17T13:02:26Z)
- Go Forth and Prosper: Language Modeling with Ancient Textual History [54.99143450580711]
We learn an auxiliary function to select spans from the ancient history that can help the LM predict future text.
The selected text spans are then copied directly into the LM's context window, replacing less predictive spans.
We see a 7 percent perplexity reduction on Wikipedia articles, and a 12 percent perplexity reduction on scientific texts.
arXiv (2021-04-18T06:57:30Z)
- Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start [125.23550801424328]
We introduce Universal Few-shot textual Entailment (UFO-Entail).
We demonstrate that this framework enables a pretrained entailment model to work well on new entailment domains in a few-shot setting.
arXiv (2020-10-06T09:50:25Z)