Related papers: Recursive Language Models

Recursive Language Models

URL: http://arxiv.org/abs/2512.24601v1
Date: Wed, 31 Dec 2025 03:43:41 GMT
Title: Recursive Language Models
Authors: Alex L. Zhang, Tim Kraska, Omar Khattab
Abstract summary: We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs.
Score: 14.17788048231183
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inference strategy that treats long prompts as part of an external environment and allows the LLM to programmatically examine, decompose, and recursively call itself over snippets of the prompt. We find that RLMs successfully handle inputs up to two orders of magnitude beyond model context windows and, even for shorter prompts, dramatically outperform the quality of base LLMs and common long-context scaffolds across four diverse long-context tasks, while having comparable (or cheaper) cost per query.
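The recursive decomposition described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`recursive_answer`, `make_prompt`, `toy_llm`), the character-based `window` limit, and the halving strategy are all assumptions made for illustration, and `toy_llm` is a stand-in that only filters lines rather than a real model.

```python
from typing import Callable


def make_prompt(query: str, context: str) -> str:
    # Hypothetical prompt layout; a real system would use its own template.
    return f"Question: {query}\nContext: {context}"


def recursive_answer(query: str, context: str,
                     llm: Callable[[str], str], window: int = 2000) -> str:
    """Answer `query` over `context`, recursing when the context is too long.

    `window` is an assumed character budget standing in for a context window.
    """
    lines = context.splitlines(keepends=True)
    # Base case: the snippet fits in the window (or cannot be split further).
    if len(context) <= window or len(lines) < 2:
        return llm(make_prompt(query, context))
    # Recursive case: split on a line boundary and query each half separately.
    mid = len(lines) // 2
    left = recursive_answer(query, "".join(lines[:mid]), llm, window)
    right = recursive_answer(query, "".join(lines[mid:]), llm, window)
    # Combine the partial answers with one more call (assumed to fit the window).
    return llm(make_prompt(query, left + "\n" + right))


def toy_llm(prompt: str) -> str:
    # Stand-in for a real model: returns the context lines containing "secret".
    _, _, context = prompt.partition("\nContext: ")
    hits = [line for line in context.splitlines() if "secret" in line]
    return "\n".join(hits)


if __name__ == "__main__":
    doc = "filler line\n" * 500 + "the secret code is 42\n" + "filler line\n" * 500
    print(recursive_answer("What is the secret code?", doc, toy_llm))
```

The sketch only captures the idea of keeping the long prompt outside any single call and recursing over snippets; how a real system chooses what to examine and how to decompose is left open here.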

Related papers

Are Prompts All You Need? Evaluating Prompt-Based Large Language Models (LLM)s for Software Requirements Classification [1.1458853556386799]
This study tests whether prompt-based large language models can reduce data needs. We benchmark several models and prompting styles across multiple tasks on two English datasets, PROMISE and SecReq.
arXiv Detail & Related papers (2025-09-17T09:58:26Z)
A Comprehensive Survey on Long Context Language Modeling [118.5540791080351]
Long Context Language Models (LCLMs) process and analyze extensive inputs in an effective and efficient way. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively.
arXiv Detail & Related papers (2025-03-20T17:06:28Z)
LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models [73.13933847198395]
We propose a training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding. The proposed LLM$\times$MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate answers to produce the final output.
arXiv Detail & Related papers (2024-10-12T03:13:44Z)
Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts [5.520335305387487]
We propose a novel prompting strategy, Multi-Lingual Prompt (MLPrompt). MLPrompt translates the error-prone rule that an LLM struggles to follow into another language, thus drawing greater attention to it. We introduce a framework integrating MLPrompt with an auto-checking mechanism for structured data generation, with a specific case study in text-to-MIP instances.
arXiv Detail & Related papers (2024-09-17T10:33:27Z)
DoubleDipper: Improving Long-Context LLMs via Context Recycling [44.24067814871803]
We propose DoubleDipper, a novel In-Context-Learning method that automatically generates few-shot examples for long context QA tasks. We apply our method on multiple Large Language Models and obtain substantial improvements. Surprisingly, despite introducing only single-hop ICL examples, LLMs successfully generalize to multi-hop long-context QA.
arXiv Detail & Related papers (2024-06-19T15:28:29Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs). Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities. We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization [7.674972936853123]
We investigate whether combining the queries for the same input context in a single prompt to minimize repeated calls can be successfully used in meeting summarization. We observe that 100% reliability in generating the response in the expected format is usually limited to certain closed-source LLMs.
arXiv Detail & Related papers (2024-02-29T19:00:47Z)
LooGLE: Can Long-Context Language Models Understand Long Contexts? [46.143956498529796]
LooGLE is a benchmark for large language models' long context understanding. It features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings.
arXiv Detail & Related papers (2023-11-08T01:45:37Z)
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models [30.48902594738911]
Given a long conversation, large language models (LLMs) fail to recall past information and tend to generate inconsistent responses. We propose to generate summaries/memory using LLMs to enhance long-term memory ability.
arXiv Detail & Related papers (2023-08-29T04:59:53Z)
Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios. Our method, the Language model as Retriever (LameR), is built upon no other neural models but an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.