Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
- URL: http://arxiv.org/abs/2510.01620v2
- Date: Fri, 03 Oct 2025 02:17:40 GMT
- Title: Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
- Authors: Peidong Liu, Junjiang Lin, Shaowen Wang, Yao Xu, Haiqing Li, Xuhao Xie, Siyi Wu, Hao Li,
- Abstract summary: Contextual Markov Decision Processes (CMDPs) offer a framework for sequential decision-making under external signals. We propose an information-theoretic summarization approach that uses large language models (LLMs) to compress contextual inputs into low-dimensional, semantically rich summaries.
- Score: 23.111877248835736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual Markov Decision Processes (CMDPs) offer a framework for sequential decision-making under external signals, but existing methods often fail to generalize in high-dimensional or unstructured contexts, resulting in excessive computation and unstable performance. We propose an information-theoretic summarization approach that uses large language models (LLMs) to compress contextual inputs into low-dimensional, semantically rich summaries. These summaries augment states by preserving decision-critical cues while reducing redundancy. Building on the notion of approximate context sufficiency, we provide, to our knowledge, the first regret bounds and a latency-entropy trade-off characterization for CMDPs. Our analysis clarifies how informativeness impacts computational cost. Experiments across discrete, continuous, visual, and recommendation benchmarks show that our method outperforms raw-context and non-context baselines, improving reward, success rate, and sample efficiency, while reducing latency and memory usage. These findings demonstrate that LLM-based summarization offers a scalable and interpretable solution for efficient decision-making in context-rich, resource-constrained environments.
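To make the pipeline concrete, here is a minimal sketch of the idea as the abstract describes it, not the authors' code: an LLM compresses the raw context into a short summary embedding, which is concatenated to the environment state before the policy acts. All names here (the `llm`, `env`, and `policy` objects and their methods) are hypothetical stand-ins.
```python
import numpy as np

def summarize_context(llm, context: str, max_tokens: int = 32) -> np.ndarray:
    """Ask the LLM for a short, decision-relevant summary and embed it."""
    summary = llm.generate(
        f"Summarize only the decision-relevant facts:\n{context}",
        max_tokens=max_tokens,  # caps summary length/entropy
    )
    return llm.embed(summary)  # low-dimensional summary vector z

def run_episode(env, policy, llm):
    state, context = env.reset()         # CMDP: state plus external context
    z = summarize_context(llm, context)  # computed once per context
    done, total_reward = False, 0.0
    while not done:
        aug_state = np.concatenate([state, z])  # summary-augmented state
        action = policy.act(aug_state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```
The `max_tokens` cap is where the abstract's latency-entropy trade-off would surface: shorter summaries are cheaper to produce and consume but carry less decision-relevant information.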
Related papers
- C-IDS: Solving Contextual POMDP via Information-Directed Objective [10.82202704907442]
We study the policy synthesis problem in contextual partially observable Markov decision processes. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective.
arXiv Detail & Related papers (2026-02-03T19:00:34Z)
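The C-IDS summary above does not give the objective's exact form. As a reference point only, the classical information-directed sampling criterion (Russo & Van Roy) trades squared expected regret against information gain; an objective of this kind for contextual POMDPs plausibly targets information about the latent context:
```latex
% Illustrative information-directed objective, NOT the exact C-IDS one:
% pick the action distribution minimizing squared regret per bit of
% information gained about the latent context \theta.
\[
\pi_t \in \arg\min_{\pi}
  \frac{\big(\mathbb{E}_{a \sim \pi}[\Delta_t(a)]\big)^2}
       {\mathbb{E}_{a \sim \pi}[\, I_t(a;\, \theta) \,]}
\]
% \Delta_t(a): expected single-step regret of action a;
% I_t(a; \theta): expected information gain about \theta from playing a.
```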
- Beyond RAG vs. Long-Context: Learning Distraction-Aware Retrieval for Efficient Knowledge Grounding [5.353135097018941]
Retrieval-Augmented Generation (RAG) is a framework for grounding Large Language Models (LLMs) in external, up-to-date information. We propose LDAR (Learning Distraction-Aware Retrieval), an adaptive retriever that learns to retrieve contexts in a way that mitigates interference from distracting passages.
arXiv Detail & Related papers (2025-09-26T04:40:42Z)
- Implicit Reasoning in Large Language Models: A Comprehensive Survey [67.53966514728383]
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning. This survey introduces a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies.
arXiv Detail & Related papers (2025-09-02T14:16:02Z)
- PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z)
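A toy illustration of the mechanism the PixelThink summary describes: gate a reasoning-token budget on external task difficulty and internal model uncertainty. The scaling rule below is invented for illustration, not PixelThink's actual formula.
```python
# Hypothetical budget rule: easy, confidently handled inputs get a short
# reasoning chain; hard or uncertain ones get (nearly) the full budget.
def reasoning_budget(base_tokens: int, difficulty: float, uncertainty: float) -> int:
    """difficulty and uncertainty are assumed normalized to [0, 1]."""
    scale = 0.25 + 0.75 * max(difficulty, uncertainty)  # never below 25%
    return max(1, int(base_tokens * scale))

print(reasoning_budget(256, difficulty=0.1, uncertainty=0.2))  # 102: short chain
print(reasoning_budget(256, difficulty=0.9, uncertainty=0.8))  # 236: near-full budget
```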
- QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [66.01597794579568]
We introduce information bottleneck theory (IB) to model the problem. We propose a cross-attention-based approach to approximate mutual information in IB. Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
arXiv Detail & Related papers (2024-08-20T02:44:45Z)
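For reference, the information bottleneck objective this summary invokes is the standard Lagrangian below; per the summary, QUITO-X's contribution is approximating the mutual-information terms with cross-attention scores rather than computing them directly.
```latex
% Standard IB Lagrangian (Tishby et al.): compress the context X into a
% representation Z while preserving what Z says about the target Y.
\[
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
\]
% \beta > 0 sets the compression/relevance trade-off.
```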
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method reduces inference cost while maintaining the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation [17.156915103545728]
Large Language Models (LLMs) have made significant strides in information acquisition.
Retrieval Augmented Generation (RAG) addresses this limitation by incorporating external, non-parametric knowledge.
We propose a novel concept-based RAG framework with an Abstract Meaning Representation (AMR)-based concept distillation algorithm.
arXiv Detail & Related papers (2024-05-06T00:18:43Z)
- Adapting LLMs for Efficient Context Processing through Soft Prompt Compression [1.1550486371582305]
This article presents a framework that strategically tailors Large Language Models for streamlined context processing.
Our methodology, dubbed SoftPromptComp, combines natural language prompts with dynamically generated soft prompts to form a concise yet semantically robust representation of long contexts.
We show that our framework markedly reduces computational overhead and improves LLM efficacy across various benchmarks.
arXiv Detail & Related papers (2024-04-07T15:44:20Z)
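A speculative sketch of the architecture the SoftPromptComp summary suggests: learned queries cross-attend over the long context's token embeddings to produce a handful of soft-prompt vectors, which are then prepended to the embeddings of a short natural-language prompt. The PyTorch APIs are real; the module design itself is a guess, not the paper's code.
```python
import torch
import torch.nn as nn

class SoftPromptCompressor(nn.Module):
    """Compress a long context into n_soft soft-prompt vectors."""
    def __init__(self, d_model: int, n_soft: int = 16):
        super().__init__()
        # one learned query vector per soft-prompt slot
        self.queries = nn.Parameter(torch.randn(n_soft, d_model))
        # d_model must be divisible by num_heads
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, context_emb: torch.Tensor, prompt_emb: torch.Tensor):
        # context_emb: (B, L_ctx, d); prompt_emb: (B, L_prompt, d)
        q = self.queries.unsqueeze(0).expand(context_emb.size(0), -1, -1)
        soft, _ = self.attn(q, context_emb, context_emb)  # (B, n_soft, d)
        # soft prompts stand in for the long context
        return torch.cat([soft, prompt_emb], dim=1)
```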
- Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
- From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms [8.714718004930363]
We study how the relevance and quantity of past data affect the performance of a data-driven policy. We consider a setting in which past demands observed under "close by" contexts come from close-by distributions.
arXiv Detail & Related papers (2023-02-16T17:03:39Z)
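For context, the decision such data-driven newsvendor policies approximate is the classical critical-fractile order quantity, with the demand distribution conditioned on the observed context:
```latex
% Classical newsvendor solution (a standard result, stated for reference):
% order the critical fractile of the (context-conditional) demand CDF F.
\[
q^{*} = F^{-1}\!\left(\frac{c_u}{c_u + c_o}\right)
\]
% c_u: per-unit underage (lost sale) cost; c_o: per-unit overage cost.
```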
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Leveraging Unlabeled Data for Entity-Relation Extraction through Probabilistic Constraint Satisfaction [54.06292969184476]
We study the problem of entity-relation extraction in the presence of symbolic domain knowledge.
Our approach employs a semantic loss, which captures the precise meaning of a logical sentence.
With a focus on low-data regimes, we show that semantic loss outperforms the baselines by a wide margin.
arXiv Detail & Related papers (2021-03-20T00:16:29Z)
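The semantic loss this summary refers to has a standard form (Xu et al., 2018): up to constants, it is the negative log-probability that independently predicted Boolean variables jointly satisfy the logical sentence.
```latex
% Semantic loss: sum over all assignments x that satisfy the sentence
% \alpha of the probability the network's Bernoulli outputs p produce x.
\[
L^{s}(\alpha, \mathbf{p}) \;\propto\;
  -\log \sum_{\mathbf{x} \models \alpha}
    \prod_{i:\, \mathbf{x} \models X_i} p_i
    \prod_{i:\, \mathbf{x} \models \lnot X_i} (1 - p_i)
\]
```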
This list is automatically generated from the titles and abstracts of the papers on this site.