In-Context Compositional Q-Learning for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2509.24067v1
- Date: Sun, 28 Sep 2025 20:55:21 GMT
- Title: In-Context Compositional Q-Learning for Offline Reinforcement Learning
- Authors: Qiushui Xu, Yuhao Huang, Yushu Jiang, Lei Song, Jinyu Wang, Wenliang Zheng, Jiang Bian
- Abstract summary: We propose In-context Compositional Q-Learning (\texttt{ICQL}), the first offline RL framework that formulates Q-learning as a contextual inference problem. We show that under two assumptions--linear approximability of the local Q-function and accurate weight inference--\texttt{ICQL} achieves bounded Q-function approximation error. Empirically, \texttt{ICQL} substantially improves performance in offline settings.
- Score: 21.45273716299317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately estimating the Q-function is a central challenge in offline reinforcement learning. However, existing approaches often rely on a single global Q-function, which struggles to capture the compositional nature of tasks involving diverse subtasks. We propose In-context Compositional Q-Learning (\texttt{ICQL}), the first offline RL framework that formulates Q-learning as a contextual inference problem, using linear Transformers to adaptively infer local Q-functions from retrieved transitions without explicit subtask labels. Theoretically, we show that under two assumptions--linear approximability of the local Q-function and accurate weight inference from retrieved context--\texttt{ICQL} achieves bounded Q-function approximation error and supports near-optimal policy extraction. Empirically, \texttt{ICQL} substantially improves performance in offline settings: by up to 16.4\% on kitchen tasks, and by up to 8.6\% and 6.3\% on Gym and Adroit tasks, respectively. These results highlight the underexplored potential of in-context learning for robust and compositional value estimation, positioning \texttt{ICQL} as a principled and effective framework for offline RL.
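The paper's implementation is not reproduced here, but the abstract's core mechanism -- inferring a local linear Q-function in context from retrieved transitions -- admits a minimal sketch. Everything below (function names, feature shapes, and the ridge-regression solver standing in for the linear Transformer, which is known to be able to emulate this kind of in-context regression) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def infer_local_q(phi_ctx, q_ctx, phi_query, lam=1e-3):
    """Infer a local linear Q-function from retrieved context transitions.

    Under the paper's linear-approximability assumption, the local Q-function
    is Q(s, a) = w @ phi(s, a), with w inferred from retrieved context. Here
    the "accurate weight inference" step is a closed-form ridge regression.

    phi_ctx   : (k, d) features phi(s_i, a_i) of the k retrieved transitions
    q_ctx     : (k,)   regression targets for those transitions (e.g., TD targets)
    phi_query : (d,)   feature of the query state-action pair
    """
    d = phi_ctx.shape[1]
    # w = (Phi^T Phi + lam * I)^{-1} Phi^T q
    A = phi_ctx.T @ phi_ctx + lam * np.eye(d)
    w = np.linalg.solve(A, phi_ctx.T @ q_ctx)
    return float(phi_query @ w)  # local Q-value estimate at the query

# toy usage: 32 retrieved transitions with 8-dimensional features
rng = np.random.default_rng(0)
phi_ctx = rng.normal(size=(32, 8))
true_w = rng.normal(size=8)
q_ctx = phi_ctx @ true_w + 0.01 * rng.normal(size=32)
print(infer_local_q(phi_ctx, q_ctx, rng.normal(size=8)))
```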
Related papers
- Q-learning with Adjoint Matching [58.78551025170267]
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm.
QAM sidesteps two challenges by leveraging adjoint matching, a recently proposed technique in generative modeling.
It consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
arXiv Detail & Related papers (2026-01-20T18:45:34Z) - BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning [10.016258281947122]
BiCQL-ML is a policy-free offline inverse reinforcement learning algorithm.
We show BiCQL-ML improves both reward recovery and downstream policy performance.
arXiv Detail & Related papers (2025-11-27T08:27:10Z) - Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation.
A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios.
We introduce Continual AQA (CAQA), which equips AQA with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z) - Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning.
SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward maximization and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z) - Q-function Decomposition with Intervention Semantics with Factored Action Spaces [51.01244229483353]
We consider Q-functions defined over a lower-dimensional projected subspace of the original action space, and study the condition for the unbiasedness of decomposed Q-functions.
This leads to a general scheme, which we call action-decomposed reinforcement learning, that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms.
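As a rough illustration of the general scheme (not the paper's exact construction), the sketch below assumes an additive composition of projected Q-functions over a two-factor action space; the table shapes, the even split of the TD error across factors, and the additive form itself are all assumptions made for illustration:

```python
import numpy as np

n_states, n_a1, n_a2 = 5, 3, 4
Q1 = np.zeros((n_states, n_a1))  # projected Q over sub-action a1
Q2 = np.zeros((n_states, n_a2))  # projected Q over sub-action a2

def q_composed(s, a1, a2):
    # approximate Q(s, (a1, a2)) by a sum of the projected Q-functions
    return Q1[s, a1] + Q2[s, a2]

def greedy_action(s):
    # the additive form lets each factor be maximized independently:
    # an argmax over n_a1 * n_a2 joint actions reduces to two small argmaxes
    return int(Q1[s].argmax()), int(Q2[s].argmax())

def td_update(s, a1, a2, r, s_next, alpha=0.1, gamma=0.99):
    # one Q-learning backup distributed across the factors
    g1, g2 = greedy_action(s_next)
    err = r + gamma * q_composed(s_next, g1, g2) - q_composed(s, a1, a2)
    Q1[s, a1] += alpha * err / 2  # splitting the error evenly is a heuristic
    Q2[s, a2] += alpha * err / 2
```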
arXiv Detail & Related papers (2025-04-30T05:26:51Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning (ICL).
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning [46.67828766038463]
We propose a framework, Adaptive Conservative Level in Q-Learning (ACL-QL), which limits the Q-values to a mild range.
ACL-QL enables adaptive control of the conservative level over each state-action pair, i.e., lifting the Q-values more for good transitions and less for bad transitions.
Motivated by the theoretical analysis, we propose a novel algorithm, ACL-QL, which uses two learnable adaptive weight functions to control the conservative level over each transition.
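A minimal sketch of the adaptive-conservatism idea, assuming a CQL-style penalty modulated by two per-transition weight functions; the names, network forms, and exact loss below are illustrative guesses, not the paper's implementation:

```python
import torch

def adaptive_conservative_loss(q_net, w_up, w_down, s, a_data, a_ood, td_loss):
    # lift Q on dataset actions, weighted per transition by w_up;
    # suppress Q on out-of-distribution actions, weighted by w_down
    lift = (w_up(s, a_data) * q_net(s, a_data)).mean()
    push = (w_down(s, a_ood) * q_net(s, a_ood)).mean()
    return td_loss + push - lift

# toy stand-ins: Q-network and weight functions as simple callables
# (all three would be learnable networks in practice)
q_net  = lambda s, a: (s * a).sum(-1)
w_up   = lambda s, a: torch.sigmoid((s + a).sum(-1))
w_down = lambda s, a: torch.sigmoid((s - a).sum(-1))

s = torch.randn(64, 8)
a_data, a_ood = torch.randn(64, 8), torch.randn(64, 8)
loss = adaptive_conservative_loss(q_net, w_up, w_down, s, a_data, a_ood,
                                  td_loss=torch.tensor(0.0))
```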
arXiv Detail & Related papers (2024-12-22T04:18:02Z) - VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers [7.7705926659081275]
VerifierQ is a novel approach that integrates Offline Q-learning into verifier models.
We address three key challenges in applying Q-learning to LLMs.
Our method enables parallel Q-value computation and improves training efficiency.
arXiv Detail & Related papers (2024-10-10T15:43:55Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
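Reverse experience replay itself is straightforward to sketch: transitions within a stored trajectory are replayed last-to-first, so a reward observed at the end of an episode propagates backward through the value estimates in a single pass. The buffer layout and tabular backup below are illustrative assumptions, not the paper's exact algorithm:

```python
import random
from collections import defaultdict, deque

class ReverseReplayBuffer:
    """Replay each stored trajectory last-to-first (the key difference
    from uniform replay, which samples transitions in arbitrary order)."""

    def __init__(self, capacity=1000):
        self.trajectories = deque(maxlen=capacity)

    def add_trajectory(self, transitions):
        # transitions: list of (s, a, r, s_next, done) in forward time order
        self.trajectories.append(list(transitions))

    def replay(self, update_fn):
        traj = random.choice(self.trajectories)
        for transition in reversed(traj):
            update_fn(*transition)

# usage with a tabular Q-learning backup
Q = defaultdict(float)
def backup(s, a, r, s_next, done, alpha=0.1, gamma=0.99, n_actions=2):
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

buf = ReverseReplayBuffer()
buf.add_trajectory([(0, 1, 0.0, 1, False), (1, 0, 1.0, 2, True)])
buf.replay(backup)  # the terminal reward reaches state 0 in one pass
```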
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
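For reference, the discrete-action form of the CQL(H) objective -- a log-sum-exp term that pushes down Q-values over all actions while pushing up Q-values on dataset actions, added to the usual Bellman error -- can be sketched as follows (a minimal sketch; tensor shapes and names assumed):

```python
import torch
import torch.nn.functional as F

def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """CQL(H) objective for discrete actions.

    q_values  : (B, num_actions) Q(s, .) from the current network
    actions   : (B,) long tensor of actions taken in the dataset
    td_targets: (B,) Bellman targets r + gamma * max_a' Q_target(s', a')
    """
    q_data = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    # conservative term: penalize a soft maximum over all actions while
    # rewarding dataset actions, yielding the lower-bound property
    conservative = (torch.logsumexp(q_values, dim=1) - q_data).mean()
    bellman = F.mse_loss(q_data, td_targets)
    return alpha * conservative + bellman

# toy usage: batch of 4 states, 3 discrete actions
q = torch.randn(4, 3)
a = torch.tensor([0, 2, 1, 0])
print(cql_loss(q, a, torch.randn(4)))
```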
arXiv Detail & Related papers (2020-06-08T17:53:42Z)