Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization
- URL: http://arxiv.org/abs/2507.19437v1
- Date: Fri, 25 Jul 2025 17:08:16 GMT
- Title: Observations Meet Actions: Learning Control-Sufficient Representations for Robust Policy Generalization
- Authors: Yuliang Gu, Hongpeng Cao, Marco Caccamo, Naira Hovakimyan
- Abstract summary: Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy. We derive a contextual evidence lower bound (ELBO)-style objective that cleanly separates representation learning from policy learning.
- Score: 6.408943565801689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing latent variations ("contexts") is key to deploying reinforcement-learning (RL) agents beyond their training regime. We recast context-based RL as a dual inference-control problem and formally characterize two properties and their hierarchy: observation sufficiency (preserving all predictive information) and control sufficiency (retaining decision-making relevant information). Exploiting this dichotomy, we derive a contextual evidence lower bound (ELBO)-style objective that cleanly separates representation learning from policy learning and optimizes it with Bottlenecked Contextual Policy Optimization (BCPO), an algorithm that places a variational information-bottleneck encoder in front of any off-policy policy learner. On standard continuous-control benchmarks with shifting physical parameters, BCPO matches or surpasses baselines while using fewer samples and retaining performance far outside the training regime. The framework unifies theory, diagnostics, and practice for context-based RL.
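The abstract describes BCPO as a variational information-bottleneck encoder placed in front of an off-policy learner. As a rough illustrative sketch only (the linear encoder, shapes, and names below are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(obs_window, W_mu, W_logvar):
    """Hypothetical linear VIB encoder: pools a window of transitions
    and maps it to the parameters of q(z | trajectory)."""
    h = obs_window.mean(axis=0)           # pool the transition window
    return h @ W_mu, h @ W_logvar         # mean and log-variance of z

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ): the bottleneck penalty."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Toy shapes: 8-dim pooled features -> 2-dim context code z.
obs_dim, z_dim = 8, 2
W_mu = rng.normal(scale=0.1, size=(obs_dim, z_dim))
W_logvar = np.zeros((obs_dim, z_dim))

window = rng.normal(size=(16, obs_dim))   # 16 transitions from one context
mu, logvar = encode(window, W_mu, W_logvar)

# Reparameterized sample z ~ q(z | trajectory); a context-conditioned
# policy pi(a | s, z) from any off-policy learner would consume z.
z = mu + np.exp(0.5 * logvar) * rng.normal(size=z_dim)

# ELBO-style objective = prediction term - beta * KL; only the bottleneck
# penalty is shown here, the prediction term depends on the chosen decoder.
beta = 1e-3
bottleneck_penalty = beta * kl_to_standard_normal(mu, logvar)
```

The separation the paper emphasizes shows up here structurally: the encoder and its KL penalty are self-contained, so the downstream policy learner only ever sees the sampled code `z`.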
Related papers
- Structure-Informed Deep Reinforcement Learning for Inventory Management [8.697068617006964]
This paper investigates the application of Deep Reinforcement Learning to classical inventory management problems. We apply a DRL algorithm based on DirectBackprop to several fundamental inventory management scenarios. We show that our generic DRL implementation performs competitively against or outperforms established benchmarks and distributions.
arXiv Detail & Related papers (2025-07-29T17:41:45Z) - Guided Policy Optimization under Partial Observability [36.853129816484845]
Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. We introduce Guided Policy Optimization (GPO), a framework that co-trains a guider and a learner. We theoretically demonstrate that this learning scheme achieves optimality comparable to direct RL, thereby overcoming key limitations inherent in existing approaches.
arXiv Detail & Related papers (2025-05-21T12:01:08Z) - COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping arises when the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions. Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans. We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z) - SAMBO-RL: Shifts-aware Model-based Offline Reinforcement Learning [9.88109749688605]
Model-based offline reinforcement learning trains policies using pre-collected datasets and learned environment models. This paper offers a comprehensive analysis that disentangles the problem into two fundamental components: model bias and policy shift. We introduce Shifts-aware Model-based Offline Reinforcement Learning (SAMBO-RL), a practical framework that efficiently trains classifiers to approximate SAR for policy optimization.
arXiv Detail & Related papers (2024-08-23T04:25:09Z) - Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z) - Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - Counterfactual Explanation Policies in RL [3.674863913115432]
COUNTERPOL is the first framework to analyze Reinforcement Learning policies using counterfactual explanations.
We establish a theoretical connection between COUNTERPOL and widely used trust region-based policy optimization methods in RL.
arXiv Detail & Related papers (2023-07-25T01:14:56Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Model-Based Offline Meta-Reinforcement Learning with Regularization [63.35040401948943]
Offline Meta-RL is emerging as a promising approach to address these challenges.
MerPO learns a meta-model for efficient task structure inference and an informative meta-policy.
We show that MerPO offers guaranteed improvement over both the behavior policy and the meta-policy.
arXiv Detail & Related papers (2022-02-07T04:15:20Z) - BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning [14.432131909590824]
Offline Reinforcement Learning aims to train effective policies using previously collected datasets.
Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions.
We improve the behavior regularized offline reinforcement learning and propose BRAC+.
arXiv Detail & Related papers (2021-10-02T23:55:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.