Confounding Robust Continuous Control via Automatic Reward Shaping
- URL: http://arxiv.org/abs/2602.10305v1
- Date: Tue, 10 Feb 2026 21:23:12 GMT
- Title: Confounding Robust Continuous Control via Automatic Reward Shaping
- Authors: Mateo Juliani, Mingxuan Li, Elias Bareinboim
- Abstract summary: We propose to automatically learn a reward shaping function for continuous control problems from offline datasets. Our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values. Our work marks a solid first step towards confounding robust continuous control from a causal perspective.
- Score: 48.93769483870838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward shaping has been applied widely to accelerate Reinforcement Learning (RL) agents' training. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains largely under-explored. In this work, we propose to automatically learn a reward shaping function for continuous control problems from offline datasets, potentially contaminated by unobserved confounding variables. Specifically, our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values, which is then used as the potential in the Potential-Based Reward Shaping (PBRS) framework. Our proposed reward shaping algorithm is tested with Soft Actor-Critic (SAC) on multiple commonly used continuous control benchmarks and exhibits strong performance under unobserved confounders. More broadly, our work marks a solid first step towards confounding robust continuous control from a causal perspective. Code for training our reward shaping functions can be found at https://github.com/mateojuliani/confounding_robust_cont_control.
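To make the PBRS mechanism concrete, here is a minimal sketch that wraps an environment so a learned potential shapes the reward as $r + \gamma\Phi(s') - \Phi(s)$. The `potential_fn` argument stands in for the paper's learned upper bound on optimal state values; training that bound via the causal Bellman equation is not reproduced here (see the linked repository for the authors' code), and the wrapper is an illustration under those assumptions, not the authors' implementation.

```python
import gymnasium as gym

class PBRSWrapper(gym.Wrapper):
    """Shapes rewards as r + gamma * Phi(s') - Phi(s) for a given potential Phi."""

    def __init__(self, env, potential_fn, gamma=0.99):
        super().__init__(env)
        self.potential_fn = potential_fn  # state -> scalar potential (assumed trained)
        self.gamma = gamma
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Potential-based shaping term; this form preserves the optimal
        # policies of the original MDP (Ng et al., 1999).
        shaped = reward + self.gamma * self.potential_fn(obs) \
            - self.potential_fn(self._last_obs)
        self._last_obs = obs
        return obs, shaped, terminated, truncated, info

# Usage (hypothetical potential): train SAC on the wrapped environment, e.g.
# env = PBRSWrapper(gym.make("HalfCheetah-v4"), potential_fn=value_upper_bound)
```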
Related papers
- GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning [0.0]
We propose Gradient-Boosted Deep Q-Networks (GB-DQN), an adaptive ensemble method that addresses model drift through incremental residual learning. Instead of retraining a single Q-network, GB-DQN constructs an additive ensemble in which each new learner is trained to approximate the Bellman residual of the current ensemble after drift.
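A minimal sketch of that additive-ensemble idea, under assumptions (layer sizes, the drift trigger, and the target rule are placeholders, not the authors' code): after `grow()`, gradients flow only through the newest learner, so it fits the Bellman residual of the frozen ensemble.

```python
import torch
import torch.nn as nn

class BoostedQ(nn.Module):
    """Additive Q-ensemble: Q(s, .) = sum_k h_k(s, .)."""

    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.obs_dim, self.n_actions = obs_dim, n_actions
        self.learners = nn.ModuleList([self._make_learner()])

    def _make_learner(self):
        return nn.Sequential(nn.Linear(self.obs_dim, 64), nn.ReLU(),
                             nn.Linear(64, self.n_actions))

    def forward(self, obs):
        return sum(learner(obs) for learner in self.learners)

    def grow(self):
        # On detected drift: freeze the ensemble and add a fresh learner
        # that will absorb the Bellman residual of its frozen predecessors.
        for p in self.parameters():
            p.requires_grad_(False)
        self.learners.append(self._make_learner())

def residual_loss(model, obs, act, rew, next_obs, done, gamma=0.99):
    with torch.no_grad():
        target = rew + gamma * (1 - done) * model(next_obs).max(dim=1).values
    q = model(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    # Only the newest learner has requires_grad=True, so minimizing this MSE
    # trains it to approximate target minus the frozen ensemble's output.
    return nn.functional.mse_loss(q, target)
```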
arXiv Detail & Related papers (2025-12-18T19:53:50Z)
- HINT: Helping Ineffective Rollouts Navigate Towards Effectiveness [49.72591739116668]
Reinforcement Learning (RL) has become a key driver for enhancing the long chain-of-thought (CoT) reasoning capabilities of Large Language Models (LLMs). However, prevalent methods like GRPO often fail when task difficulty exceeds the model's capacity, leading to reward sparsity and inefficient training. We propose HINT: Helping Ineffective rollouts Navigate Towards effectiveness, an adaptive hinting framework.
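The summary leaves the trigger and the hint construction unspecified; the sketch below is one plausible reading, offered purely as an assumption: when every rollout in a GRPO-style group fails (all-zero reward, hence zero advantage), the group is resampled from a hinted prompt. `generate`, `reward_fn`, and `make_hint` are hypothetical callables.

```python
def rollout_group(prompt, generate, reward_fn, make_hint, group_size=8):
    """generate(prompt) -> completion; make_hint(prompt) -> easier, hinted
    prompt. Both are hypothetical stand-ins for the trainer's utilities."""
    rollouts = [generate(prompt) for _ in range(group_size)]
    rewards = [reward_fn(r) for r in rollouts]
    if max(rewards) == 0.0:
        # Ineffective group: uniform zero reward gives GRPO no gradient
        # signal, so retry with a hint to restore a learnable contrast.
        rollouts = [generate(make_hint(prompt)) for _ in range(group_size)]
        rewards = [reward_fn(r) for r in rollouts]
    return rollouts, rewards
```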
arXiv Detail & Related papers (2025-10-10T13:42:03Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and energy consumption.
In this work, we aim to bridge the performance gap between discretized and continuous control by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
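A rough sketch of the coarse-to-fine discretization, under assumptions (the bounds and the growth schedule are placeholders; the per-dimension value decomposition is only noted in the comments):

```python
import numpy as np

def action_grid(low, high, resolution):
    """Discrete action values for a single continuous action dimension."""
    return np.linspace(low, high, resolution)

# A 2 -> 3 -> 5 -> 9 schedule nests each grid inside the next, so values
# learned for coarse atoms stay meaningful after growing. Starting at 2 is
# bang-bang control; a critic with one head per action dimension (value
# decomposition) keeps the joint discrete space tractable.
for resolution in (2, 3, 5, 9):
    print(resolution, action_grid(-1.0, 1.0, resolution))
```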
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- Constrained Reinforcement Learning with Smoothed Log Barrier Function [27.216122901635018]
We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function).
It achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic.
We show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty.
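For reference, here is a sketch of a linear smoothed log barrier in the style the summary appears to describe (following the log-barrier extension of Kervadec et al., 2019; the paper's exact form and temperature schedule may differ). For a constraint value $x \le 0$ meaning safe, the penalty is $-\log(-x)/t$ inside the feasible region and is extended linearly beyond $x = -1/t^2$ so gradients stay finite under violation:

```python
import math
import torch

def smoothed_log_barrier(x, t=1.0):
    """Penalty for a constraint value x, where x <= 0 means 'safe'."""
    threshold = -1.0 / t**2
    # Log-barrier branch; the clamp keeps the log argument positive everywhere.
    log_part = -torch.log(-torch.clamp(x, max=threshold)) / t
    # Linear extension, matching the barrier's value and slope at x = -1/t**2.
    lin_part = t * x - math.log(1.0 / t**2) / t + 1.0 / t
    return torch.where(x <= threshold, log_part, lin_part)
```

Applied to a safety critic's output, this yields a penalty that remains differentiable even when constraints are violated, unlike a hard log barrier.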
arXiv Detail & Related papers (2024-03-21T16:02:52Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
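A hedged sketch of the contrastive ingredient (an InfoNCE-style objective is one standard instantiation; the paper's architecture and sampling scheme are not reproduced here): a critic scores (state, action) embeddings against embeddings of states reached later on the same trajectory, with the other pairs in the batch acting as negatives.

```python
import torch
import torch.nn.functional as F

def info_nce(sa_embed, future_embed):
    """sa_embed: (B, D) embeddings of (state, action) pairs; future_embed:
    (B, D) embeddings of states reached later on the same trajectories."""
    logits = sa_embed @ future_embed.T          # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    # Diagonal entries are positives; off-diagonal entries are negatives.
    return F.cross_entropy(logits, labels)
```

At control time, the learned score against user-provided success examples can play the role a learned reward or value function would otherwise play.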
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies [64.2210390071609]
We present a novel Heavy-Tailed Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems.
We show consistent performance improvement across all tasks in terms of average cumulative reward.
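As a generic illustration of why heavy tails help under sparse rewards (not the paper's exact parameterization), compare samples from Gaussian and Cauchy action distributions with the same scale:

```python
import torch
from torch.distributions import Cauchy, Normal

loc, scale = torch.zeros(6), 0.5 * torch.ones(6)  # e.g., a 6-dim action space

gaussian_actions = Normal(loc, scale).sample((1000,))
cauchy_actions = Cauchy(loc, scale).sample((1000,))
# The heavy-tailed policy occasionally emits very large exploratory actions,
# which raises the chance of stumbling onto a sparse reward.
print(gaussian_actions.abs().max().item(), cauchy_actions.abs().max().item())
```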
arXiv Detail & Related papers (2022-06-12T04:09:39Z)
- Deep Q-learning: a robust control approach [4.125187280299247]
We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning.
We show the instability of learning and analyze the agent's behavior in the frequency domain.
Numerical simulations in different OpenAI Gym environments suggest that the $\mathcal{H}_\infty$-controlled learning performs slightly better than Double deep Q-learning.
arXiv Detail & Related papers (2022-01-21T09:47:34Z)
- Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and shorter running time.
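The summary does not state the supervised objective, so the following is only one generic way to optimize a policy with purely supervised losses: behavior cloning restricted to the highest-return trajectories collected so far (self-imitation style). It illustrates the idea, not SSRL itself.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; obs/actions would come from the agent's best
# trajectories, ranked by episode return.
policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
obs = torch.randn(256, 17)      # states from top-return trajectories
actions = torch.randn(256, 6)   # actions taken in those states

# Purely supervised loss: regress the policy onto its own best behavior.
loss = nn.functional.mse_loss(policy(obs), actions)
loss.backward()
```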
arXiv Detail & Related papers (2021-06-10T06:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.