Related papers: Trajectory Entropy Reinforcement Learning for Predictable and Robust Control

URL: http://arxiv.org/abs/2505.04193v1
Date: Wed, 07 May 2025 07:41:29 GMT
Title: Trajectory Entropy Reinforcement Learning for Predictable and Robust Control
Authors: Bang You, Chenxu Wang, Huaping Liu
Abstract summary: We introduce a novel inductive bias towards simple policies in reinforcement learning. The simplicity inductive bias is introduced by minimizing the entropy of entire action trajectories. We show that our learned policies produce more cyclical and consistent action trajectories.
Score: 12.289021814766539
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Simplicity is a critical inductive bias for designing data-driven controllers, especially when robustness is important. Despite the impressive results of deep reinforcement learning in complex control tasks, it is prone to capturing intricate and spurious correlations between observations and actions, leading to failure under slight perturbations to the environment. To tackle this problem, in this work we introduce a novel inductive bias towards simple policies in reinforcement learning. The simplicity inductive bias is introduced by minimizing the entropy of entire action trajectories, corresponding to the number of bits required to describe information in action trajectories after the agent observes state trajectories. Our reinforcement learning agent, Trajectory Entropy Reinforcement Learning, is optimized to minimize the trajectory entropy while maximizing rewards. We show that the trajectory entropy can be effectively estimated by learning a variational parameterized action prediction model, and use the prediction model to construct an information-regularized reward function. Furthermore, we construct a practical algorithm that enables the joint optimization of models, including the policy and the prediction model. Experimental evaluations on several high-dimensional locomotion tasks show that our learned policies produce more cyclical and consistent action trajectories, and achieve superior performance and robustness to noise and dynamic changes compared with state-of-the-art methods.
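The idea of an information-regularized reward can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the action prediction model is a diagonal Gaussian, and it relies on the standard fact that the expected negative log-likelihood under any prediction model upper-bounds the entropy of the actions. The names `gaussian_log_prob`, `regularized_reward`, and the weight `alpha` are hypothetical, and the exact form of the regularizer in the paper may differ.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian evaluated at `action`."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def regularized_reward(reward, action, pred_mean, pred_std, alpha=0.1):
    """Task reward plus a bonus for actions the prediction model finds likely.

    `pred_mean` and `pred_std` stand in for the output of a learned action
    prediction model (an assumption of this sketch). Adding alpha * log q(a)
    penalizes hard-to-predict actions, which pushes down an upper bound on
    the entropy of the action trajectory.
    """
    return reward + alpha * gaussian_log_prob(action, pred_mean, pred_std)


# Same task reward, but the first action matches the model's prediction.
predictable = regularized_reward(1.0, [0.5, -0.2], [0.5, -0.2], [0.1, 0.1])
surprising = regularized_reward(1.0, [0.9, 0.4], [0.5, -0.2], [0.1, 0.1])
print(predictable > surprising)
```

In this sketch the weight `alpha` trades off task reward against predictability; a joint training loop would also update the prediction model on the actions the policy produces.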

Related papers

GoIRL: Graph-Oriented Inverse Reinforcement Learning for Multimodal Trajectory Prediction [35.36975133932852]
Trajectory prediction for surrounding agents is a challenging task in autonomous driving. We introduce a novel Graph-oriented Inverse Reinforcement Learning framework, which is an IRL-based predictor equipped with vectorized context representations. Our approach achieves state-of-the-art performance on the large-scale Argoverse & nuScenes motion forecasting benchmarks.
arXiv Detail & Related papers (2025-06-26T09:46:53Z)
ActivePusher: Active Learning and Planning with Residual Physics for Nonprehensile Manipulation [2.7405276609125164]
Planning with learned dynamics models offers a promising approach toward real-world, long-horizon manipulation. ActivePusher is a framework that combines residual-physics modeling with kernel-based uncertainty-driven active learning. We evaluate our approach in both simulation and real-world environments and demonstrate that it improves data efficiency and planning success rates compared to baseline methods.
arXiv Detail & Related papers (2025-06-05T05:28:14Z)
Maximum Total Correlation Reinforcement Learning [23.209609715886454]
We introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories.
arXiv Detail & Related papers (2025-05-22T14:48:00Z)
DeepONet as a Multi-Operator Extrapolation Model: Distributed Pretraining with Physics-Informed Fine-Tuning [6.635683993472882]
We propose a novel fine-tuning method to achieve multi-operator learning. Our approach combines distributed learning to integrate data from various operators in pre-training, while physics-informed methods enable zero-shot fine-tuning.
arXiv Detail & Related papers (2024-11-11T18:58:46Z)
Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations. This paper addresses the question of reinforcement learning under general latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
Interpretable Interaction Modeling for Trajectory Prediction via Agent Selection and Physical Coefficient [1.6954753390775528]
We present ASPILin, which manually selects interacting agents and replaces the attention scores in Transformer with a newly computed physical correlation coefficient. Surprisingly, these simple modifications can significantly improve prediction performance and substantially reduce computational costs.
arXiv Detail & Related papers (2024-05-21T18:45:18Z)
DTC: Deep Tracking Control [16.2850135844455]
We propose a hybrid control architecture that combines the advantages of both worlds to achieve greater robustness, foot-placement accuracy, and terrain generalization. A deep neural network policy is trained in simulation, aiming to track the optimized footholds. We demonstrate superior robustness in the presence of slippery or deformable ground when compared to model-based counterparts.
arXiv Detail & Related papers (2023-09-27T07:57:37Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
Physics-Inspired Temporal Learning of Quadrotor Dynamics for Accurate Model Predictive Trajectory Tracking [76.27433308688592]
Accurately modeling a quadrotor's system dynamics is critical for guaranteeing agile, safe, and stable navigation. We present a novel Physics-Inspired Temporal Convolutional Network (PI-TCN) approach to learning a quadrotor's system dynamics purely from robot experience. Our approach combines the expressive power of sparse temporal convolutions and dense feed-forward connections to make accurate system predictions.
arXiv Detail & Related papers (2022-06-07T13:51:35Z)
Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future. Our model is capable of solving sparse-reward problems with a very high sample efficiency. We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z)
Sample-efficient reinforcement learning using deep Gaussian processes [18.044018772331636]
Reinforcement learning provides a framework for learning to control which actions to take towards completing a task through trial-and-error. In model-based reinforcement learning efficiency is improved by learning to simulate the world dynamics. We introduce deep Gaussian processes where the depth of the compositions introduces model complexity while incorporating prior knowledge on the dynamics brings smoothness and structure.
arXiv Detail & Related papers (2020-11-02T13:37:57Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network [5.000272778136268]
This study shows that the predictive coding (PC) and active inference (AIF) frameworks can develop better generalization by learning a prior distribution in a low dimensional latent state space. In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound. Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data.
arXiv Detail & Related papers (2020-05-27T06:43:59Z)
