Related papers: DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

URL: http://arxiv.org/abs/2402.05421v3
Date: Thu, 31 Oct 2024 04:53:19 GMT
Title: DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning
Authors: Weikang Wan, Ziyu Wang, Yufei Wang, Zackory Erickson, David Held
Abstract summary: This paper introduces DiffTORI, which utilizes Differentiable Trajectory optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
Score: 19.84386060857712
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces DiffTORI, which utilizes Differentiable Trajectory Optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTORI addresses the "objective mismatch" issue of prior model-based RL algorithms, as the dynamics model in DiffTORI is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTORI for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
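The abstract's central idea, computing gradients of a loss with respect to the parameters of a trajectory optimizer, can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a hypothetical one-step scalar system x1 = A*x0 + B*u with cost J(u) = (x1 - goal)^2 + R*u^2, where `goal` is the only learnable cost parameter. The names `plan`, `du_dgoal`, `imitation_grad`, and `fit_goal` and all constants are made up for illustration. The inner optimizer finds the action, the implicit function theorem gives the derivative of that action with respect to `goal`, and an outer loop fits `goal` to an expert action.

```python
# Hypothetical toy problem (not from the paper): x1 = A*x0 + B*u.
A, B, R = 1.0, 0.5, 0.1  # assumed dynamics and control-penalty constants


def plan(x0, goal, steps=200, lr=0.5):
    """Inner trajectory optimization: gradient descent on the action u for
    J(u) = (A*x0 + B*u - goal)**2 + R*u**2."""
    u = 0.0
    for _ in range(steps):
        grad_u = 2 * B * (A * x0 + B * u - goal) + 2 * R * u
        u -= lr * grad_u
    return u


def du_dgoal():
    """Derivative of the optimal action w.r.t. the cost parameter via the
    implicit function theorem: du*/dgoal = -(d2J/du2)^-1 * d2J/(du dgoal)."""
    j_uu = 2 * B * B + 2 * R  # curvature of J in u
    j_ug = -2 * B             # mixed second derivative of J
    return -j_ug / j_uu


def imitation_grad(x0, goal, u_expert):
    """Imitation loss (u* - u_expert)**2 and its gradient w.r.t. goal,
    obtained by differentiating through the optimizer's solution."""
    u_star = plan(x0, goal)
    loss = (u_star - u_expert) ** 2
    return loss, 2 * (u_star - u_expert) * du_dgoal()


def fit_goal(x0, u_expert, goal=0.0, outer_steps=100, lr=0.2):
    """Outer loop: learn the cost parameter end-to-end from an expert action."""
    for _ in range(outer_steps):
        _, grad = imitation_grad(x0, goal, u_expert)
        goal -= lr * grad
    return goal


if __name__ == "__main__":
    u_expert = plan(1.0, 2.0)  # expert acts under a hidden goal of 2.0
    print(fit_goal(1.0, u_expert))
```

In this linear-quadratic toy the derivative is available in closed form; for longer horizons and learned neural dynamics, the same quantity would come from a differentiable solver rather than a hand-derived formula.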

Related papers

Learning Gradient Flow: Using Equation Discovery to Accelerate Engineering Optimization [0.0]
We use trajectory data to learn the continuous-time dynamics associated with gradient descent, Newton's method, and ADAM optimization. The discovered gradient is then solved as a surrogate for the original optimization problem. We demonstrate the efficacy of this approach on several standard problems from engineering mechanics and scientific machine learning.
arXiv Detail & Related papers (2026-02-13T22:44:33Z)
Flows and Diffusions on the Neural Manifold [0.0]
Diffusion and flow-based generative models have achieved remarkable success in domains such as image synthesis, video generation, and natural language modeling. We extend these advances to weight space learning by leveraging recent techniques to incorporate structural priors derived from optimization dynamics.
arXiv Detail & Related papers (2025-07-14T02:26:06Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
Unifying Model Predictive Path Integral Control, Reinforcement Learning, and Diffusion Models for Optimal Control and Planning [6.871390204787483]
We establish a unified perspective that connects MPPI, RL, and Diffusion Models through gradient-based optimization on the Gibbs measure. We first show that MPPI can be interpreted as performing gradient ascent on a smoothed energy function. We then demonstrate that Policy Gradient methods reduce to MPPI by applying an exponential transformation to the objective function.
arXiv Detail & Related papers (2025-02-27T19:26:36Z)
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives. We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
Trajectory-Based Multi-Objective Hyperparameter Optimization for Model Retraining [8.598456741786801]
We present a novel trajectory-based multi-objective Bayesian optimization algorithm. Our algorithm outperforms state-of-the-art multi-objective optimizers both in locating better trade-offs and in tuning efficiency.
arXiv Detail & Related papers (2024-05-24T07:43:45Z)
Model-based Reinforcement Learning for Parameterized Action Spaces [11.94388805327713]
We propose a novel model-based reinforcement learning algorithm for PAMDPs. The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. Our empirical results on several standard benchmarks show that our algorithm achieves better sample efficiency and performance than state-of-the-art PAMDP methods.
arXiv Detail & Related papers (2024-04-03T19:48:13Z)
LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization [1.1602089225841634]
This paper introduces LeTO, a method for learning constrained visuomotor policy with differentiable trajectory optimization. We quantitatively evaluate LeTO in simulation and on a real robot.
arXiv Detail & Related papers (2024-01-30T23:18:35Z)
Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces. We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data. We show that a neural network can model highly nonlinear behaviors accurately for large time horizons. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and Radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task. Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments. We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
arXiv Detail & Related papers (2021-12-18T14:32:32Z)
A Differential Game Theoretic Neural Optimizer for Training Residual Networks [29.82841891919951]
We propose a generalized Differential Dynamic Programming (DDP) neural architecture that accepts both residual connections and convolution layers. The resulting optimal control representation admits a game-theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented systems.
arXiv Detail & Related papers (2020-07-17T10:19:17Z)
Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver. We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming. We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator. We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.