Physical Derivatives: Computing policy gradients by physical forward-propagation
- URL: http://arxiv.org/abs/2201.05830v1
- Date: Sat, 15 Jan 2022 11:27:42 GMT
- Title: Physical Derivatives: Computing policy gradients by physical forward-propagation
- Authors: Arash Mehrjou, Ashkan Soleymani, Stefan Bauer, Bernhard Schölkopf
- Abstract summary: Learning a good policy without a dynamic model can be prohibitively expensive.
We propose a middle ground where instead of the transition model, the sensitivity of the trajectories with respect to the perturbation of the parameters is learned.
This allows us to predict the local behavior of the physical system around a set of nominal policies without knowing the actual model.
- Score: 28.29279610522437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free and model-based reinforcement learning are two ends of a spectrum.
Learning a good policy without a dynamic model can be prohibitively expensive.
Learning the dynamic model of a system can reduce the cost of learning the
policy, but it can also introduce bias if it is not accurate. We propose a
middle ground where instead of the transition model, the sensitivity of the
trajectories with respect to the perturbation of the parameters is learned.
This allows us to predict the local behavior of the physical system around a
set of nominal policies without knowing the actual model. We assay our method
on a custom-built physical robot in extensive experiments and show the
feasibility of the approach in practice. We investigate potential challenges
when applying our method to physical systems and propose solutions to each of
them.
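The approach described above can be sketched with finite differences: perturb the policy parameters, roll the physical system out again, and fit a local linear model of how the trajectory responds. The following is a minimal illustration under assumed interfaces; `env_step`, `policy`, and the linear test system below are hypothetical stand-ins, not the paper's code.

```python
import numpy as np

def rollout(env_step, policy, theta, x0, horizon):
    """Roll out a parameterized policy and return the flattened trajectory."""
    x, traj = x0, [x0]
    for _ in range(horizon):
        u = policy(theta, x)
        x = env_step(x, u)
        traj.append(x)
    return np.concatenate(traj)

def trajectory_sensitivity(env_step, policy, theta, x0, horizon, eps=1e-4):
    """Estimate d(trajectory)/d(theta) around a nominal policy by forward
    finite differences: one extra rollout per policy parameter."""
    base = rollout(env_step, policy, theta, x0, horizon)
    J = np.zeros((base.size, theta.size))
    for i in range(theta.size):
        t = theta.copy()
        t[i] += eps
        J[:, i] = (rollout(env_step, policy, t, x0, horizon) - base) / eps
    return base, J

def predict_local(base, J, theta, theta_new):
    """First-order prediction of the trajectory for a nearby policy,
    without querying the system again."""
    return base + J @ (theta_new - theta)
```

Once the sensitivity matrix `J` is estimated on hardware, the local behavior of the system under nearby policies can be predicted without further rollouts, which is the "middle ground" between model-free and model-based learning.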
Related papers
- Dynamic Manipulation of Deformable Objects in 3D: Simulation, Benchmark and Learning Strategy [88.8665000676562]
Prior methods often simplify the problem to low-speed or 2D settings, limiting their applicability to real-world 3D tasks.
To mitigate data scarcity, we introduce a novel simulation framework and benchmark grounded in reduced-order dynamics.
We propose Dynamics Informed Diffusion Policy (DIDP), a framework that integrates imitation pretraining with physics-informed test-time adaptation.
arXiv Detail & Related papers (2025-05-23T03:28:25Z) - Differentiable Information Enhanced Model-Based Reinforcement Learning [48.820039382764]
Differentiable environments have heralded new possibilities for learning control policies by offering rich differentiable information.
Model-based reinforcement learning (MBRL) methods exhibit the potential to effectively harness the power of differentiable information for recovering the underlying physical dynamics.
However, this presents two primary challenges: effectively utilizing differentiable information to 1) construct models with more accurate dynamic prediction and 2) enhance the stability of policy training.
arXiv Detail & Related papers (2025-03-03T04:51:40Z) - ICODE: Modeling Dynamical Systems with Extrinsic Input Information [14.521146920900316]
We introduce Input Concomitant Neural ODEs (ICODEs), which incorporate precise real-time input information into the learning process of the models.
We validate our method through experiments on several representative real dynamics.
This work offers a valuable class of neural ODE models for understanding physical systems with explicit external input information.
arXiv Detail & Related papers (2024-11-21T07:57:59Z) - Learning Low-Dimensional Strain Models of Soft Robots by Looking at the Evolution of Their Shape with Application to Model-Based Control [2.058941610795796]
This paper introduces a streamlined method for learning low-dimensional, physics-based models.
We validate our approach through simulations with various planar soft manipulators.
Because the method generates physically compatible models, the learned models can be combined straightforwardly with model-based control policies.
arXiv Detail & Related papers (2024-10-31T18:37:22Z) - Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems [49.11170948406405]
We propose an unsupervised method to estimate the physical parameters of known, continuous governing equations from single videos.
We take the field closer to reality by recording Delfys75: our own real-world dataset of 75 videos for five different types of dynamical systems.
arXiv Detail & Related papers (2024-10-02T09:44:54Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches for continuous reinforcement learning (RL) problems.
In common practice, convergent (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Dual policy as self-model for planning [71.73710074424511]
We refer to the model used to simulate one's decisions as the agent's self-model.
Inspired by current reinforcement learning approaches and neuroscience, we explore the benefits and limitations of using a distilled policy network as the self-model.
arXiv Detail & Related papers (2023-06-07T13:58:45Z) - Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics [97.38308257547186]
Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and material models.
We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned.
We introduce a new framework termed "Neural Constitutive Laws" (NCLaw) which utilizes a network architecture that strictly guarantees standard priors.
arXiv Detail & Related papers (2023-04-27T17:42:24Z) - Model-Based Reinforcement Learning with SINDy [0.0]
We propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL).
We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories than state-of-the-art model learning algorithms.
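As background, the SINDy idea referenced in this entry (sequentially thresholded least squares over a library of candidate terms) can be sketched as follows. The function names and the toy library in the test are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sindy(X, Xdot, library, threshold=0.1, iters=10):
    """Minimal SINDy sketch: fit Xdot ~= Theta(X) @ Xi by least squares,
    then repeatedly zero out small coefficients and refit on the survivors,
    yielding a sparse symbolic model of the dynamics."""
    Theta = library(X)                       # (n_samples, n_features)
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(Xdot.shape[1]):       # refit each state dimension
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], Xdot[:, k],
                                             rcond=None)[0]
    return Xi
```

Given noiseless samples of dx/dt = -2x and a library of [1, x, x^2], the procedure recovers a coefficient of -2 on the x term and zeros elsewhere, which is the sparse-dynamics recovery the entry refers to.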
arXiv Detail & Related papers (2022-08-30T19:03:48Z) - Discrepancy Modeling Framework: Learning missing physics, modeling systematic residuals, and disambiguating between deterministic and random effects [4.459306403129608]
In modern dynamical systems, discrepancies between model and measurement can lead to poor quantification.
We introduce a discrepancy modeling framework to identify the missing physics and resolve the model-measurement mismatch.
arXiv Detail & Related papers (2022-03-10T05:37:24Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
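The Gaussian-process world-model component in this entry can be illustrated with plain GP regression: learn a one-step dynamics map from observed transitions, with calibrated uncertainty. This is a generic sketch with an RBF kernel; the hyperparameters and interfaces are assumptions, not the paper's algorithm.

```python
import numpy as np

def gp_posterior(X, y, Xs, length=1.0, noise=1e-3):
    """GP regression: posterior mean and variance at query points Xs,
    given training inputs X and targets y, under an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)
    K = k(X, X) + noise * np.eye(len(X))     # noisy training covariance
    Ks, Kss = k(Xs, X), k(Xs, Xs)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T))
    return mean, var
```

In a world-model setting, `X` would hold state-action pairs and `y` the next states; the posterior variance is what makes the planning uncertainty-aware.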
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.