Modulating Reservoir Dynamics via Reinforcement Learning for Efficient Robot Skill Synthesis
- URL: http://arxiv.org/abs/2411.10991v1
- Date: Sun, 17 Nov 2024 07:25:54 GMT
- Title: Modulating Reservoir Dynamics via Reinforcement Learning for Efficient Robot Skill Synthesis
- Authors: Zahra Koulaeizadeh, Erhan Oztop
- Abstract summary: A random recurrent neural network, called a reservoir, can be used to learn robot movements conditioned on context inputs.
In this work, we propose a novel RC-based Learning from Demonstration (LfD) framework.
- Abstract: A random recurrent neural network, called a reservoir, can be used to learn robot movements conditioned on context inputs that encode task goals. Learning is achieved by mapping the random dynamics of the reservoir, modulated by context, to desired trajectories via linear regression. This makes the reservoir computing (RC) approach computationally efficient, as no iterative gradient descent learning is needed. In this work, we propose a novel RC-based Learning from Demonstration (LfD) framework that not only learns to generate the demonstrated movements but also allows online modulation of the reservoir dynamics to generate movement trajectories that are not covered by the initial demonstration set. This is made possible by a Reinforcement Learning (RL) module that learns a policy to output context as its actions based on the robot state. Considering that the context dimension is typically low, learning with the RL module is very efficient. We show the validity of the proposed model with systematic experiments on a 2 degrees-of-freedom (DOF) simulated robot that is taught to reach targets, encoded as context, with and without an obstacle avoidance constraint. The initial data set consists of reaching demonstrations, which are learned by the reservoir system. To enable reaching out-of-distribution targets, the RL module is engaged in learning a policy to generate dynamic contexts so that the generated trajectory achieves the desired goal without any learning in the reservoir system. Overall, the proposed model uses an initially learned motor primitive set to efficiently generate diverse motor behaviors guided by the designed reward function. Thus, the model can be used as a flexible and effective LfD system where the action repertoire can be extended without new data collection.
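To make the computational-efficiency claim concrete, here is a minimal sketch of the RC training step the abstract describes, in Python with NumPy. The reservoir size, leak rate, ridge coefficient, and the placeholder demonstrations are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and constants; the paper's actual values are not given here.
N_RES, N_CTX, N_OUT, T = 300, 2, 2, 200
LEAK, RIDGE = 0.3, 1e-6

# Fixed random weights: recurrent matrix and input weights for the context.
W = rng.normal(0.0, 1.0 / np.sqrt(N_RES), (N_RES, N_RES))
W_in = rng.normal(0.0, 1.0, (N_RES, N_CTX))

def run_reservoir(context):
    """Roll out a leaky-integrator reservoir driven by a (possibly
    time-varying) context signal of shape (T, N_CTX); return all states."""
    x = np.zeros(N_RES)
    states = np.empty((T, N_RES))
    for t in range(T):
        x = (1 - LEAK) * x + LEAK * np.tanh(W @ x + W_in @ context[t])
        states[t] = x
    return states

# "Training": collect reservoir states under each demonstration's context and
# map them to the demonstrated trajectories by ridge regression -- a single
# linear solve; no gradient descent ever touches W or W_in.
contexts = [np.tile(rng.uniform(-1, 1, N_CTX), (T, 1)) for _ in range(20)]
demos = [rng.standard_normal((T, N_OUT)) for _ in range(20)]  # placeholder demos

X = np.vstack([run_reservoir(c) for c in contexts])
Y = np.vstack(demos)
W_out = np.linalg.solve(X.T @ X + RIDGE * np.eye(N_RES), X.T @ Y)

# Generation: a new context (constant here for brevity) modulates the same
# fixed reservoir; the learned readout maps its states to a trajectory.
trajectory = run_reservoir(np.tile([0.5, -0.2], (T, 1))) @ W_out
```

In the proposed framework, the RL policy would supply the context at every step rather than a constant, steering this same fixed reservoir toward out-of-distribution targets without retraining the readout.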
Related papers
- Back-stepping Experience Replay with Application to Model-free Reinforcement Learning for a Soft Snake Robot [15.005962159112002]
Back-stepping Experience Replay (BER) is compatible with arbitrary off-policy reinforcement learning algorithms.
We present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot.
arXiv Detail & Related papers (2024-01-21T02:17:16Z)
- Deep Learning for Koopman-based Dynamic Movement Primitives [0.0]
We propose a novel approach by joining the theories of Koopman Operators and Dynamic Movement Primitives to Learning from Demonstration.
Our approach projects nonlinear dynamical systems into linear latent spaces such that a solution reproduces the desired complex motion.
Our results are comparable to the Extended Dynamic Mode Decomposition on the LASA Handwriting dataset, but with training on only a small fraction of the letters.
arXiv Detail & Related papers (2023-12-06T07:33:22Z)
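Since this entry compares against Extended Dynamic Mode Decomposition, a generic EDMD sketch may help: lift states with a feature dictionary and fit a finite-dimensional Koopman approximation with one least-squares solve. The dictionary and the toy trajectory below are illustrative, not the paper's setup:

```python
import numpy as np

def lift(x):
    """Illustrative polynomial dictionary: [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

def edmd(snapshots):
    """Approximate the Koopman operator K from a trajectory of 2-D states:
    lift consecutive state pairs and solve min_K ||K Psi_t - Psi_{t+1}||_F."""
    Psi_t = np.array([lift(x) for x in snapshots[:-1]]).T
    Psi_tp1 = np.array([lift(x) for x in snapshots[1:]]).T
    return Psi_tp1 @ np.linalg.pinv(Psi_t)  # least squares via pseudoinverse

# Toy data from a weakly nonlinear 2-D system (placeholder, not LASA strokes).
traj = [np.array([0.9, 0.1])]
for _ in range(200):
    x1, x2 = traj[-1]
    traj.append(np.array([x1 + 0.05 * x2, x2 - 0.05 * (x1 + 0.2 * x1 ** 3)]))

K = edmd(np.array(traj))  # the dynamics become linear in the lifted space
```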
- Controlling dynamical systems to complex target states using machine learning: next-generation vs. classical reservoir computing [68.8204255655161]
Controlling nonlinear dynamical systems with machine learning makes it possible to drive systems not only into simple behavior such as periodicity but also into more complex, arbitrary dynamics.
We show first that classical reservoir computing excels at this task.
Next, we compare these results, obtained with different amounts of training data, to an alternative setup where next-generation reservoir computing is used instead.
It turns out that while delivering comparable performance for typical amounts of training data, next-generation RC significantly outperforms the classical approach when only very limited data is available.
arXiv Detail & Related papers (2023-07-14T07:05:17Z)
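To make the comparison concrete, next-generation reservoir computing replaces the random recurrent network with time-delayed copies of the input plus their polynomial products, again trained only by ridge regression. A minimal sketch, with toy data and hyperparameters chosen purely for illustration:

```python
import numpy as np

def nvar_features(window):
    """Build one next-generation-RC feature vector from a window of k delayed
    input samples: a constant, linear terms, and unique quadratic products."""
    lin = window.ravel()
    quad = np.outer(lin, lin)[np.triu_indices(lin.size)]
    return np.concatenate(([1.0], lin, quad))

def fit_nvar(u, y, k=2, ridge=1e-8):
    """Ridge-regress targets y[t] onto features of the window u[t-k+1 .. t]."""
    X = np.array([nvar_features(u[t - k + 1 : t + 1]) for t in range(k - 1, len(u))])
    Y = y[k - 1 :]
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

# Toy task (placeholder): one-step prediction of a noisy sine from few samples.
t = np.linspace(0, 8 * np.pi, 200)
u = np.sin(t)[:, None] + 0.01 * np.random.default_rng(2).standard_normal((200, 1))
W = fit_nvar(u[:-1], u[1:], k=2)
pred = nvar_features(u[-2:]) @ W  # next-step prediction
```

The tiny feature vector (here six entries) is what makes this variant data-efficient relative to a large random reservoir.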
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Contrastive Value Learning: Implicit Models for Simple Offline RL [40.95632543012637]
We propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
CVL can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action.
Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
arXiv Detail & Related papers (2022-11-03T19:10:05Z)
- Real-to-Sim: Predicting Residual Errors of Robotic Systems with Sparse Data using a Learning-based Unscented Kalman Filter [65.93205328894608]
We learn the residual errors between a dynamics and/or simulator model and the real robot.
We show that with the learned residual errors, we can further close the reality gap between dynamic models, simulations, and actual hardware.
arXiv Detail & Related papers (2022-09-07T15:15:12Z)
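The paper itself uses a learning-based Unscented Kalman Filter; as a simplified stand-in, the following sketch shows the underlying residual idea with plain ridge regression: fit the gap between a simulator's one-step prediction and the real robot, then correct the simulator with it. All dynamics and data here are placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

def sim_step(s, a):
    """Placeholder simulator dynamics (stand-in for an analytic model)."""
    return s + 0.1 * a

# Sparse real data: (state, action) pairs and the observed real next states.
S = rng.uniform(-1, 1, (30, 2))
A = rng.uniform(-1, 1, (30, 2))
S_next_real = sim_step(S, A) + 0.05 * np.sin(S)  # unmodeled effect

# Fit residuals with ridge-regularized linear regression on [s, a, 1] features.
X = np.hstack([S, A, np.ones((len(S), 1))])
R = S_next_real - sim_step(S, A)
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(5), X.T @ R)

def corrected_step(s, a):
    """Simulator prediction plus the learned residual correction."""
    x = np.concatenate([s, a, [1.0]])
    return sim_step(s, a) + x @ W
```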
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
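NDPs embed a dynamical system, such as a Dynamic Movement Primitive, inside the policy so that the network predicts trajectory parameters rather than raw actions. A minimal 1-D DMP rollout sketch makes that prediction space tangible; the basis, gains, and zero-weight example are illustrative assumptions:

```python
import numpy as np

def dmp_rollout(w, g, y0, T=200, dt=0.01, alpha=25.0, beta=6.25, tau=1.0):
    """Roll out a 1-D Dynamic Movement Primitive. In an NDP-style policy a
    network would predict the basis weights w and goal g instead of torques."""
    centers = np.exp(-3.0 * np.linspace(0, 1, len(w)))  # basis centers in phase
    y, yd, x = y0, 0.0, 1.0                             # state and phase variable
    traj = []
    for _ in range(T):
        psi = np.exp(-50.0 * (x - centers) ** 2)        # radial basis activations
        f = (psi @ w) / (psi.sum() + 1e-8) * x * (g - y0)  # learned forcing term
        ydd = (alpha * (beta * (g - y) - yd) + f) / tau
        yd += ydd * dt
        y += yd * dt / tau
        x += -2.0 * x * dt / tau                        # canonical system decay
        traj.append(y)
    return np.array(traj)

# With zero forcing weights the DMP reduces to a smooth reach toward the goal.
traj = dmp_rollout(w=np.zeros(10), g=1.0, y0=0.0)
```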
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart-pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
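A generic sketch of the combination this entry names: a Gaussian Process one-step dynamics model queried inside a receding-horizon loop. The kernel, random-shooting optimizer, and quadratic cost below are illustrative stand-ins, not the paper's actual choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel between row-stacked points A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

class GPDynamics:
    """One-step GP model of s' = f(s, a), fit on (state, action) -> next-state
    pairs; only the posterior mean is used below, for brevity."""
    def __init__(self, X, Y, noise=1e-3):
        self.X = X
        self.alpha = np.linalg.solve(rbf(X, X) + noise * np.eye(len(X)), Y)

    def predict(self, x):
        return (rbf(x[None, :], self.X) @ self.alpha)[0]

def receding_horizon_action(gp, s, horizon=5, samples=64):
    """Random-shooting receding-horizon control: sample action sequences,
    roll the GP mean forward, keep the first action of the cheapest rollout."""
    best_a, best_cost = None, np.inf
    for _ in range(samples):
        actions = rng.uniform(-1.0, 1.0, (horizon, 1))
        x, cost = s.copy(), 0.0
        for a in actions:
            x = gp.predict(np.concatenate([x, a]))
            cost += (x ** 2).sum()  # e.g. drive the state to the origin
        if cost < best_cost:
            best_a, best_cost = actions[0], cost
    return best_a  # execute this one action, observe, refit, replan

# Toy usage: fit on random transitions of a 2-D state, 1-D action system.
X = rng.uniform(-1, 1, (100, 3))            # rows: [s1, s2, a]
Y = X[:, :2] + 0.1 * np.tanh(X[:, 2:])      # placeholder dynamics
gp = GPDynamics(X, Y)
a0 = receding_horizon_action(gp, np.array([0.5, -0.3]))
```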
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
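Information-theoretic MPC is commonly instantiated as MPPI-style sampling control: perturb a nominal action sequence, score the rollouts, and exponentially reweight. A minimal sketch under that assumption (the cost function and constants are placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)

def mppi_step(cost_fn, u_nom, n_samples=128, sigma=0.5, lam=1.0):
    """One information-theoretic MPC update: sample perturbed action
    sequences, score them, and exponentially reweight (MPPI-style sketch)."""
    eps = rng.normal(0.0, sigma, (n_samples,) + u_nom.shape)
    costs = np.array([cost_fn(u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)     # softmax weights over rollouts
    w /= w.sum()
    return u_nom + np.tensordot(w, eps, axes=1)  # weighted-average update

# Toy usage: steer a 10-step, 1-D action sequence toward a quadratic optimum.
u = np.zeros((10, 1))
for _ in range(50):
    u = mppi_step(lambda seq: ((seq - 0.3) ** 2).sum(), u)
```

The temperature lam is what links this update to entropy-regularized RL: lower values concentrate the weights on the single cheapest rollout, while higher values average more broadly.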
This list is automatically generated from the titles and abstracts of the papers on this site.