Related papers: Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills

URL: http://arxiv.org/abs/2010.13766v3
Date: Fri, 11 Feb 2022 01:49:11 GMT
Title: Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
Authors: Samuele Tosatto, Georgia Chalvatzaki, Jan Peters
Abstract summary: We propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics. We introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO provides sample-efficient policies compared with common approaches in the literature.
Score: 41.140532647789456
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high-dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially for learning with physical robots. In this paper we propose a novel view on handling the demonstrated trajectories for acquiring low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO can provide gradient estimates from previous experience using self-normalized importance sampling, hence, making full use of samples collected in previous learning iterations. These advantages combined provide a complete framework for sample-efficient off-policy optimization of movement primitives for robot learning of high-dimensional manipulation skills. Our experimental results conducted both in simulation and on a real robot show that LAMPO provides sample-efficient policies against common approaches in literature.
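The abstract mentions self-normalized importance sampling as the mechanism for reusing samples from earlier iterations. The following is a minimal sketch of that general estimator only, not the LAMPO algorithm itself: it estimates an expected reward under a new policy from samples drawn under an older one, whereas the paper uses the technique for gradient estimates. The one-dimensional Gaussian policies, the reward function, and all names (`snis_estimate`, `gaussian_logpdf`) are illustrative assumptions.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian (stand-in for a policy's log-probability)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def snis_estimate(samples, rewards, target_logpdf, behavior_logpdf):
    """Self-normalized importance sampling estimate of E_target[reward].

    Weights are target/behavior density ratios, normalized to sum to one,
    so any constant factor in either density cancels out.
    """
    log_w = [target_logpdf(x) - behavior_logpdf(x) for x in samples]
    max_log_w = max(log_w)  # shift by the max for numerical stability
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    return sum(wi * r for wi, r in zip(w, rewards)) / total


rng = random.Random(0)

# Hypothetical data: samples collected under an older (behavior) policy N(0, 1).
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
rewards = [-(x - 1.0) ** 2 for x in samples]  # hypothetical reward function

# Evaluate a newer (target) policy N(0.5, 1) without collecting new samples.
estimate = snis_estimate(
    samples,
    rewards,
    target_logpdf=lambda x: gaussian_logpdf(x, 0.5, 1.0),
    behavior_logpdf=lambda x: gaussian_logpdf(x, 0.0, 1.0),
)
print(estimate)
```

For this toy setup the exact expected reward under the target policy is -(1 + 0.5^2) = -1.25, so the printed estimate should land close to that value. Normalizing by the sum of the weights introduces a small bias but typically lowers variance relative to plain importance sampling.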

Related papers

Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning [99.09906827676748]
We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks. Our novel approach uses reinforcement learning to fine-tune the motion generator based on human-preference prior knowledge from the human perception model. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate optimality between text adherence, motion quality, and human preferences.
arXiv Detail & Related papers (2024-10-09T03:27:14Z)
Incremental Few-Shot Adaptation for Non-Prehensile Object Manipulation using Parallelizable Physics Simulators [5.483662156126757]
We propose a novel approach for non-prehensile manipulation which iteratively adapts a physics-based dynamics model for model-predictive control. We adapt the parameters of the model incrementally with a few examples of robot-object interactions. We evaluate our few-shot adaptation approach in several object pushing experiments in simulation and with a real robot.
arXiv Detail & Related papers (2024-09-20T05:24:25Z)
Navigating the Human Maze: Real-Time Robot Pathfinding with Generative Imitation Learning [0.0]
We introduce goal-conditioned autoregressive models to generate crowd behaviors, capturing intricate interactions among individuals. The model processes potential robot trajectory samples and predicts the reactions of surrounding individuals, enabling proactive robotic navigation in complex scenarios.
arXiv Detail & Related papers (2024-08-07T14:32:41Z)
Machine Learning Optimized Approach for Parameter Selection in MESHFREE Simulations [0.0]
Meshfree simulation methods are emerging as compelling alternatives to conventional mesh-based approaches. We provide a comprehensive overview of our research combining Machine Learning (ML) and Fraunhofer's MESHFREE software. We introduce a novel ML-optimized approach, using active learning, regression trees, and visualization on MESHFREE simulation data.
arXiv Detail & Related papers (2024-03-20T15:29:59Z)
Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning [3.16488279864227]
Reinforcement learning (RL) for motion planning of robots suffers from low efficiency in terms of slow training speed and poor generalizability. We propose a novel RL-based framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent.
arXiv Detail & Related papers (2023-07-29T19:46:09Z)
A dynamic Bayesian optimized active recommender system for curiosity-driven Human-in-the-loop automated experiments [8.780395483188242]
We present the development of a new type of human-in-the-loop experimental workflow, via a Bayesian optimized active recommender system (BOARS). This work shows the utility of human-augmented machine learning approaches for curiosity-driven exploration of systems across experimental domains.
arXiv Detail & Related papers (2023-04-05T14:54:34Z)
Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction. Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data. We show that a neural network can model highly nonlinear behaviors accurately for large time horizons. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot robot and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time. Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform. We produce a closed-loop controller to reactively push objects in a continuous action space. We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.