Exploiting Symmetry and Heuristic Demonstrations in Off-policy
Reinforcement Learning for Robotic Manipulation
- URL: http://arxiv.org/abs/2304.06055v1
- Date: Wed, 12 Apr 2023 11:38:01 GMT
- Title: Exploiting Symmetry and Heuristic Demonstrations in Off-policy
Reinforcement Learning for Robotic Manipulation
- Authors: Amir M. Soufi Enayati, Zengjie Zhang, Kashish Gupta, and Homayoun
Najjaran
- Abstract summary: This paper aims to define and incorporate the natural symmetry present in physical robotic environments.
The proposed method is validated via two point-to-point reaching tasks of an industrial arm, with and without an obstacle.
A comparison study between the proposed method and a traditional off-policy reinforcement learning algorithm indicates its advantage in learning performance and potential value for applications.
- Score: 1.7901837062462316
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement learning demonstrates significant potential in automatically
building control policies in numerous domains, but shows low efficiency when
applied to robot manipulation tasks due to the curse of dimensionality. To
facilitate the learning of such tasks, prior knowledge or heuristics that
encode inherent simplifications can effectively improve learning
performance. This paper aims to define and incorporate the natural symmetry
present in physical robotic environments. Sample-efficient policies are then
trained by exploiting expert demonstrations in symmetric environments
through a combination of reinforcement learning and behavior cloning, which
gives the off-policy learning process a diverse yet compact initialization.
Furthermore, the paper presents a rigorous framework for this recently
introduced concept and explores its scope for robot manipulation tasks. The
proposed method is validated via two
point-to-point reaching tasks of an industrial arm, with and without an
obstacle, in a simulation experiment study. A PID controller, which tracks the
linear joint-space trajectories with hard-coded temporal logic to produce
interim midpoints, is used to generate the demonstrations. The results show
the effect of the number of demonstrations and quantify the weight given to
behavior cloning, exemplifying the possible improvement of model-free
reinforcement learning in common manipulation tasks. A comparison
study between the proposed method and a traditional off-policy reinforcement
learning algorithm indicates its advantage in learning performance and
potential value for applications.
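To make the two ingredients concrete, here is a minimal sketch (an illustration under assumed details, not the authors' released code) of mirroring demonstrations across an assumed symmetry plane of the workspace, and of a DDPG-style actor update with a behavior-cloning term; `actor`, `critic`, and `bc_weight` are hypothetical stand-ins:

```python
import torch

def mirror_demo(states, actions, axis=1):
    """Reflect a demonstration across an assumed symmetry plane by negating
    one Cartesian coordinate (here: y, index 1). Which coordinates flip
    depends on the task's actual symmetry."""
    m_states, m_actions = states.clone(), actions.clone()
    m_states[:, axis] *= -1.0
    m_actions[:, axis] *= -1.0
    return m_states, m_actions

def actor_loss(actor, critic, states, demo_states, demo_actions, bc_weight=0.5):
    """Off-policy (DDPG-style) actor loss plus a behavior-cloning term on
    (possibly mirrored) demonstrations; `actor` and `critic` are assumed
    torch modules."""
    rl_loss = -critic(states, actor(states)).mean()              # maximize Q
    bc_loss = (actor(demo_states) - demo_actions).pow(2).mean()  # imitate demos
    return rl_loss + bc_weight * bc_loss
```

In the reported experiments the demonstrations come from the PID controller described above, and `bc_weight` plays the role of the behavior-cloning magnitude that the study varies.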
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
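As an illustration only (not the paper's exact formulation), a gradient-ascent unlearning step with one simple control, a cap on the forget loss; all names below are hypothetical:

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, batch, optimizer, loss_cap=5.0):
    """One gradient-ascent step on a forget-set batch. Ascent is implemented
    by minimizing the negated loss; `loss_cap` is a hypothetical control that
    halts ascent once the forget loss is high enough, limiting excessive
    unlearning."""
    inputs, targets = batch
    forget_loss = F.cross_entropy(model(inputs), targets)
    if forget_loss.item() >= loss_cap:   # control: forgotten enough, stop
        return forget_loss.item()
    optimizer.zero_grad()
    (-forget_loss).backward()            # ascend the forget loss
    optimizer.step()
    return forget_loss.item()
```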
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
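The summary does not specify the algorithm; as one plausible instantiation (an assumption, not necessarily the authors' method), clustering sampled continuous motions into a discrete set of prototypes:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: each row is a sampled continuous motion (e.g., a
# flattened short-horizon joint-velocity command).
motions = np.random.randn(1000, 12)

# Discretize the continuous motion space: cluster centers become a compact,
# discrete set of "action prototypes".
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(motions)
prototypes = kmeans.cluster_centers_      # shape (8, 12)

# Any continuous motion can now be mapped to its nearest prototype.
labels = kmeans.predict(motions[:5])
```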
arXiv Detail & Related papers (2024-04-03T13:28:52Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
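A minimal sketch of that reward signal, assuming a gym-style environment; the wrapper and `intervention_fn` below are hypothetical:

```python
class InterventionRewardWrapper:
    """Replace the task reward with the user intervention signal: the agent
    is penalized when the expert intervenes and otherwise receives zero, so
    it learns to avoid states that trigger interventions."""

    def __init__(self, env, intervention_fn):
        self.env = env
        self.intervention_fn = intervention_fn  # returns True on intervention

    def step(self, action):
        obs, _task_reward, done, info = self.env.step(action)
        intervened = self.intervention_fn(obs, action)
        info["intervened"] = intervened
        return obs, (-1.0 if intervened else 0.0), done, info
```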
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition [10.072992621244042]
We propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL).
To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process.
We demonstrate that our method achieves significantly higher training efficiency in non-prehensile object-pushing tasks.
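As a rough illustration of the intrinsic-curiosity ingredient (not the authors' exact active-inference formulation), a forward-model prediction-error bonus that can be added to the extrinsic reward; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class CuriosityBonus(nn.Module):
    """Forward-model prediction error as an intrinsic reward, a common way
    to implement curiosity."""

    def __init__(self, obs_dim=32, act_dim=4, scale=0.1):
        super().__init__()
        self.forward_model = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, obs_dim))
        self.scale = scale

    def forward(self, obs, action, next_obs):
        pred = self.forward_model(torch.cat([obs, action], dim=-1))
        error = (pred - next_obs).pow(2).mean(dim=-1)
        return self.scale * error   # higher prediction error -> larger bonus
```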
arXiv Detail & Related papers (2023-11-19T10:19:22Z)
- Understanding Physical Effects for Effective Tool-use [91.55810923916454]
We present a robot learning and planning framework that produces an effective tool-use strategy with the least joint effort.
We use a Finite Element Method (FEM)-based simulator that reproduces fine-grained, continuous visual and physical effects given observed tool-use events.
In simulation, we demonstrate that the proposed framework can produce more effective tool-use strategies, drastically different from the observed ones in two tasks.
arXiv Detail & Related papers (2022-06-30T03:13:38Z)
- ReIL: A Framework for Reinforced Intervention-based Imitation Learning [3.0846824529023387]
We introduce Reinforced Intervention-based Learning (ReIL), a framework consisting of a general intervention-based learning algorithm and a multi-task imitation learning model.
Experimental results from real-world mobile robot navigation challenges indicate that ReIL learns rapidly from sparse supervisor corrections without suffering deterioration in performance.
arXiv Detail & Related papers (2022-03-29T09:30:26Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Motion Generation Using Bilateral Control-Based Imitation Learning with Autoregressive Learning [3.4410212782758047]
We propose a method of autoregressive learning for bilateral control-based imitation learning.
A new neural network model for implementing autoregressive learning is proposed.
arXiv Detail & Related papers (2020-11-12T04:35:48Z)
- Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning [0.06554326244334865]
We analyze how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems.
We introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning.
We discuss how both the type of perturbation and the number of agents experiencing it affect the collaborative learning effort.
arXiv Detail & Related papers (2020-08-18T11:57:33Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we allow overlapping features and differentiate the hard tasks.
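The summary leaves the regularizer abstract; here is a generic sketch of a block-diagonal structure penalty on a task-by-feature weight matrix (the fixed group assignments are an assumption, whereas TFCL learns the grouping):

```python
import torch

def block_diagonal_penalty(W, task_groups, feature_groups):
    """L1-penalize entries of W (tasks x features) that fall outside matched
    (task group, feature group) blocks, encouraging block-diagonal structure.
    `task_groups` and `feature_groups` are integer tensors of group ids."""
    mask = torch.ones_like(W)                      # 1 = off-block, penalized
    for g in set(task_groups.tolist()):
        rows = task_groups == g
        cols = feature_groups == g
        mask[rows.unsqueeze(1) & cols.unsqueeze(0)] = 0.0
    return (W.abs() * mask).sum()
```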
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
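A generic sketch of how critic-estimated action values can control gradient variance in discrete action spaces (the policy-weighted mean of Q acts as a baseline; an illustration, not the paper's exact estimator):

```python
import torch

def pg_loss_with_q_baseline(logits, q_values, actions):
    """logits, q_values: (batch, n_actions); actions: (batch,) sampled ids.
    Using E_pi[Q(s, .)] as a baseline reduces the variance of the on-policy
    gradient without biasing it."""
    log_probs = torch.log_softmax(logits, dim=-1)
    baseline = (log_probs.exp() * q_values).sum(dim=-1, keepdim=True)
    advantage = (q_values - baseline).gather(
        1, actions.unsqueeze(1)).squeeze(1).detach()
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * advantage).mean()   # minimize negative PG objective
```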
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information above) and is not responsible for any consequences of its use.