Disentangled Planning and Control in Vision Based Robotics via Reward
Machines
- URL: http://arxiv.org/abs/2012.14464v1
- Date: Mon, 28 Dec 2020 19:54:40 GMT
- Title: Disentangled Planning and Control in Vision Based Robotics via Reward
Machines
- Authors: Alberto Camacho, Jacob Varley, Deepali Jain, Atil Iscen and Dmitry
Kalashnikov
- Abstract summary: We augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks.
A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion.
- Score: 13.486750561133634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM)
to increase the speed of learning vision-based policies for robot tasks and to
overcome some of the limitations of DQN that prevent it from converging to
good-quality policies. A reward machine (RM) is a finite state machine that
decomposes a task into a discrete planning graph and equips the agent with a
reward function to guide it toward task completion. The reward machine can be
used both for reward shaping and for informing the policy which abstract state
it is currently in. An abstract state is a high-level simplification of the
current state, defined in terms of task-relevant features. These two
supervisory signals from the reward machine, reward shaping and knowledge of
the current abstract state, complement each other, and both can be used to
improve policy performance, as demonstrated on several vision-based robotic
pick-and-place tasks. Particularly for vision-based robotics applications, it
is often easier to build a reward machine than to try to get a policy to learn
the task without this structure.
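To make the reward-machine idea concrete, below is a minimal sketch of an RM for a pick-and-place task. This is not the authors' implementation: the abstract states, proposition names such as `object_grasped`, and reward values are illustrative assumptions. The sketch exposes exactly the two supervisory signals described in the abstract, a shaped reward and the current abstract state.

```python
# Minimal reward-machine sketch for a pick-and-place task (illustrative only;
# state names, propositions, and reward values are assumptions, not the paper's).

from dataclasses import dataclass, field
from typing import Dict, Tuple

# Propositions are task-relevant events, e.g. detected from the camera image
# by simple classifiers; here they are just named booleans.
Propositions = Dict[str, bool]


@dataclass
class RewardMachine:
    """Finite state machine over abstract states with per-transition rewards."""
    initial_state: int
    # (abstract state, proposition) -> (next abstract state, reward)
    transitions: Dict[Tuple[int, str], Tuple[int, float]]
    terminal_states: frozenset = frozenset()
    state: int = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial_state

    def step(self, props: Propositions) -> Tuple[int, float, bool]:
        """Advance on the detected propositions; return (abstract state, reward, done)."""
        reward = 0.0
        for name, holds in props.items():
            if holds and (self.state, name) in self.transitions:
                self.state, reward = self.transitions[(self.state, name)]
                break
        return self.state, reward, self.state in self.terminal_states


# Abstract states: 0 = reaching, 1 = object grasped, 2 = object placed (terminal).
pick_and_place_rm = RewardMachine(
    initial_state=0,
    transitions={
        (0, "object_grasped"): (1, 0.5),   # shaping reward for a successful grasp
        (1, "object_dropped"): (0, -0.5),  # dropped the object: back to reaching
        (1, "object_at_goal"): (2, 1.0),   # task completion
    },
    terminal_states=frozenset({2}),
)
```

A DQRM-style agent would then feed the camera observation together with an encoding of the current abstract state into its Q-network and train on the RM reward, rather than relying on a sparse task-completion reward alone.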
Related papers
- RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation [52.14638923430338]
We propose conditioning policies on affordances, which capture the pose of the robot at key stages of the task.
Our method, RT-Affordance, is a hierarchical model that first proposes an affordance plan given the task language.
We show on a diverse set of novel tasks how RT-Affordance exceeds the performance of existing methods by over 50%.
arXiv Detail & Related papers (2024-11-05T01:02:51Z)
- Maximally Permissive Reward Machines [8.425937972214667]
We propose a new approach to synthesising reward machines based on the set of partial order plans for a goal.
We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan.
arXiv Detail & Related papers (2024-08-15T09:59:26Z)
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- Robotic Control via Embodied Chain-of-Thought Reasoning [86.6680905262442]
A key limitation of learned robot control policies is their inability to generalize outside their training data.
Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models can substantially improve their robustness and generalization ability.
We introduce Embodied Chain-of-Thought Reasoning (ECoT) for VLAs, in which we train VLAs to perform multiple steps of reasoning about plans, sub-tasks, motions, and visually grounded features before predicting the robot action.
arXiv Detail & Related papers (2024-07-11T17:31:01Z)
- Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning [14.370384505230597]
We propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks.
Our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.
arXiv Detail & Related papers (2023-07-20T13:08:14Z)
- Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [101.43350024175157]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
arXiv Detail & Related papers (2023-06-06T01:36:56Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives [0.0]
Reinforcement learning (RL) is a popular approach for robotic path planning in uncertain environments.
Poorly designed rewards can lead to policies that achieve maximal reward yet fail to satisfy the desired task objectives or are unsafe.
We propose using formal specifications in the form of symbolic automata.
arXiv Detail & Related papers (2022-02-04T21:54:36Z)
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning [22.242379207077217]
We show how to expose the reward function's code to the RL agent so that it can exploit the function's internal structure to learn optimal policies.
First, we propose reward machines, a type of finite state machine that supports the specification of reward functions.
We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning (a minimal sketch of the latter follows this list).
arXiv Detail & Related papers (2020-10-06T00:10:16Z)
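The counterfactual reasoning with off-policy learning mentioned in the last entry above can be illustrated with a short sketch. This is an assumption-laden illustration, not that paper's API: because the reward machine's transitions are known, a single environment transition can be relabelled against every RM state, giving an off-policy learner one synthetic experience per abstract state.

```python
# Counterfactual experience generation with a reward machine (illustrative
# sketch; function and variable names are assumptions, not the paper's API).
# One real environment transition is relabelled against every RM state, so an
# off-policy learner such as DQN gets one synthetic experience per abstract state.

from typing import Dict, List, Sequence, Tuple

# (abstract state, proposition) -> (next abstract state, reward)
RMTransitions = Dict[Tuple[int, str], Tuple[int, float]]


def rm_step(transitions: RMTransitions, rm_state: int, props: Sequence[str]) -> Tuple[int, float]:
    """Advance one RM state on the propositions detected at this time step."""
    for p in props:
        if (rm_state, p) in transitions:
            return transitions[(rm_state, p)]
    return rm_state, 0.0


def counterfactual_experiences(
    transitions: RMTransitions,
    rm_states: Sequence[int],
    obs,
    action,
    next_obs,
    props: Sequence[str],
) -> List[tuple]:
    """Relabel one (obs, action, next_obs) transition under every RM state."""
    batch = []
    for u in rm_states:  # "what if the agent had been in abstract state u?"
        next_u, reward = rm_step(transitions, u, props)
        batch.append((obs, u, action, reward, next_obs, next_u))
    return batch
```

Replaying these relabelled tuples in a standard replay buffer is what lets the agent learn values for abstract states it did not actually visit during the episode.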
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.