MAN: Multi-Action Networks Learning
- URL: http://arxiv.org/abs/2209.09329v1
- Date: Mon, 19 Sep 2022 20:13:29 GMT
- Title: MAN: Multi-Action Networks Learning
- Authors: Keqin Wang, Alison Bartsch, Amir Barati Farimani
- Abstract summary: We introduce a Deep Reinforcement Learning algorithm called Multi-Action Networks (MAN) Learning.
We propose separating the action space into two components, creating a Value Neural Network for each sub-action.
Then, MAN uses temporal-difference learning to train the networks synchronously, which is simpler than training a single network with a large action output.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning control policies with large action spaces is a challenging problem
in the field of reinforcement learning due to present inefficiencies in
exploration. In this work, we introduce a Deep Reinforcement Learning (DRL)
algorithm called Multi-Action Networks (MAN) Learning that addresses the
challenge of large discrete action spaces. We propose separating the action
space into two components, creating a Value Neural Network for each sub-action.
Then, MAN uses temporal-difference learning to train the networks
synchronously, which is simpler than training a single network with a large
action output directly. To evaluate the proposed method, we test MAN on a block
stacking task, and then extend MAN to handle 12 games from the Atari Arcade
Learning Environment with action spaces of size 18. Our results indicate that MAN
learns faster than both Deep Q-Learning and Double Deep Q-Learning, implying
our method is a better performing synchronous temporal difference algorithm
than those currently available for large action spaces.
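
The idea of one value estimator per sub-action, each trained with a temporal-difference update, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the factorization or the exact TD target, so the 3 x 6 split of 18 actions, the tabular estimators standing in for the value networks, and the names `split_action`, `join_action`, and `TwoHeadQ` are all hypothetical.

```python
def split_action(a, n_second):
    """Map a flat action index to a (first, second) sub-action pair."""
    return divmod(a, n_second)


def join_action(a1, a2, n_second):
    """Inverse of split_action: rebuild the flat action index."""
    return a1 * n_second + a2


class TwoHeadQ:
    """Two tabular value estimators, one per sub-action.

    Assumption: tables stand in for the paper's value networks, and both
    heads are updated from the same transition with a Q-learning style target.
    """

    def __init__(self, n_states, n_first, n_second, lr=0.1, gamma=0.99):
        self.q1 = [[0.0] * n_first for _ in range(n_states)]
        self.q2 = [[0.0] * n_second for _ in range(n_states)]
        self.n_second = n_second
        self.lr = lr
        self.gamma = gamma

    def greedy(self, s):
        """Pick each sub-action greedily from its own head, then combine."""
        a1 = max(range(len(self.q1[s])), key=self.q1[s].__getitem__)
        a2 = max(range(len(self.q2[s])), key=self.q2[s].__getitem__)
        return join_action(a1, a2, self.n_second)

    def update(self, s, a, r, s_next, done):
        """Apply one TD update to both heads using the shared reward."""
        a1, a2 = split_action(a, self.n_second)
        for q, sub in ((self.q1, a1), (self.q2, a2)):
            target = r if done else r + self.gamma * max(q[s_next])
            q[s][sub] += self.lr * (target - q[s][sub])


# Usage: 18 flat actions factored (by assumption) into 3 x 6 sub-actions.
agent = TwoHeadQ(n_states=2, n_first=3, n_second=6, lr=0.5, gamma=0.9)
agent.update(s=0, a=17, r=1.0, s_next=1, done=True)
best = agent.greedy(0)
```

With this split, each head has 3 or 6 outputs instead of a single head with 18, which is the sense in which the per-sub-action estimators are smaller than one estimator over the full action set.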
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation [58.14969377419633]
We propose SPIRE, a system that first decomposes tasks into smaller learning subproblems and then combines imitation and reinforcement learning to maximize their strengths.
We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance.
arXiv Detail & Related papers (2024-10-23T17:42:07Z) - PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control [55.81022882408587]
Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making.
We propose a novel view that treats inducing temporal action abstractions as a sequence compression problem.
We introduce an approach that combines continuous action quantization with byte pair encoding to learn powerful action abstractions.
arXiv Detail & Related papers (2024-02-16T04:55:09Z) - Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z) - DL-DRL: A double-level deep reinforcement learning approach for
large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide and conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z) - Abstract Demonstrations and Adaptive Exploration for Efficient and
Stable Multi-step Sparse Reward Reinforcement Learning [44.968170318777105]
This paper proposes a DRL exploration technique, termed A2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration.
A2 starts by decomposing a complex task into subtasks, and then provides the correct orders of subtasks to learn.
We demonstrate that A2 can aid popular DRL algorithms to learn more efficiently and stably in these environments.
arXiv Detail & Related papers (2022-07-19T12:56:41Z) - Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation [51.21190751266442]
Domain adaptation (DA) tries to tackle the scenarios when the test data does not fully follow the same distribution of the training data.
By learning from large-scale unlabeled samples, self-supervised learning has now become a new trend in deep learning.
We propose a novel Self-Supervised Graph Neural Network (SSG) to enable more effective inter-task information exchange and knowledge sharing.
arXiv Detail & Related papers (2022-04-08T03:37:56Z) - LASER: Learning a Latent Action Space for Efficient Reinforcement
Learning [41.53297694894669]
We present LASER, a method to learn latent action spaces for efficient reinforcement learning.
We show improved sample efficiency compared to the original action space from better alignment of the action space to the task space, as we observe with visualizations of the learned action space manifold.
arXiv Detail & Related papers (2021-03-29T17:40:02Z) - Training Larger Networks for Deep Reinforcement Learning [18.193180866998333]
We show that naively increasing network capacity does not improve performance.
We propose a novel method that consists of 1) wider networks with DenseNet connection, 2) decoupling representation learning from training of RL, and 3) a distributed training method to mitigate overfitting problems.
Using this three-fold technique, we show that we can train very large networks that result in significant performance gains.
arXiv Detail & Related papers (2021-02-16T02:16:54Z) - Deep Reinforcement Learning with Interactive Feedback in a Human-Robot
Environment [1.2998475032187096]
We propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human-robot scenario.
We compare three different learning methods using a simulated robotic arm for the task of organizing different objects.
The obtained results show that a learner agent, using either agent-IDeepRL or human-IDeepRL, completes the given task earlier and has fewer mistakes compared to the autonomous DeepRL approach.
arXiv Detail & Related papers (2020-07-07T11:55:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.