DCT: Dual Channel Training of Action Embeddings for Reinforcement
Learning with Large Discrete Action Spaces
- URL: http://arxiv.org/abs/2306.15913v1
- Date: Wed, 28 Jun 2023 04:32:09 GMT
- Title: DCT: Dual Channel Training of Action Embeddings for Reinforcement
Learning with Large Discrete Action Spaces
- Authors: Pranavi Pathakota and Hardik Meisheri and Harshad Khadilkar
- Abstract summary: We present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state.
We use a trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space.
Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence.
- Score: 4.168157981135697
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ability to learn robust policies while generalizing over large discrete
action spaces is an open challenge for intelligent systems, especially in noisy
environments that face the curse of dimensionality. In this paper, we present a
novel framework to efficiently learn action embeddings that simultaneously
allow us to reconstruct the original action as well as to predict the expected
future state. We describe an encoder-decoder architecture for action embeddings
with a dual channel loss that balances between action reconstruction and state
prediction accuracy. We use the trained decoder in conjunction with a standard
reinforcement learning algorithm that produces actions in the embedding space.
Our architecture outperforms two competitive baselines in two diverse
environments: a 2D maze environment with more than 4000 discrete noisy actions,
and a product recommendation task that uses real-world e-commerce transaction
data. Empirical results show that the model results in cleaner action
embeddings, and the improved representations help learn better policies with
earlier convergence.
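The dual-channel idea described above can be sketched numerically. Everything in the sketch below — the layer shapes, the stand-in linear maps `W_enc`/`W_dec`/`W_dyn`, the weighting `alpha`, and the nearest-neighbour decoding — is an illustrative assumption, not the authors' exact architecture, which uses trained neural networks and a learned decoder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: >4000 discrete actions, as in the paper's maze task.
N_ACTIONS, EMB_DIM, STATE_DIM = 4096, 16, 8

# Randomly initialised linear maps stand in for the learned networks.
W_enc = rng.normal(scale=0.1, size=(N_ACTIONS, EMB_DIM))              # encoder: one-hot action -> embedding
W_dec = rng.normal(scale=0.1, size=(EMB_DIM, N_ACTIONS))              # channel 1: embedding -> action logits
W_dyn = rng.normal(scale=0.1, size=(EMB_DIM + STATE_DIM, STATE_DIM))  # channel 2: embedding + state -> next state

def dual_channel_loss(action_id, state, next_state, alpha=0.5):
    """Weighted sum of action-reconstruction and state-prediction errors."""
    one_hot = np.zeros(N_ACTIONS)
    one_hot[action_id] = 1.0
    emb = one_hot @ W_enc  # action embedding

    # Channel 1: reconstruct the original action (cross-entropy on decoder logits).
    logits = emb @ W_dec
    logits -= logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    recon_loss = -log_probs[action_id]

    # Channel 2: predict the expected next state (mean squared error).
    pred_next = np.concatenate([emb, state]) @ W_dyn
    state_loss = np.mean((pred_next - next_state) ** 2)

    # alpha balances action reconstruction against state-prediction accuracy.
    return alpha * recon_loss + (1.0 - alpha) * state_loss

loss = dual_channel_loss(action_id=7,
                         state=rng.normal(size=STATE_DIM),
                         next_state=rng.normal(size=STATE_DIM))

# At decision time the RL agent emits a continuous "proto-action" in embedding
# space; a simple nearest-neighbour lookup stands in here for the trained decoder.
action_embeddings = W_enc  # row i is the embedding of discrete action i

def decode(proto_action):
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    return int(np.argmin(dists))

# A proto-action near action 42's embedding decodes back to action 42.
decoded = decode(action_embeddings[42] + 1e-3 * rng.normal(size=EMB_DIM))
```

With random weights the loss is dominated by the reconstruction term (roughly log N_ACTIONS); training the three maps jointly would drive both channels down while shaping the embedding space.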
Related papers
- Bidirectional Trained Tree-Structured Decoder for Handwritten
Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Learning to Centralize Dual-Arm Assembly [0.6091702876917281]
This work focuses on assembly with humanoid robots by providing a framework for dual-arm peg-in-hole manipulation.
We reduce modeling effort to a minimum by using sparse rewards only.
We demonstrate the effectiveness of the framework on dual-arm peg-in-hole and analyze sample efficiency and success rates for different action spaces.
arXiv Detail & Related papers (2021-10-08T09:59:12Z)
- Elaborative Rehearsal for Zero-shot Action Recognition [36.84404523161848]
Zero-shot action recognition (ZSAR) aims to recognize target (unseen) actions without training examples.
It remains challenging to semantically represent action classes and transfer knowledge from seen data.
We propose an ER-enhanced ZSAR model inspired by an effective human memory technique, Elaborative Rehearsal.
arXiv Detail & Related papers (2021-08-05T20:02:46Z)
- Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z)
- Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
arXiv Detail & Related papers (2021-06-05T18:41:57Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing a structure of the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm outperforms all entries in the MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.