DCT: Dual Channel Training of Action Embeddings for Reinforcement
Learning with Large Discrete Action Spaces
- URL: http://arxiv.org/abs/2306.15913v1
- Date: Wed, 28 Jun 2023 04:32:09 GMT
- Title: DCT: Dual Channel Training of Action Embeddings for Reinforcement
Learning with Large Discrete Action Spaces
- Authors: Pranavi Pathakota and Hardik Meisheri and Harshad Khadilkar
- Abstract summary: We present a novel framework to efficiently learn action embeddings that simultaneously allow us to reconstruct the original action as well as to predict the expected future state.
We use a trained decoder in conjunction with a standard reinforcement learning algorithm that produces actions in the embedding space.
Empirical results show that the model results in cleaner action embeddings, and the improved representations help learn better policies with earlier convergence.
- Score: 4.168157981135697
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ability to learn robust policies while generalizing over large discrete
action spaces is an open challenge for intelligent systems, especially in noisy
environments that face the curse of dimensionality. In this paper, we present a
novel framework to efficiently learn action embeddings that simultaneously
allow us to reconstruct the original action as well as to predict the expected
future state. We describe an encoder-decoder architecture for action embeddings
with a dual channel loss that balances between action reconstruction and state
prediction accuracy. We use the trained decoder in conjunction with a standard
reinforcement learning algorithm that produces actions in the embedding space.
Our architecture is able to outperform two competitive baselines in two diverse
environments: a 2D maze environment with more than 4000 discrete noisy actions,
and a product recommendation task that uses real-world e-commerce transaction
data. Empirical results show that the model results in cleaner action
embeddings, and the improved representations help learn better policies with
earlier convergence.
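The dual channel loss described in the abstract weighs action reconstruction against next-state prediction. The sketch below is a minimal numpy illustration of that weighting; the linear maps, dimensions, and the mixing weight `lam` are hypothetical stand-ins for trained networks, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS, EMB_DIM, STATE_DIM = 4096, 16, 8  # hypothetical sizes

# Randomly initialised linear maps standing in for trained networks.
W_enc = rng.normal(size=(N_ACTIONS, EMB_DIM)) * 0.01             # encoder: action id -> embedding
W_dec = rng.normal(size=(EMB_DIM, N_ACTIONS)) * 0.01             # channel 1: action reconstruction
W_pred = rng.normal(size=(EMB_DIM + STATE_DIM, STATE_DIM)) * 0.01  # channel 2: state prediction

def dual_channel_loss(action_id, state, next_state, lam=0.5):
    """Weighted sum of action-reconstruction and state-prediction errors."""
    e = W_enc[action_id]                         # action embedding
    logits = e @ W_dec                           # reconstruct the discrete action
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # stable log-softmax
    recon_loss = -log_probs[action_id]           # cross-entropy on the true action
    pred = np.concatenate([e, state]) @ W_pred   # predict the expected next state
    state_loss = np.mean((pred - next_state) ** 2)
    return lam * recon_loss + (1.0 - lam) * state_loss

state = rng.normal(size=STATE_DIM)
next_state = rng.normal(size=STATE_DIM)
loss = dual_channel_loss(action_id=7, state=state, next_state=next_state)
print(loss)
```

After training, the decoder half of such a model can map an RL policy's continuous embedding-space output back to a discrete action, which is the role the abstract assigns to the trained decoder.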
Related papers
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]

We introduce the Seeing-to-Experiencing framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pre-training on videos and post-training through RL. We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
- CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z) - Hierarchical Graph Information Bottleneck for Multi-Behavior Recommendation [31.495904374599533]
We propose a novel model-agnostic Hierarchical Graph Information Bottleneck (HGIB) framework for multi-behavior recommendation. Our framework optimizes the learning of compact yet sufficient representations that preserve essential information for target behavior prediction. We conduct comprehensive experiments on three real-world public datasets, which demonstrate the superior effectiveness of our framework.
arXiv Detail & Related papers (2025-07-21T08:53:49Z)
- Generalizable Trajectory Prediction via Inverse Reinforcement Learning with Mamba-Graph Architecture [6.590896800137733]
This paper presents a novel Inverse Reinforcement Learning framework that captures human-like decision-making. The learned reward function is utilized to maximize the likelihood of output by the encoder-decoder architecture.
arXiv Detail & Related papers (2025-06-14T12:18:19Z)
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z)
- LEGO-Motion: Learning-Enhanced Grids with Occupancy Instance Modeling for Class-Agnostic Motion Prediction [12.071846486955627]
We introduce a novel occupancy-instance modeling framework for class-agnostic motion prediction tasks, named LEGO-Motion.
Our model comprises (1) a BEV encoder, (2) an Interaction-Augmented Instance, and (3) an Instance-Enhanced BEV.
Our method achieves state-of-the-art performance, outperforming existing approaches.
arXiv Detail & Related papers (2025-03-10T14:26:21Z)
- ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning.
We train a policy to predict action sequences and abstract observation sequences.
Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics [15.847713094328286]
Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks.
Double dynamics present significant challenges for real-time ISAC precoding design.
We propose a synesthesia of machine (SoM)-enhanced precoding paradigm.
arXiv Detail & Related papers (2024-08-24T10:35:10Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
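The latent-action idea in this entry, learning policies in a latent space and decoding through an invertible generative model, can be illustrated with a single affine-coupling layer, the building block of Normalizing Flows. This is a minimal sketch; the dimensions and the random linear conditioners are hypothetical stand-ins for trained networks:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # latent/action dimensionality (hypothetical)

# Random linear scale/shift conditioners standing in for trained nets.
Ws = rng.normal(size=(D // 2, D // 2)) * 0.1
Wt = rng.normal(size=(D // 2, D // 2)) * 0.1

def flow_forward(z):
    """One affine coupling layer: invertible map from latent to action space."""
    z1, z2 = z[: D // 2], z[D // 2 :]
    s, t = z1 @ Ws, z1 @ Wt
    return np.concatenate([z1, z2 * np.exp(s) + t])

def flow_inverse(a):
    """Exact inverse, recovering the latent from an action-space point."""
    a1, a2 = a[: D // 2], a[D // 2 :]
    s, t = a1 @ Ws, a1 @ Wt
    return np.concatenate([a1, (a2 - t) * np.exp(-s)])

z = rng.normal(size=D)
a = flow_forward(z)
```

Because the map is exactly invertible, a policy acting in the latent space can always be decoded to a valid action, which is what makes flows attractive for constraining offline agents.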
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- EvDistill: Asynchronous Events to End-task Learning via Bidirectional Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z)
- Elaborative Rehearsal for Zero-shot Action Recognition [36.84404523161848]
Zero-shot action recognition (ZSAR) aims to recognize target (unseen) actions without training examples.
It remains challenging to semantically represent action classes and transfer knowledge from seen data.
We propose an ER-enhanced ZSAR model inspired by an effective human memory technique Elaborative Rehearsal.
arXiv Detail & Related papers (2021-08-05T20:02:46Z) - Cross-modal Consensus Network for Weakly Supervised Temporal Action
Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z) - Learning Routines for Effective Off-Policy Reinforcement Learning [0.0]
We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
arXiv Detail & Related papers (2021-06-05T18:41:57Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.