Related papers: DATD3: Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient For Model Free Reinforcement Learning Under Output Feedback Control

URL: http://arxiv.org/abs/2505.23857v1
Date: Thu, 29 May 2025 06:22:06 GMT
Title: DATD3: Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient For Model Free Reinforcement Learning Under Output Feedback Control
Authors: Wuhao Wang, Zhiyong Chen
Abstract summary: Reinforcement learning in real-world applications often involves output-feedback settings, where the agent receives only partial state information. We propose the Output-Feedback Markov Decision Process (OPMDP), which extends the standard MDP formulation to accommodate decision-making based on observation histories. We introduce Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient (DATD3), a novel actor-critic algorithm that employs depthwise separable convolution and multi-head attention to encode historical observations. Experiments on continuous control tasks demonstrate that DATD3 outperforms existing memory-based and recurrent baselines under both partial and full observability.
Score: 4.473337652382325
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning in real-world applications often involves output-feedback settings, where the agent receives only partial state information. To address this challenge, we propose the Output-Feedback Markov Decision Process (OPMDP), which extends the standard MDP formulation to accommodate decision-making based on observation histories. Building on this framework, we introduce Depthwise Attention Twin Delayed Deep Deterministic Policy Gradient (DATD3), a novel actor-critic algorithm that employs depthwise separable convolution and multi-head attention to encode historical observations. DATD3 maintains policy expressiveness while avoiding the instability of recurrent models. Extensive experiments on continuous control tasks demonstrate that DATD3 outperforms existing memory-based and recurrent baselines under both partial and full observability.
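To make the "encode historical observations with depthwise separable convolution" idea concrete, here is a minimal pure-Python sketch of only the convolution step, applied to a toy observation window. It is not the authors' encoder: the window length, zero-padding at episode start, "valid" padding, the kernel values, and the names `ObservationHistory` and `depthwise_separable_conv1d` are all hypothetical choices made for illustration, and the multi-head attention stage is omitted.

```python
from collections import deque


class ObservationHistory:
    """Hypothetical fixed-length window of the most recent observations.

    Assumption: the window starts filled with zero vectors, so the encoder
    always receives `length` rows, even at the start of an episode.
    """

    def __init__(self, length, obs_dim):
        self.buffer = deque([[0.0] * obs_dim for _ in range(length)], maxlen=length)

    def push(self, obs):
        # Appending to a full deque with maxlen drops the oldest observation.
        self.buffer.append(list(obs))
        return list(self.buffer)


def depthwise_separable_conv1d(history, depthwise_kernels, pointwise_weights):
    """Depthwise-separable 1D convolution along the time axis ('valid' padding).

    history:           T rows, each a vector of C channels (observation dims).
    depthwise_kernels: C kernels of length K, one per channel.
    pointwise_weights: D rows of C weights (a 1x1 convolution mixing channels).
    Returns T - K + 1 rows of D features.
    """
    num_steps = len(history)
    num_channels = len(history[0])
    k = len(depthwise_kernels[0])
    # Depthwise step: each channel is filtered only by its own kernel
    # (cross-correlation, as in most deep-learning libraries).
    depthwise = [
        [
            sum(depthwise_kernels[c][j] * history[t + j][c] for j in range(k))
            for c in range(num_channels)
        ]
        for t in range(num_steps - k + 1)
    ]
    # Pointwise step: at every time step, mix the C channels into D outputs.
    return [
        [sum(w[c] * row[c] for c in range(num_channels)) for w in pointwise_weights]
        for row in depthwise
    ]


# Toy usage: 2-dimensional observations, a window of 4 steps.
history = ObservationHistory(length=4, obs_dim=2)
for obs in ([1.0, 0.0], [2.0, 1.0], [3.0, 0.5]):
    window = history.push(obs)
print(window)  # [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.5]]

depthwise_kernels = [[0.5, 0.5], [1.0, -1.0]]  # channel 0: 2-step average; channel 1: step difference
pointwise_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
features = depthwise_separable_conv1d(window, depthwise_kernels, pointwise_weights)
print(features)  # [[0.5, 0.0, 0.5], [1.5, -1.0, 0.5], [2.5, 0.5, 3.0]]
```

Splitting the convolution this way needs C·K + D·C weights instead of the D·C·K of a standard 1D convolution (biases ignored), which is the usual motivation for depthwise separable layers. The abstract pairs this step with multi-head attention over the history, which the sketch leaves out.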

Related papers

Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations [55.4735586739093]
We introduce PDE, a new benchmark which enables systematic robustness evaluation. PDE uses procedural generation to create 3D scenes that test robustness to various controlled perturbations. Our analysis yields interesting findings on what perturbations are challenging for state-of-the-art depth models.
arXiv Detail & Related papers (2025-07-01T17:33:48Z)
ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision. This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline. Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
We propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
arXiv Detail & Related papers (2024-06-08T10:12:00Z)
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery. It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z)
IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model [7.894136732348917]
We propose an Implicit Perspective for triple Extraction based on Diffusion model (IPED). Our solution adopts an implicit strategy using block coverage to complete the tables, avoiding the limitations of explicit tagging methods.
arXiv Detail & Related papers (2024-02-24T14:18:11Z)
Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without hindering performance or increasing computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
Gleo-Det: Deep Convolution Feature-Guided Detector with Local Entropy Optimization for Salient Points [5.955667705173262]
We propose to achieve a fine constraint based on the requirement of repeatability and a coarse constraint guided by deep convolution features. With the guidance of convolution features, we define the cost function from both positive and negative sides.
arXiv Detail & Related papers (2022-04-27T12:40:21Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
Depth-Cooperated Trimodal Network for Video Salient Object Detection [13.727763221832532]
We propose a depth-cooperated trimodal network called DCTNet for video salient object detection (VSOD). To this end, we first generate depth from RGB frames, and then propose an approach to treat the three modalities unequally. We also introduce a refinement fusion module (RFM) to suppress noises in each modality and select useful information dynamically for further feature refinement.
arXiv Detail & Related papers (2022-02-12T13:04:16Z)
Unsupervised Visual Attention and Invariance for Reinforcement Learning [25.673868326662024]
We develop an independent module to disperse interference factors irrelevant to the task, thereby providing "clean" observations for the vision-based reinforcement learning policy. All components are optimized in an unsupervised way, without manual annotation or access to environment internals. VAI empirically shows powerful generalization capabilities and significantly outperforms the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind Control suite benchmark.
arXiv Detail & Related papers (2021-04-07T05:28:01Z)
Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs). The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP. The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
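DATD3 builds on TD3 (Twin Delayed DDPG), and the clipped double Q-learning paper listed above targets the estimation bias of TD3's critic target. For reference, here is a minimal pure-Python sketch of the standard TD3 target for a single transition: target policy smoothing followed by the clipped double-Q bootstrap. The function names, the default hyperparameters (noise standard deviation 0.2, noise clip 0.5, actions bounded in [-1, 1], discount 0.99), and the Q-values are illustrative assumptions; this is neither DATD3's critic nor the Bias Exploiting mechanism.

```python
import random


def smoothed_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds."""
    smoothed = []
    for a in target_action:
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(low, min(high, a + noise)))
    return smoothed


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Critic target that bootstraps from the smaller of the two target-critic estimates."""
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


rng = random.Random(0)
# Hypothetical output of the target actor at the next state s'.
next_action = smoothed_target_action([0.9, -0.3], rng=rng)
# q1_next and q2_next would come from the two target critics evaluated at
# (s', next_action); they are made-up numbers here.
y = clipped_double_q_target(reward=1.0, done=False, q1_next=10.0, q2_next=8.0)
print(next_action)
print(y)  # approximately 8.92 (1.0 + 0.99 * 8.0)
y_terminal = clipped_double_q_target(reward=1.0, done=True, q1_next=10.0, q2_next=8.0)
print(y_terminal)  # 1.0
```

Taking the minimum of the two target critics trades the overestimation of a single critic for a downward bias; the Bias Exploiting paper listed above proposes selecting the more advantageous bias dynamically during training.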

This list is automatically generated from the titles and abstracts of the papers on this site.