Memory-based Deep Reinforcement Learning for POMDP
- URL: http://arxiv.org/abs/2102.12344v2
- Date: Thu, 25 Feb 2021 03:15:10 GMT
- Title: Memory-based Deep Reinforcement Learning for POMDP
- Authors: Lingheng Meng, Rob Gorbet, Dana Kulić
- Abstract summary: We propose the Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3), which introduces a memory component to TD3.
Our results demonstrate the significant advantages of the memory component in addressing Partially Observable MDPs (POMDPs).
- Score: 7.137228786549488
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A promising characteristic of Deep Reinforcement Learning (DRL) is its
capability to learn an optimal policy in an end-to-end manner without relying
on feature engineering. However, most approaches assume a fully observable
state space, i.e. a fully observable Markov Decision Process (MDP). In
real-world robotics, this assumption is impractical because of sensor issues
such as limited sensor capacity and sensor noise, and because it is often
unknown whether the observation design is complete. These scenarios lead to a
Partially Observable MDP (POMDP) and require special treatment. In this paper,
we propose the Long-Short-Term-Memory-based Twin Delayed Deep Deterministic
Policy Gradient (LSTM-TD3), which introduces a memory component to TD3, and
compare its performance with other DRL algorithms in both MDPs and POMDPs. Our
results demonstrate the significant advantages of the memory component in
addressing POMDPs, including the ability to handle missing and noisy
observation data.
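The architectural idea is compact enough to sketch. Below is a minimal, illustrative PyTorch actor in the spirit of LSTM-TD3: an LSTM summarizes past observation-action pairs into a memory feature that is combined with the current observation before the action head. Module names and layer sizes here are assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class LSTMTD3Actor(nn.Module):
    """Illustrative memory-based actor: an LSTM over past
    (observation, action) pairs supplies the context that a plain
    TD3 actor lacks under partial observability."""

    def __init__(self, obs_dim, act_dim, hidden=128, act_limit=1.0):
        super().__init__()
        # Memory branch: summarizes the history of past obs-action pairs.
        self.memory = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        # Current-feature branch: encodes the present observation.
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Integration head: combines memory and current features into an action.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.act_limit = act_limit

    def forward(self, hist_obs, hist_act, obs):
        # hist_obs: (B, T, obs_dim), hist_act: (B, T, act_dim), obs: (B, obs_dim)
        hist = torch.cat([hist_obs, hist_act], dim=-1)
        _, (h_n, _) = self.memory(hist)   # final hidden state as memory summary
        mem_feat = h_n[-1]                # (B, hidden)
        cur_feat = self.current(obs)      # (B, hidden)
        return self.act_limit * self.head(torch.cat([mem_feat, cur_feat], dim=-1))
```

At update time the replay buffer must also return a fixed-length history window for each sampled transition, so that `hist_obs` and `hist_act` are available alongside the usual (s, a, r, s') tuple.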
Related papers
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
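As a hedged illustration of the "reconstruct the dependency from data" step mentioned above (not the paper's algorithm), one can treat a bounded suffix of the interaction history as a surrogate automaton state and estimate next-observation statistics per state:

```python
from collections import Counter, defaultdict

def fit_suffix_model(episodes, k=2):
    """Toy stand-in for automata learning: use the last k
    (action, observation) pairs as a surrogate automaton state and
    count which observation follows each (state, action)."""
    counts = defaultdict(Counter)
    for ep in episodes:  # ep: list of (action, observation) pairs
        for t in range(len(ep) - 1):
            state = tuple(ep[max(0, t - k + 1): t + 1])  # bounded history suffix
            next_action, next_obs = ep[t + 1]
            counts[(state, next_action)][next_obs] += 1
    # Normalize counts into conditional distributions P(obs | state, action).
    return {
        key: {obs: n / sum(ctr.values()) for obs, n in ctr.items()}
        for key, ctr in counts.items()
    }
```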
- Depth-discriminative Metric Learning for Monocular 3D Object Detection [14.554132525651868]
We introduce a novel metric learning scheme that encourages the model to extract depth-discriminative features regardless of the visual attributes.
Our method consistently improves the performance of various baselines by 23.51% and 5.78% on average.
arXiv Detail & Related papers (2024-01-02T07:34:09Z)
- R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics [9.2327813168753]
This paper presents R3, a holistic solution for managing timing, memory, and algorithm performance in on-device real-time DRL training.
R3 employs (i) a deadline-driven feedback loop with dynamic batch sizing for optimizing timing, (ii) efficient memory management to reduce memory footprint and allow larger replay buffer sizes, and (iii) a runtime coordinator guided by runtime analysis and a runtime profiler for adjusting memory resource reservations.
arXiv Detail & Related papers (2023-08-29T05:48:28Z)
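R3's deadline-driven feedback loop with dynamic batch sizing can be pictured with a simple proportional controller: time the last training step and rescale the next batch to stay within the per-cycle deadline. This is an illustrative sketch only; the function names and the proportional rule are assumptions, not R3's actual controller.

```python
import time

def deadline_driven_batches(train_step, deadline_s, batch=64,
                            min_batch=8, max_batch=512):
    """Adjust the batch size so each training step fits a deadline."""
    while True:
        start = time.monotonic()
        train_step(batch)
        elapsed = time.monotonic() - start
        # Proportional feedback: grow the batch when under the deadline,
        # shrink it when over, keeping step time near the time budget.
        batch = int(batch * deadline_s / max(elapsed, 1e-6))
        batch = max(min_batch, min(max_batch, batch))
        yield batch, elapsed
```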
- Provably Efficient Algorithm for Nonstationary Low-Rank MDPs [48.92657638730582]
We make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time.
We propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version, Ada-PORTAL.
For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve an arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
arXiv Detail & Related papers (2023-08-10T09:52:44Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Deep Deterministic Uncertainty for Semantic Segmentation [97.89295891304394]
We extend Deep Deterministic Uncertainty (DDU) to semantic segmentation.
We show that DDU improves upon MC Dropout and Deep Ensembles while being significantly faster to compute.
arXiv Detail & Related papers (2021-10-29T20:45:58Z)
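A sketch of the feature-density idea behind deterministic uncertainty methods such as DDU, written from memory and hedged accordingly: fit one Gaussian per class to penultimate-layer features, then score epistemic uncertainty by negative log feature density. The helper names are ours.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_feature_gaussians(features, labels):
    """Fit one Gaussian per class to penultimate-layer features."""
    return {
        c: multivariate_normal(
            mean=features[labels == c].mean(axis=0),
            cov=np.cov(features[labels == c], rowvar=False)
                + 1e-6 * np.eye(features.shape[1]),  # jitter for stability
        )
        for c in np.unique(labels)
    }

def epistemic_score(gaussians, feat):
    """Low feature-space density => high epistemic uncertainty."""
    density = sum(g.pdf(feat) for g in gaussians.values())
    return -np.log(density + 1e-12)
```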
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately; the extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
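The residual-based fusion module is easy to picture in code. A hedged PyTorch sketch with our own layer sizes and names: each modality keeps its own stream, and a learned correction computed from both streams is added back as a residual.

```python
import torch
import torch.nn as nn

class ResidualFusion(nn.Module):
    """Fuse camera and LiDAR feature maps: the fused result is the
    primary stream plus a learned residual from both streams."""

    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, lidar_feat, cam_feat):
        # Residual correction computed from both modalities,
        # added back onto the LiDAR stream.
        return lidar_feat + self.residual(torch.cat([lidar_feat, cam_feat], dim=1))
```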
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- End-to-End Egospheric Spatial Memory [32.42361470456194]
We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent.
ESM can be trained end-to-end via either imitation or reinforcement learning.
We show applications to semantic segmentation on the ScanNet dataset, where ESM naturally combines image-level and map-level inference modalities.
arXiv Detail & Related papers (2021-02-15T18:59:07Z)
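The defining property of an egocentric memory is that stored content is re-indexed whenever the agent moves. ESM itself uses an ego-sphere with projective geometry; as a deliberately simplified illustration, here is a 2D egocentric grid memory shifted by the agent's translation each step:

```python
import torch

def shift_ego_memory(memory, dxy):
    """Shift an egocentric feature grid so it stays agent-centered.

    memory: (C, H, W) feature grid expressed in the agent's frame.
    dxy:    (dx, dy) agent translation in grid cells since last step.
    """
    dx, dy = dxy
    # Moving the agent by (dx, dy) moves the world by (-dx, -dy)
    # in the agent's frame; torch.roll re-indexes the grid cells.
    shifted = torch.roll(memory, shifts=(-dy, -dx), dims=(1, 2))
    # Cells that rolled in from the far edge hold stale data; zero them.
    if dx > 0:
        shifted[:, :, -dx:] = 0
    elif dx < 0:
        shifted[:, :, :-dx] = 0
    if dy > 0:
        shifted[:, -dy:, :] = 0
    elif dy < 0:
        shifted[:, :-dy, :] = 0
    return shifted
```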
- DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs [47.73837217824527]
We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.
Our main contribution is to introduce the Deep Averagers with Costs MDP (DAC-MDP) and to investigate its solutions for offline RL.
arXiv Detail & Related papers (2020-10-18T00:11:45Z)
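The derived-MDP idea can be sketched in a few lines: answer each (state, action) query by averaging the k nearest logged transitions for that action, minus a distance-based cost so weakly supported transitions look unattractive to a planner. The function names and the cost form below are assumptions, not the paper's exact construction:

```python
import numpy as np

def build_dac_mdp(states, actions, rewards, next_ids, k=5, cost=1.0):
    """Derive a finite MDP-like step function from logged transitions.

    For a query state and action, average the k nearest logged
    transitions with that action, subtracting a distance-based cost
    so weakly supported transitions look unattractive to the planner.
    """
    def step(query, action):
        mask = actions == action
        d = np.linalg.norm(states[mask] - query, axis=1)
        nearest = np.argsort(d)[:k]
        idx = np.flatnonzero(mask)[nearest]
        # Average reward with a penalty growing in neighbor distance.
        r = rewards[idx].mean() - cost * d[nearest].mean()
        return r, next_ids[idx]  # candidate successor states (uniform)
    return step
```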
- Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)