Directly Forecasting Belief for Reinforcement Learning with Delays
- URL: http://arxiv.org/abs/2505.00546v1
- Date: Thu, 01 May 2025 14:20:48 GMT
- Title: Directly Forecasting Belief for Reinforcement Learning with Delays
- Authors: Qingyuan Wu, Yuhui Wang, Simon Sinong Zhan, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang
- Abstract summary: Reinforcement learning with delays is challenging as sensory perceptions lag behind the actual events. We develop a novel belief estimation method named Directly Forecasting Belief Transformer (DFBT). DFBT directly forecasts states from observations without incrementally estimating intermediate states step-by-step. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy.
- Score: 41.52768902667611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) with delays is challenging as sensory perceptions lag behind the actual events: the RL agent needs to estimate the real state of its environment based on past observations. State-of-the-art (SOTA) methods typically employ recursive, step-by-step forecasting of states. This can cause the accumulation of compounding errors. To tackle this problem, our novel belief estimation method, named Directly Forecasting Belief Transformer (DFBT), directly forecasts states from observations without incrementally estimating intermediate states step-by-step. We theoretically demonstrate that DFBT greatly reduces compounding errors of existing recursively forecasting methods, yielding stronger performance guarantees. In experiments with D4RL offline datasets, DFBT reduces compounding errors with remarkable prediction accuracy. DFBT's capability to forecast state sequences also facilitates multi-step bootstrapping, thus greatly improving learning efficiency. On the MuJoCo benchmark, our DFBT-based method substantially outperforms SOTA baselines.
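To make the failure mode concrete, here is a minimal sketch (not the authors' code: DFBT uses a Transformer, while this toy uses MLPs, and all dimensions are assumptions) contrasting recursive one-step forecasting, where each prediction is fed back as input so errors compound, with a direct forecaster that maps the last observed state and the buffered actions to all delayed states in a single pass:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, DELAY = 17, 6, 5  # assumed sizes, MuJoCo-like

class OneStepModel(nn.Module):
    """Recursive baseline: predicts s_{t+1} from (s_t, a_t)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, STATE_DIM))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class DirectForecaster(nn.Module):
    """DFBT-style: predicts all DELAY future states in one forward pass,
    so no predicted state is ever fed back as an input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + DELAY * ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, DELAY * STATE_DIM))
    def forward(self, s, action_buffer):  # action_buffer: (B, DELAY, ACTION_DIM)
        x = torch.cat([s, action_buffer.flatten(1)], dim=-1)
        return self.net(x).view(-1, DELAY, STATE_DIM)

s = torch.randn(1, STATE_DIM)
acts = torch.randn(1, DELAY, ACTION_DIM)

# Recursive rollout: each step consumes the previous *prediction*,
# so one-step errors accumulate over the delay horizon.
one_step = OneStepModel()
pred, rollout = s, []
for k in range(DELAY):
    pred = one_step(pred, acts[:, k])
    rollout.append(pred)

# Direct forecast: one pass, no prediction is ever re-consumed as input.
direct = DirectForecaster()(s, acts)  # (1, DELAY, STATE_DIM)
```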
Related papers
- Fast Training of Recurrent Neural Networks with Stationary State Feedbacks [48.22082789438538]
Recurrent neural networks (RNNs) have recently demonstrated strong performance and faster inference than Transformers.
We propose a novel method that replaces backpropagation through time (BPTT) with a fixed gradient feedback mechanism.
arXiv Detail & Related papers (2025-03-29T14:45:52Z)
- The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models [69.798277882245]
We introduce Unsupervised Prefix Fine-Tuning (UPFT) to enhance large language models' reasoning efficiency.
UPFT removes the need for labeled data or exhaustive sampling.
Experiments show that UPFT matches the performance of supervised methods.
arXiv Detail & Related papers (2025-03-04T18:56:03Z)
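As a rough sketch of the prefix idea (my own illustration; the loss shape and prefix length are assumptions, and UPFT's exact objective may differ), the fine-tuning loss can be restricted to the first few tokens of the model's own sampled generations, so no labels are required:

```python
import torch
import torch.nn.functional as F

def prefix_loss(logits, generated_ids, prefix_len=8):
    """Cross-entropy on only the first `prefix_len` generated tokens.

    logits:        (B, T, V) model outputs over a sampled generation
    generated_ids: (B, T)    the model's own sampled tokens (no labels)
    """
    logits = logits[:, :prefix_len - 1]       # predict token t+1 from position t
    targets = generated_ids[:, 1:prefix_len]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Usage idea: sample one generation per prompt, then fine-tune on its
# prefix only, instead of scoring or sampling many full reasoning chains.
```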
- Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.
Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.
We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z)
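A loose illustration of combining the two signals (hypothetical code, not the paper's RPC implementation; the pruning fraction and weighting rule are assumptions): prune the lowest-probability reasoning paths, then take a probability-weighted majority vote over the survivors:

```python
from collections import defaultdict
import math

def rpc_answer(paths, prune_frac=0.5):
    """paths: list of (answer, total_logprob) pairs from sampled chains."""
    paths = sorted(paths, key=lambda p: p[1], reverse=True)
    kept = paths[: max(1, int(len(paths) * (1 - prune_frac)))]  # prune low-prob paths
    votes = defaultdict(float)
    for answer, logp in kept:
        votes[answer] += math.exp(logp)   # weight each vote by path probability
    return max(votes, key=votes.get)

print(rpc_answer([("42", -3.1), ("42", -3.4), ("17", -9.0), ("42", -2.9)]))  # "42"
```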
- Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
- LTAU-FF: Loss Trajectory Analysis for Uncertainty in Atomistic Force Fields [5.396675151318325]
Model ensembles are effective tools for estimating prediction uncertainty in deep learning atomistic force fields.
However, their widespread adoption is hindered by high computational costs and overconfident error estimates.
We address these challenges by leveraging distributions of per-sample errors obtained during training and employing a distance-based similarity search in the model latent space.
Our method, which we call LTAU, efficiently estimates the full probability distribution function (PDF) of errors for any test point using the logged training errors.
arXiv Detail & Related papers (2024-02-01T18:50:42Z)
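A minimal sketch of the idea (toy latent embeddings and error logs stand in for real force-field data; LTAU's actual pipeline differs in details): pool the logged training-error trajectories of a test point's nearest latent-space neighbors to estimate its error distribution:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
z_train = rng.normal(size=(1000, 16))            # latent embeddings of training samples
err_log = np.abs(rng.normal(size=(1000, 50)))    # per-sample errors logged over training

nn_index = NearestNeighbors(n_neighbors=10).fit(z_train)

def error_pdf(z_test, bins=20):
    """Estimate a test point's error PDF from its neighbors' logged errors."""
    _, idx = nn_index.kneighbors(z_test[None])
    pooled = err_log[idx[0]].ravel()             # neighbors' full error trajectories
    hist, edges = np.histogram(pooled, bins=bins, density=True)
    return hist, edges

pdf, edges = error_pdf(rng.normal(size=16))
```

The appeal is that the error logs are a free by-product of training, so no ensemble of models is needed.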
- Solving Offline Reinforcement Learning with Decision Tree Regression [0.0]
This study presents a novel approach to addressing offline reinforcement learning problems by reframing them as regression tasks.
We introduce two distinct frameworks: return-conditioned and return-weighted decision tree policies.
Despite the simplification inherent in this reformulated approach to offline RL, our agents demonstrate performance at least on par with established methods.
arXiv Detail & Related papers (2024-01-21T23:50:46Z)
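The return-conditioned variant reduces to ordinary supervised learning, as in this hedged sketch (toy data in place of a real offline dataset; the tree depth and target return are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
states = rng.normal(size=(5000, 4))
actions = rng.normal(size=(5000, 2))
returns_to_go = rng.normal(size=(5000, 1))   # per-step return-to-go labels

# Return-conditioned policy: action = f(state, return-to-go),
# fit as plain supervised regression with a decision tree.
X = np.hstack([states, returns_to_go])
policy = DecisionTreeRegressor(max_depth=12).fit(X, actions)

# At deployment, condition on a high target return to steer behavior.
target_return = np.array([[2.0]])
a = policy.predict(np.hstack([states[:1], target_return]))
```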
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
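A very loose sketch of the "adjust each domain, then fit one linear predictor" shape (DARE's actual adjustment and convex objective differ; per-domain whitening here is only a stand-in for illustration):

```python
import numpy as np

def whiten(X, eps=1e-6):
    """Zero-mean, identity-covariance transform for one domain's features."""
    Xc = X - X.mean(0)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

rng = np.random.default_rng(0)
# Toy multi-domain data with different feature scales per domain.
domains = [(rng.normal(size=(200, 8)) * (d + 1), rng.normal(size=200))
           for d in range(3)]

# Adjust each domain separately, then fit one linear predictor on the pool.
Xs = np.vstack([whiten(X) for X, _ in domains])
ys = np.concatenate([y for _, y in domains])
w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
```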
- Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations [7.776010676090131]
State observations may contain measurement errors or adversarial noise, misleading the agent into suboptimal actions or even training collapse.
In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return.
arXiv Detail & Related papers (2021-09-17T22:37:39Z)
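For intuition, here is a minimal quantile-TD sketch (standard quantile regression updates, not this paper's code) that maintains a set of return quantiles instead of a single expected value, which is the core of the distributional approach:

```python
import numpy as np

N_QUANTILES, ALPHA = 32, 0.05
taus = (np.arange(N_QUANTILES) + 0.5) / N_QUANTILES
theta = np.zeros(N_QUANTILES)   # quantile estimates of the return (one state)

def qtd_update(theta, reward, theta_next, gamma=0.99):
    """One quantile-TD step: nudge each quantile toward sampled targets.
    The full set of theta values approximates the return *distribution*,
    not just its mean."""
    targets = reward + gamma * theta_next
    for i in range(N_QUANTILES):
        for t in targets:
            indicator = 1.0 if t < theta[i] else 0.0
            theta[i] += ALPHA * (taus[i] - indicator)  # quantile regression step
    return theta

theta = qtd_update(theta, reward=1.0, theta_next=theta.copy())
```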
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference directly from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Regularized Behavior Value Estimation [31.332929202377]
We introduce Regularized Behavior Value Estimation (R-BVE).
R-BVE estimates the value of the behavior policy during training and only performs policy improvement at deployment time.
We provide ample empirical evidence of R-BVE's effectiveness, including state-of-the-art performance on the RL Unplugged ATARI dataset.
arXiv Detail & Related papers (2021-03-17T11:34:54Z)
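A tabular sketch of the train/deploy split (a toy stand-in, not the RL Unplugged setup, and R-BVE's regularization term is omitted): training uses SARSA-style targets that evaluate the behavior policy, and the greedy improvement step happens only at deployment:

```python
import numpy as np

# Assumed logged transitions (s, a, r, s', a') from the behavior policy.
N_STATES, N_ACTIONS, GAMMA, ALPHA = 10, 4, 0.99, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))

def train_step(Q, s, a, r, s_next, a_next):
    """SARSA-style target: evaluates the *behavior* policy via its own
    logged next action a', with no max over actions during training."""
    target = r + GAMMA * Q[s_next, a_next]
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

def deploy_action(Q, s):
    """Policy improvement happens only once, at deployment time."""
    return int(np.argmax(Q[s]))

Q = train_step(Q, s=0, a=1, r=1.0, s_next=2, a_next=3)
print(deploy_action(Q, 0))
```

Avoiding the max during training sidesteps the over-optimistic value estimates that plague offline Q-learning on out-of-distribution actions.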
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.