Value Flows
- URL: http://arxiv.org/abs/2510.07650v1
- Date: Thu, 09 Oct 2025 00:57:40 GMT
- Title: Value Flows
- Authors: Perry Dong, Chongyi Zheng, Chelsea Finn, Dorsa Sadigh, Benjamin Eysenbach
- Abstract summary: This paper uses modern, flexible flow-based models to estimate the full future return distributions. Building upon the learned flow models, we estimate the return uncertainty of distinct states using a new flow derivative ODE. Experiments on $37$ state-based and $25$ image-based benchmark tasks demonstrate that Value Flows achieves a $1.3\times$ improvement on average in success rates.
- Score: 90.1510269525399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration and safe RL. While the predominant method for estimating the return distribution is by modeling it as a categorical distribution over discrete bins or estimating a finite number of quantiles, such approaches leave unanswered questions about the fine-grained structure of the return distribution and about how to distinguish states with high return uncertainty for decision-making. The key idea in this paper is to use modern, flexible flow-based models to estimate the full future return distributions and identify those states with high return variance. We do so by formulating a new flow-matching objective that generates probability density paths satisfying the distributional Bellman equation. Building upon the learned flow models, we estimate the return uncertainty of distinct states using a new flow derivative ODE. We additionally use this uncertainty information to prioritize learning a more accurate return estimation on certain transitions. We compare our method (Value Flows) with prior methods in the offline and offline-to-online settings. Experiments on $37$ state-based and $25$ image-based benchmark tasks demonstrate that Value Flows achieves a $1.3\times$ improvement on average in success rates. Website: https://pd-perry.github.io/value-flows Code: https://github.com/chongyi-zheng/value-flows
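To make the idea concrete, below is a minimal PyTorch sketch of a conditional flow-matching objective driven by distributional Bellman targets, in the spirit of the abstract. All names (ReturnVelocityField, sample_return, flow_bellman_loss) and architectural choices are hypothetical illustrations, not the paper's implementation; the flow derivative ODE for uncertainty estimation and the uncertainty-based prioritization are omitted.

```python
import torch
import torch.nn as nn

class ReturnVelocityField(nn.Module):
    """Hypothetical conditional velocity field v(t, z | s, a).

    Maps a flow time t in [0, 1], a noisy return value z, and a
    state-action pair to a velocity; integrating dz/dt = v from t=0
    to t=1 transports noise into a sample of the return distribution.
    """
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, z, state, action):
        return self.net(torch.cat([t, z, state, action], dim=-1))

@torch.no_grad()
def sample_return(field, state, action, steps=16):
    """Draw return samples by Euler-integrating the learned flow ODE."""
    z = torch.randn(state.shape[0], 1)           # z_0 ~ N(0, 1)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((state.shape[0], 1), k * dt)
        z = z + dt * field(t, z, state, action)
    return z                                      # z_1 ~ return distribution

def flow_bellman_loss(field, target_field, batch, gamma=0.99):
    """Conditional flow matching toward distributional Bellman targets:
    the regression target for (s, a) is y = r + gamma * G', where G' is
    a return sample drawn from a *target* flow at the next state-action.
    """
    s, a, r, s2, a2 = batch
    y = r + gamma * sample_return(target_field, s2, a2)  # Bellman backup sample
    z0 = torch.randn_like(y)                             # noise endpoint
    t = torch.rand(y.shape[0], 1)                        # flow time ~ U[0, 1]
    zt = (1 - t) * z0 + t * y                            # linear interpolation path
    v_target = y - z0                                    # constant velocity of that path
    v = field(t, zt, s, a)
    return ((v - v_target) ** 2).mean()
```

Drawing the target y = r + γG' from a separate target flow mirrors target networks in standard TD learning; the paper's objective over probability density paths is more involved than this per-sample regression.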
Related papers
- FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning [8.193127364294034]
Existing works improve the reliability of value function estimation via multi-critic ensembles and distributional RL. Inspired by flow matching's success in generative modeling, we propose a generative paradigm for value estimation, named FlowCritic.
arXiv Detail & Related papers (2025-10-26T14:12:32Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
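As background for the quantile-regression machinery EQR builds on, a generic pinball-loss sketch for learning K quantiles of a return distribution follows; this is standard quantile regression, not EQR's model-based Bayesian algorithm, and the function names are illustrative.

```python
import torch

def quantile_regression_loss(pred_quantiles, target_samples):
    """Generic pinball loss for learning K quantiles of a return distribution.

    pred_quantiles: (batch, K) predicted quantile values at fixed fractions.
    target_samples: (batch, M) samples of the target return.
    """
    K = pred_quantiles.shape[1]
    taus = (torch.arange(K, dtype=torch.float32) + 0.5) / K       # midpoint fractions
    # Pairwise errors: target sample minus each predicted quantile.
    u = target_samples.unsqueeze(1) - pred_quantiles.unsqueeze(2)  # (batch, K, M)
    # Pinball loss: weight positive errors by tau, negative by (1 - tau).
    loss = torch.where(u >= 0, taus.view(1, K, 1) * u, (taus.view(1, K, 1) - 1) * u)
    return loss.mean()
```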
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Distributional Reinforcement Learning with Dual Expectile-Quantile Regression [51.87411935256015]
The quantile regression approach to distributional RL provides a flexible and effective way of learning arbitrary return distributions. We show that distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning.
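For contrast with the pinball loss sketched above, here is the asymmetric-$L_2$ (expectile) loss that such a joint expectile-quantile scheme would pair with quantile regression; the coupling used in the paper itself is not reproduced here.

```python
import torch

def expectile_loss(pred_expectiles, target_samples, taus):
    """Asymmetric L2 (expectile) loss: errors where the target exceeds the
    prediction are weighted by tau, the rest by (1 - tau); tau = 0.5
    recovers the ordinary squared loss and hence the mean.

    pred_expectiles: (batch, K) predicted expectiles, taus: (K,) fractions.
    """
    u = target_samples.unsqueeze(1) - pred_expectiles.unsqueeze(2)  # (batch, K, M)
    w = torch.where(u >= 0, taus.view(1, -1, 1), 1 - taus.view(1, -1, 1))
    return (w * u.pow(2)).mean()
```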
arXiv Detail & Related papers (2023-05-26T12:30:05Z)
- Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
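For reference, the standard (unguided) trajectory balance objective can be written as a squared log-ratio over a full trajectory; the sketch below assumes precomputed log-probabilities and is not the guided variant the paper proposes.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Standard trajectory balance loss for a GFlowNet.

    log_Z:      learned scalar, log of the partition function.
    log_pf:     (T,) forward policy log-probs along one trajectory.
    log_pb:     (T,) backward policy log-probs along the same trajectory.
    log_reward: scalar log R(x) of the terminal object x.
    """
    return (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2

# Toy usage with random log-probabilities for a length-5 trajectory.
loss = trajectory_balance_loss(
    log_Z=torch.tensor(0.0, requires_grad=True),
    log_pf=torch.log(torch.rand(5)),
    log_pb=torch.log(torch.rand(5)),
    log_reward=torch.tensor(-1.0),
)
```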
arXiv Detail & Related papers (2023-05-11T22:50:41Z)
- Normality-Guided Distributional Reinforcement Learning for Continuous Control [13.818149654692863]
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function.
arXiv Detail & Related papers (2022-08-28T02:52:10Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
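A minimal sketch of the fixed-point view, assuming some update cell f: practical DEQ solvers replace the naive iteration below with accelerated root-finding and differentiate implicitly at the equilibrium rather than through the loop.

```python
import torch

def deq_fixed_point(f, x, z0, max_iter=50, tol=1e-4):
    """Solve z* = f(z*, x) by naive fixed-point iteration.

    In a DEQ flow estimator, f is the recurrent update cell and z holds
    the flow estimate; gradients are normally taken implicitly at z*
    rather than by backpropagating through the iterations shown here.
    """
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if (z_next - z).norm() < tol * z.norm().clamp(min=1e-8):
            return z_next
        z = z_next
    return z
```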
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Continual Density Ratio Estimation in an Online Setting [12.516472146904375]
In online applications with streaming data, awareness of how far the training or test set has shifted away from the original dataset can be crucial to the performance of the model.
We propose a novel method, Continual Density Ratio Estimation (CDRE), for estimating density ratios between the initial and current distributions.
We demonstrate that CDRE can be more accurate than standard DRE in terms of estimating divergences between distributions, despite not requiring samples from the original distribution.
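One way to read the CDRE recipe is as telescoping: the log-ratio between the initial and current distributions is maintained as a running sum of log-ratios between consecutive distributions, so samples from the original distribution are only needed once. A hypothetical sketch, with the per-step estimators treated as given:

```python
def chain_log_ratio(prev_log_ratio_fn, step_log_ratio_fn):
    """Telescoping density-ratio update:
    log p0(x)/p_t(x) = log p0(x)/p_{t-1}(x) + log p_{t-1}(x)/p_t(x).

    prev_log_ratio_fn: current estimator of log p0 / p_{t-1}.
    step_log_ratio_fn: estimator of log p_{t-1} / p_t, trainable from
                       samples of the two most recent distributions only,
                       so samples from p0 are never needed again.
    """
    return lambda x: prev_log_ratio_fn(x) + step_log_ratio_fn(x)
```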
arXiv Detail & Related papers (2021-03-09T07:56:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.