Temporal Difference Flows
- URL: http://arxiv.org/abs/2503.09817v1
- Date: Wed, 12 Mar 2025 20:30:07 GMT
- Title: Temporal Difference Flows
- Authors: Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Rémi Munos, Alessandro Lazaric, Ahmed Touati
- Abstract summary: Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states. Existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods.
- Score: 82.24174052059352
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
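The two ideas in the abstract, a TD-style bootstrapped target for the geometric horizon model and a flow-matching regression loss, can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: `ghm_td_target` and `flow_matching_loss` are hypothetical names, and the linear probability path is the standard flow-matching choice rather than anything confirmed by the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def ghm_td_target(next_state, model_sample_fn, gamma=0.95):
    """Bootstrapped target for a geometric horizon model (GHM).

    With probability (1 - gamma) the target is the immediate next state;
    with probability gamma it is a sample drawn from the current model's
    own prediction conditioned on that next state -- the generative
    analogue of a TD backup that the abstract refers to.
    """
    if rng.random() < 1.0 - gamma:
        return next_state
    return model_sample_fn(next_state)

def flow_matching_loss(v_theta, x0, x1, t):
    """Conditional flow-matching regression loss (linear probability path).

    x_t interpolates between a noise sample x0 and the target x1; the
    vector field v_theta is regressed onto the constant velocity x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0
    pred = v_theta(x_t, t)
    return float(np.mean((pred - target_velocity) ** 2))
```

In TD-Flow the bootstrapped sample would supply `x1` for the flow-matching regression; how the paper couples the two through its Bellman equation on probability paths is beyond this sketch.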
Related papers
- Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $\mathbf{x} \sim p^{\mathrm{post}}(\mathbf{x}) \propto p(\mathbf{x})\,r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or function $r(\mathbf{x})$. We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior.
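The target distribution $p(\mathbf{x})r(\mathbf{x})/Z$ can be illustrated with plain self-normalized importance sampling over prior draws. This is only a sketch of the distribution being targeted; the paper itself trains an amortized diffusion sampler via relative trajectory balance instead of reweighting samples.

```python
import numpy as np

rng = np.random.default_rng(0)

def snis_posterior_mean(prior_sample_fn, r, n=10_000):
    """Self-normalized importance sampling for p_post(x) ∝ p(x) r(x).

    Draws x ~ p(x) from the prior and weights each sample by r(x),
    so weighted averages approximate expectations under the posterior.
    """
    xs = prior_sample_fn(n)
    w = r(xs)
    w = w / w.sum()
    return float((w * xs).sum())
```

For example, with a standard normal prior and `r(x) = 1[x > 0]`, the posterior is a half-normal with mean $\sqrt{2/\pi} \approx 0.80$.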
arXiv Detail & Related papers (2024-05-31T16:18:46Z)
- Bayesian Deep Learning for Remaining Useful Life Estimation via Stein Variational Gradient Descent [14.784809634505903]
We show that Bayesian deep learning models trained via Stein variational gradient descent consistently outperform alternatives in both convergence speed and predictive performance.
We propose a method to enhance performance based on the uncertainty information provided by the Bayesian models.
arXiv Detail & Related papers (2024-02-02T02:21:06Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
- Improving Trajectory Prediction in Dynamic Multi-Agent Environment by Dropping Waypoints [9.385936248154987]
Motion prediction systems must learn spatial and temporal information from the past to forecast the future trajectories of the agent.
We propose Temporal Waypoint Dropping (TWD) that explicitly incorporates temporal dependencies during the training of a trajectory prediction model.
We evaluate our proposed approach on three datasets: NBA SportVU, ETH-UCY, and TrajNet++.
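The core idea of waypoint dropping can be sketched as a small data-augmentation step applied during training. A hypothetical illustration only: the function name and the choice to always keep the most recent observation are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def drop_waypoints(trajectory, drop_prob=0.2):
    """Randomly drop past waypoints from an observed trajectory.

    Masking some of the history forces the predictor to rely on
    longer-range temporal dependencies rather than the last few points.
    `trajectory` has shape (T, D); the most recent waypoint is kept.
    """
    T = trajectory.shape[0]
    keep = rng.random(T) >= drop_prob
    keep[-1] = True  # never drop the most recent observation
    return trajectory[keep]
```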
arXiv Detail & Related papers (2023-09-29T15:48:35Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
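A minimal sketch of the RanPAC-style recipe: pass frozen pre-trained features through a fixed random projection with a nonlinearity, then classify with class prototypes. This simplifies the actual method (which includes details such as ridge-regularized prototypes); function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(features, out_dim=64, seed=0):
    """Expand frozen features through a fixed random matrix + ReLU.

    The projection is never trained, so it cannot be forgotten as new
    tasks arrive -- the property the continual-learning setting needs.
    """
    d = features.shape[1]
    W = np.random.default_rng(seed).standard_normal((d, out_dim))
    return np.maximum(features @ W, 0.0)

def prototype_classify(train_x, train_y, test_x):
    """Nearest-class-prototype classifier in the projected space."""
    classes = np.unique(train_y)
    protos = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[dists.argmin(axis=1)]
```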
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Reconstructing Graph Diffusion History from a Single Snapshot [87.20550495678907]
We propose a novel barycenter formulation for reconstructing Diffusion history from A single SnapsHot (DASH).
We prove that estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation.
We also develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO).
arXiv Detail & Related papers (2023-06-01T09:39:32Z)
- On the Statistical Benefits of Temporal Difference Learning [6.408072565019087]
Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions.
We show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates.
We prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states.
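The temporal-difference estimator that this paper compares against direct (Monte Carlo) estimation can be sketched in its simplest tabular form. This is standard TD(0), not the paper's analysis; the function name and the replay-sweep setup are illustrative assumptions.

```python
import numpy as np

def td0_values(transitions, n_states, alpha=0.1, gamma=0.9, sweeps=200):
    """Tabular TD(0): bootstrap each state's value from its successor's.

    `transitions` is a list of (state, reward, next_state, done) tuples
    replayed repeatedly. Unlike direct return averaging, the update for
    one state pools information through every state reached after it.
    """
    V = np.zeros(n_states)
    for _ in range(sweeps):
        for s, r, s2, done in transitions:
            target = r + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])
    return V
```

On the two-step chain 0 → 1 → terminal with reward 1 on the last step and γ = 0.9, the estimates converge to V(1) = 1 and V(0) = 0.9.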
arXiv Detail & Related papers (2023-01-30T21:02:25Z)
- Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems [30.88268793277078]
We learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training.
Compared with Batch Update, our theory suggests that the proposed algorithm achieves smaller temporal domain generalization error.
arXiv Detail & Related papers (2022-09-02T15:55:31Z)
- Probabilistic forecasting for geosteering in fluvial successions using a generative adversarial network [0.0]
Fast updates based on real-time data are essential when drilling in complex reservoirs with high uncertainties in pre-drill models.
We propose a generative adversarial deep neural network (GAN) trained to reproduce geologically consistent 2D sections of fluvial successions.
In our example, the method reduces uncertainty and correctly predicts most major geological features up to 500 meters ahead of the drill bit.
arXiv Detail & Related papers (2022-07-04T12:52:38Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to converge to a stable estimate.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
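The fixed-point view of a deep-equilibrium layer can be sketched with plain iteration. Practical DEQ solvers use accelerated root-finding (e.g. Anderson acceleration) and implicit differentiation rather than this naive loop; the function name and tolerances here are illustrative.

```python
import numpy as np

def deq_fixed_point(f, x, z0, tol=1e-8, max_iter=500):
    """Solve z* = f(z*, x) by naive fixed-point iteration.

    A DEQ layer outputs the fixed point of an implicit update instead of
    unrolling a fixed number of recurrent refinement steps, so inference
    cost adapts to how quickly the iteration converges.
    """
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next
        z = z_next
    return z
```

For a contraction such as f(z, x) = 0.5·z + x, the fixed point is z* = 2x.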
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
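The reweighting role that the adversary plays in DRO training can be illustrated with the closed-form KL-constrained case, where worst-case sample weights are a softmax over per-example losses. The paper replaces this closed form with a trained parametric likelihood-ratio model; the sketch below shows only the shared reweighting idea, with a hypothetical function name.

```python
import numpy as np

def kl_dro_weights(losses, temperature=1.0):
    """Worst-case sample weights for KL-constrained DRO.

    Up-weights high-loss examples via a softmax over per-example losses,
    so the training objective focuses on the hardest subpopulation.
    """
    z = np.asarray(losses, dtype=float) / temperature
    z = z - z.max()  # numerical stability before exponentiation
    w = np.exp(z)
    return w / w.sum()
```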
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting [16.06369357595426]
We propose a novel curriculum-learning-based strategy named Temporal Progressive Growing Sampling to bridge the gap between training and inference for spatio-temporal sequence forecasting.
Experimental results demonstrate that our proposed method better models long term dependencies and outperforms baseline approaches on two competitive datasets.
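The train/inference gap this paper targets is commonly bridged with a sampling curriculum: early in training the decoder sees ground truth, and later it increasingly sees its own predictions. The inverse-sigmoid schedule below is a generic hypothetical example, not the paper's Temporal Progressive Growing Sampling schedule.

```python
import math

def scheduled_sampling_prob(step, total_steps, k=5.0):
    """Probability of feeding the model its own prediction at `step`.

    Rises smoothly from near 0 (pure teacher forcing) to near 1 (pure
    free-running decoding) as training progresses, crossing 0.5 at the
    midpoint of training.
    """
    frac = step / total_steps
    return 1.0 / (1.0 + math.exp(k * (0.5 - frac)))
```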
arXiv Detail & Related papers (2020-05-19T10:14:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.