Flow Models for Unbounded and Geometry-Aware Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2505.04310v1
- Date: Wed, 07 May 2025 10:49:53 GMT
- Title: Flow Models for Unbounded and Geometry-Aware Distributional Reinforcement Learning
- Authors: Simo Alami C., Rim Kaddah, Jesse Read, Marie-Paule Cani
- Abstract summary: We introduce a new architecture for Distributional Reinforcement Learning (DistRL) that models return distributions using normalizing flows. This approach enables flexible, unbounded support for return distributions. We show that our approach outperforms PDF-based models while remaining competitive with quantile-based methods.
- Score: 5.628557798049032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a new architecture for Distributional Reinforcement Learning (DistRL) that models return distributions using normalizing flows. This approach enables flexible, unbounded support for return distributions, in contrast to categorical approaches like C51 that rely on fixed or bounded representations. It also offers richer modeling capacity than quantile-based approaches for capturing multi-modality, skewness, and tail behavior. Our method is significantly more parameter-efficient than categorical approaches. Standard metrics used to train existing models, such as the KL divergence or the Wasserstein distance, are either scale-insensitive or have biased sample gradients, especially when the return supports do not overlap. To address this, we propose a novel surrogate for the Cramér distance that is geometry-aware and computable directly from the return distribution's PDF, avoiding the costly CDF computation. We test our model on the ATARI-5 sub-benchmark and show that our approach outperforms PDF-based models while remaining competitive with quantile-based methods.
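To make the metric argument concrete, below is a minimal numerical sketch of the standard Cramér distance (not the paper's surrogate loss), contrasted with a smoothed KL estimate once return supports stop overlapping. The grid, Gaussian parameters, and epsilon smoothing are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: standard Cramér distance vs. KL on a fixed grid.
# Not the paper's surrogate; all distributions and constants are assumptions.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-20.0, 20.0, 4001)   # discretised return support
dx = x[1] - x[0]

p      = gaussian_pdf(x, 0.0, 0.5)   # predicted return distribution
q_near = gaussian_pdf(x, 1.0, 0.5)   # target close to the prediction
q_far  = gaussian_pdf(x, 8.0, 0.5)   # target with essentially disjoint support
q_vfar = gaussian_pdf(x, 16.0, 0.5)  # target even further away

def kl(p, q, eps=1e-12):
    # KL(p || q); once supports stop overlapping it saturates near log(1/eps)
    # and no longer reflects how far apart the two distributions are.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx)

def cramer_sq(p, q):
    # Squared Cramér distance: integral of the squared difference of CDFs,
    # obtained here from the PDFs by cumulative summation on the grid.
    Fp, Fq = np.cumsum(p) * dx, np.cumsum(q) * dx
    return float(np.sum((Fp - Fq) ** 2) * dx)

for name, q in [("near", q_near), ("far", q_far), ("very far", q_vfar)]:
    print(f"{name:9s} KL = {kl(p, q):8.2f}   Cramér^2 = {cramer_sq(p, q):8.2f}")
```

On this toy example the KL estimate is essentially identical for the "far" and "very far" targets, while the squared Cramér distance keeps growing with the gap between the distributions, which is the geometry-awareness the abstract refers to.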
Related papers
- Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts [64.34482582690927]
We provide an efficient and principled method for sampling from a sequence of annealed, geometric-averaged, or product distributions derived from pretrained score-based models. We propose Sequential Monte Carlo (SMC) resampling algorithms that leverage inference-time scaling to improve sampling quality.
arXiv Detail & Related papers (2025-03-04T17:46:51Z) - Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions (a sketch of such weighted averaging follows this list).
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - An iterative multi-fidelity approach for model order reduction of multi-dimensional input parametric PDE systems [0.0]
We propose a parametric sampling strategy for the reduction of large-scale PDE systems with multidimensional input parametric spaces.
This is achieved by exploiting low-fidelity models throughout the parametric space to sample points using an efficient sampling strategy.
Since the proposed methodology leverages the use of low-fidelity models to assimilate the solution database, it significantly reduces the computational cost in the offline stage.
arXiv Detail & Related papers (2023-01-23T15:25:58Z) - Online Probabilistic Model Identification using Adaptive Recursive MCMC [8.465242072268019]
We suggest the Adaptive Recursive Markov Chain Monte Carlo (ARMCMC) method.
It eliminates the shortcomings of conventional online techniques while computing the entire probability density function of model parameters.
We demonstrate our approach using parameter estimation in a soft bending actuator and the Hunt-Crossley dynamic model.
arXiv Detail & Related papers (2022-10-23T02:06:48Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters (see the sketch after this list).
We show that our merging procedure makes it possible to combine models in previously unexplored ways.
arXiv Detail & Related papers (2021-11-18T17:59:35Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between the data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult.
Many sampling-based methods have been proposed for estimating the Evidence Lower Bound (ELBO).
We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum-Product Networks (SPNs).
We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is a polynomial, the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z) - Haar Wavelet based Block Autoregressive Flows for Trajectories [129.37479472754083]
Prediction of trajectories, such as those of pedestrians, is crucial to the performance of autonomous agents.
We introduce a novel Haar wavelet based block autoregressive model leveraging split couplings.
We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets.
arXiv Detail & Related papers (2020-09-21T13:57:10Z)
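As referenced in the federated-aggregation and Fisher-merging entries above, the following is a minimal sketch of weighted parameter averaging: FedAvg-style scalar weights from client sample proportions, and a per-parameter Fisher-weighted merge. It is illustrative only; the parameter vectors, sample counts, and Fisher values are made up, and neither paper's exact procedure is reproduced.

```python
# Illustrative sketch of two weighted-averaging schemes for model parameters.
# All numbers below are made up; this is not either paper's exact method.
import numpy as np

thetas = [np.array([1.0, 2.0, 3.0]),   # parameters of client / model 1
          np.array([2.0, 0.0, 4.0])]   # parameters of client / model 2

# FedAvg-style aggregation: one scalar weight per client, proportional to
# how many training samples that client holds.
n_samples = np.array([100.0, 300.0])
w = n_samples / n_samples.sum()
theta_fedavg = sum(wi * th for wi, th in zip(w, thetas))

# Fisher-weighted merging: per-parameter weights from the (diagonal) Fisher
# information, so each merged coordinate is dominated by the model that is
# most certain about that parameter.
fishers = [np.array([0.9, 0.1, 0.5]),
           np.array([0.1, 0.9, 0.5])]
theta_fisher = sum(F * th for F, th in zip(fishers, thetas)) / sum(fishers)

print("sample-proportion average:", theta_fedavg)
print("Fisher-weighted average:  ", theta_fisher)
```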