Marginal Flow: a flexible and efficient framework for density estimation
- URL: http://arxiv.org/abs/2509.26221v1
- Date: Tue, 30 Sep 2025 13:21:13 GMT
- Title: Marginal Flow: a flexible and efficient framework for density estimation
- Authors: Marcello Massimo Negri, Jonathan Aellen, Manuel Jahn, AmirEhsan Khorashadizadeh, Volker Roth
- Abstract summary: Current density modeling approaches suffer from at least one of the following shortcomings: expensive training, slow inference, approximate likelihood, mode collapse or architectural constraints. We propose a simple yet powerful framework that overcomes these limitations altogether. We define our model $q_\theta(x)$ through a parametric distribution $q(x|w)$ with latent parameters $w$. Instead of directly optimizing the latent variables $w$, our idea is to marginalize them out by sampling $w$ from a learnable distribution $q_\theta(w)$, hence the name Marginal Flow.
- Score: 6.94175385834858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current density modeling approaches suffer from at least one of the following shortcomings: expensive training, slow inference, approximate likelihood, mode collapse or architectural constraints like bijective mappings. We propose a simple yet powerful framework that overcomes these limitations altogether. We define our model $q_\theta(x)$ through a parametric distribution $q(x|w)$ with latent parameters $w$. Instead of directly optimizing the latent variables $w$, our idea is to marginalize them out by sampling $w$ from a learnable distribution $q_\theta(w)$, hence the name Marginal Flow. In order to evaluate the learned density $q_\theta(x)$ or to sample from it, we only need to draw samples from $q_\theta(w)$, which makes both operations efficient. The proposed model allows for exact density evaluation and is orders of magnitude faster than competing models both at training and inference. Furthermore, Marginal Flow is a flexible framework: it does not impose any restrictions on the neural network architecture, it enables learning distributions on lower-dimensional manifolds (either known or to be learned), it can be trained efficiently with any objective (e.g. forward and reverse KL divergence), and it easily handles multi-modal targets. We evaluate Marginal Flow extensively on various tasks including synthetic datasets, simulation-based inference, distributions on positive definite matrices and manifold learning in latent spaces of images.
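The marginalization at the core of the method lends itself to a short illustration. Below is a minimal NumPy sketch (not the authors' code) of the two operations the abstract highlights: estimating $q_\theta(x) \approx \frac{1}{N}\sum_i q(x|w_i)$ with $w_i \sim q_\theta(w)$, and ancestral sampling by drawing $w$ first and then $x \sim q(x|w)$. The Gaussian kernel, its fixed bandwidth, and the stand-in mixture used for $q_\theta(w)$ are all assumptions made for this sketch.

```python
# Minimal NumPy sketch of the Marginal Flow idea (illustration only, not the
# authors' code). Assumptions: q(x|w) is an isotropic Gaussian with mean w and
# fixed scale SIGMA; q_theta(w) is stood in for by a simple 2-component
# Gaussian mixture playing the role of the learnable latent sampler.
import numpy as np

SIGMA = 0.3                      # fixed bandwidth of the kernel q(x|w) (assumed)
rng = np.random.default_rng(0)

def sample_w(n):
    """Stand-in for sampling w ~ q_theta(w) (here: a 2-component mixture)."""
    comp = rng.integers(0, 2, size=n)
    means = np.array([[-2.0, 0.0], [2.0, 0.0]])
    return means[comp] + 0.5 * rng.standard_normal((n, 2))

def kernel_density(x, w):
    """q(x|w): isotropic Gaussian density of x given latent parameters w."""
    d = x.shape[-1]
    sq = np.sum((x[None, :] - w) ** 2, axis=-1)
    return np.exp(-0.5 * sq / SIGMA**2) / ((2 * np.pi * SIGMA**2) ** (d / 2))

def marginal_density(x, n_mc=4096):
    """Monte Carlo estimate: q_theta(x) ~= mean_i q(x | w_i), w_i ~ q_theta(w)."""
    w = sample_w(n_mc)
    return kernel_density(x, w).mean()

def sample_x(n):
    """Ancestral sampling: draw w ~ q_theta(w), then x ~ q(x|w)."""
    w = sample_w(n)
    return w + SIGMA * rng.standard_normal(w.shape)

print(marginal_density(np.array([2.0, 0.0])))   # high density near a mode
print(marginal_density(np.array([0.0, 5.0])))   # low density far from support
```

In the actual framework $q_\theta(w)$ would be a learnable sampler (e.g. a neural network pushing forward a base distribution), trained with forward or reverse KL as described in the abstract; the mixture above only stands in for it.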
Related papers
- Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data [32.72306410557258]
We study the statistical convergence of score-based diffusion models for learning an unknown distribution from finitely many samples. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data. Our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport.
arXiv Detail & Related papers (2026-03-04T03:59:02Z) - DISCO: Diversifying Sample Condensation for Efficient Model Evaluation [59.01400190971061]
Costly evaluation reduces inclusivity, slows the cycle of innovation, and worsens environmental impact. We argue that promoting diversity among samples is not essential; what matters is to select samples that maximise diversity in model responses. Our method, $\textbf{Diversifying Sample Condensation (DISCO)}$, selects the top-k samples with the greatest model disagreements.
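As a rough illustration of the stated selection rule (not the DISCO reference implementation), the hypothetical snippet below scores each sample by how many distinct labels a pool of models assigns to it and keeps the top-k; the paper's actual disagreement measure may differ.

```python
# Hypothetical sketch of top-k selection by model disagreement. Assumes `preds`
# holds hard labels of shape (n_models, n_samples); disagreement is scored as
# the number of distinct labels each sample receives across models.
import numpy as np

rng = np.random.default_rng(1)
preds = rng.integers(0, 10, size=(5, 1000))      # assumed: 5 models, 1000 samples

disagreement = np.array([len(np.unique(preds[:, j])) for j in range(preds.shape[1])])
k = 50
top_k = np.argsort(-disagreement)[:k]            # indices of most contested samples
print(top_k[:10], disagreement[top_k[:10]])
```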
arXiv Detail & Related papers (2025-10-09T08:53:59Z) - Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization [49.2338910653152]
Vision-language models (VLMs) have achieved remarkable success across diverse tasks by leveraging rich textual information with minimal labeled data. Knowledge distillation (KD) offers a well-established solution to this problem; however, recent KD approaches from VLMs often involve multi-stage training or additional tuning. We propose $\mathbf{\texttt{DHO}}$ -- a simple yet effective KD framework that transfers knowledge from VLMs to compact, task-specific models in semi-supervised settings.
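The summary leaves the head structure implicit; one plausible reading of a dual-head setup, sketched below as an assumption rather than the paper's recipe, is a student with one head trained on ground-truth labels and a second head distilled from the VLM teacher's soft predictions.

```python
# Hedged sketch of a dual-head student for semi-supervised distillation from a
# VLM teacher (an illustration of the general idea, not the DHO recipe).
# Assumptions: `backbone` is any feature extractor; head_sup is trained with
# cross-entropy on labeled data, head_kd with KL to the teacher's soft labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadStudent(nn.Module):
    def __init__(self, backbone, feat_dim, n_classes):
        super().__init__()
        self.backbone = backbone
        self.head_sup = nn.Linear(feat_dim, n_classes)   # label-supervised head
        self.head_kd = nn.Linear(feat_dim, n_classes)    # teacher-distilled head

    def forward(self, x):
        z = self.backbone(x)
        return self.head_sup(z), self.head_kd(z)

def dual_head_loss(logits_sup, logits_kd, labels, teacher_logits, T=2.0, alpha=0.5):
    ce = F.cross_entropy(logits_sup, labels)
    kd = F.kl_div(F.log_softmax(logits_kd / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return alpha * ce + (1 - alpha) * kd

# Toy usage with an assumed tiny backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32, 16), nn.ReLU())
student = DualHeadStudent(backbone, feat_dim=16, n_classes=5)
logits_sup, logits_kd = student(torch.randn(4, 32))
loss = dual_head_loss(logits_sup, logits_kd,
                      labels=torch.randint(0, 5, (4,)),
                      teacher_logits=torch.randn(4, 5))
print(float(loss))
```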
arXiv Detail & Related papers (2025-05-12T15:39:51Z) - Displacement-Sparse Neural Optimal Transport [6.968698312185846]
Optimal transport (OT) aims to find a map $T$ that transports mass from one probability measure to another while minimizing a cost function. Neural OT solvers have gained popularity in high-dimensional biological applications such as drug perturbation. We propose an intuitive and theoretically grounded approach to learning displacement-sparse maps within neural OT solvers.
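The phrase "displacement-sparse maps" suggests penalizing how many coordinates the map actually moves. The snippet below is only a hedged illustration of that idea via an L1 penalty on the displacement $T(x) - x$; the paper's solver and objective are not reproduced here.

```python
# Hedged illustration of a displacement-sparsity penalty for a neural transport
# map T (not the paper's solver, which is built on neural OT formulations).
# Idea: penalize the L1 norm of the displacement T(x) - x so the learned map
# moves mass along few coordinates.
import torch
import torch.nn as nn

T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))  # assumed map
x = torch.randn(128, 2)
displacement = T(x) - x
sparsity_penalty = displacement.abs().sum(dim=-1).mean()   # L1 on displacements
print(float(sparsity_penalty))
```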
arXiv Detail & Related papers (2025-02-03T23:44:17Z) - A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models [45.60426164657739]
We develop non-asymptotic convergence theory for a diffusion-based sampler.
We prove that $d/\varepsilon$ iterations are sufficient to approximate the target distribution to within $\varepsilon$ total-variation distance.
Our results also characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes.
arXiv Detail & Related papers (2024-08-05T09:02:24Z) - Variance Reduction for the Independent Metropolis Sampler [11.074080383657453]
We prove that if $\pi$ is close enough under KL divergence to another density $q$, an independent sampler that draws proposals from $q$ achieves smaller variance than i.i.d. sampling from $\pi$.
We propose an adaptive independent Metropolis algorithm that adapts the proposal density so that its KL divergence to the target is progressively reduced.
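For context, a minimal independent Metropolis-Hastings step with a fixed proposal $q$ looks as follows; the adaptive scheme described above, which updates $q$ to shrink its KL divergence to the target, is deliberately omitted.

```python
# Minimal independent Metropolis-Hastings sketch with a fixed proposal q
# (illustrative only; the paper's adaptive proposal update is not shown).
# Assumed target: standard Gaussian; assumed proposal: zero-mean Gaussian
# with standard deviation 2.
import numpy as np

rng = np.random.default_rng(2)

def log_pi(x):                        # unnormalized target log-density
    return -0.5 * x**2

def sample_q():                       # independent proposal draw
    return 2.0 * rng.standard_normal()

def log_q(x):                         # proposal log-density (constants cancel)
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0)

x, chain = 0.0, []
for _ in range(5000):
    y = sample_q()
    # Acceptance ratio for an independent proposal: pi(y) q(x) / (pi(x) q(y)).
    log_alpha = (log_pi(y) - log_pi(x)) + (log_q(x) - log_q(y))
    if np.log(rng.uniform()) < log_alpha:
        x = y
    chain.append(x)

print(np.mean(chain), np.var(chain))  # should be near 0 and 1
```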
arXiv Detail & Related papers (2024-06-25T16:38:53Z) - Idempotent Generative Network [61.78905138698094]
We propose a new approach for generative modeling based on training a neural network to be idempotent.
An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application.
We find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold.
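The idempotence requirement $f(f(z)) \approx f(z)$ can be sketched as a simple loss term. The snippet below illustrates only that term; the actual IGN objective also involves reconstruction and tightness losses and detached copies of the network, which are not shown.

```python
# Hedged sketch of the idempotence objective f(f(z)) ~= f(z) (illustration of
# the stated idea only; the full IGN training recipe is not reproduced).
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 8))  # assumed model
z = torch.randn(32, 8)                 # source samples (e.g. noise)

fz = f(z)
idem_loss = ((f(fz) - fz.detach()) ** 2).mean()   # push f(f(z)) toward f(z)
print(float(idem_loss))
```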
arXiv Detail & Related papers (2023-11-02T17:59:55Z) - Towards Understanding and Improving GFlowNet Training [71.85707593318297]
We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.
We propose prioritized replay training of high-reward $x$, relative edge flow policy parametrization, and a novel guided trajectory balance objective.
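As a point of reference for the objectives mentioned above, the snippet below writes out the standard trajectory balance loss $\big(\log Z + \sum_t \log P_F - \log R(x) - \sum_t \log P_B\big)^2$; the paper's guided variant is not reproduced, and the per-step log-probabilities are assumed to be given.

```python
# Hedged sketch of the (standard, not the paper's "guided") trajectory balance
# loss for a GFlowNet. Assumes the forward/backward log-probabilities along one
# trajectory and the terminal log-reward are already available as tensors.
import torch

log_Z = torch.tensor(0.0, requires_grad=True)   # learnable log-partition function
log_pf = torch.randn(6)                          # assumed: log P_F per step
log_pb = torch.randn(6)                          # assumed: log P_B per step
log_reward = torch.tensor(1.5)                   # assumed: log R(x)

tb_loss = (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2
tb_loss.backward()
print(float(tb_loss), float(log_Z.grad))
```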
arXiv Detail & Related papers (2023-05-11T22:50:41Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
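To make the setting concrete, the snippet below quantizes the weights of a least-squares linear model on a uniform $b$-bit grid. This is a generic illustration only, not the paper's minimax-optimal scheme.

```python
# Generic illustration of quantizing a learned linear model's weights with a
# uniform b-bit grid (NOT the paper's algorithm; only meant to make the problem
# setting concrete).
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(200)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)      # learned linear model

b = 4                                               # bits per weight (assumed)
scale = np.max(np.abs(w_hat))
levels = 2 ** (b - 1) - 1
w_q = np.round(w_hat / scale * levels) / levels * scale   # uniform quantizer

print("excess risk from quantization:", np.mean((X @ (w_hat - w_q)) ** 2))
```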
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y = f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
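A continued-fraction regression model of depth $k$ has the form $f(\mathbf{x}) \approx g_0(\mathbf{x}) + h_1(\mathbf{x})/(g_1(\mathbf{x}) + h_2(\mathbf{x})/(\cdots))$. The sketch below evaluates such a model assuming, purely for illustration, that each term is linear in the features; the paper's fitted models are not reproduced.

```python
# Hedged sketch of evaluating a truncated continued-fraction regression model
# f(x) ~= g0(x) + h1(x) / (g1(x) + h2(x) / (g2(x) + ...)), with each term
# assumed (for illustration) to be a linear function of the features.
import numpy as np

def cf_predict(x, g_coeffs, h_coeffs):
    """Evaluate the continued fraction from the innermost term outward."""
    value = g_coeffs[-1] @ x
    for g, h in zip(reversed(g_coeffs[:-1]), reversed(h_coeffs)):
        value = g @ x + (h @ x) / value
    return value

rng = np.random.default_rng(4)
d, depth = 3, 4
g_coeffs = [rng.standard_normal(d) for _ in range(depth)]
h_coeffs = [rng.standard_normal(d) for _ in range(depth - 1)]
print(cf_predict(rng.standard_normal(d), g_coeffs, h_coeffs))
```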
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.