Related papers: Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

URL: http://arxiv.org/abs/2503.10576v1
Date: Thu, 13 Mar 2025 17:28:44 GMT
Title: Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
Authors: Nina Vesseron, Louis Béthune, Marco Cuturi,
Abstract summary: A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it.<n>We explore in this alternative route that sampling and mapping.<n>We find inspiration in moment measures, a result that states for any measure $mathbbRd$ there exists a unique potential $u$rho that $rho=nabla u,sharp,e-u$.
Score: 22.7776491836979
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure $\rho$ supported on a compact convex set of $\mathbb{R}^d$, there exists a unique convex potential $u$ such that $\rho=\nabla u\,\sharp\,e^{-u}$. While this does seem to tie effectively sampling (from log-concave distribution $e^{-u}$) and action (pushing particles through $\nabla u$), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where $\rho$ is factorized as $\nabla w^*\,\sharp\,e^{-w}$, where $w^*$ is the convex conjugate of $w$. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because $\nabla w^*$ is the Monge map between the log-concave distribution $e^{-w}$ and $\rho$, we rely on optimal transport solvers to propose an algorithm to recover $w$ from samples of $\rho$, and parameterize $w$ as an input-convex neural network.

Related papers

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models [65.71506381302815]
We propose amortize the cost of sampling from a posterior distribution of the form $p(mathbfxmidmathbfy) propto p_theta(mathbfx)$. For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference.
arXiv Detail & Related papers (2025-02-10T19:49:54Z)
Debiasing and a local analysis for population clustering using semidefinite programming [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. This work is motivated by the application of clustering individuals according to their population of origin.
arXiv Detail & Related papers (2024-01-16T03:14:24Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries. Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$ We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z)
Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions. To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player conducts and the other executes an online algorithm for non-oblivious multi-armed bandits. In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Tight Bounds on the Hardness of Learning Simple Nonparametric Mixtures [9.053430799456587]
We study the problem of learning nonparametric distributions in a finite mixture. We establish tight bounds on the sample complexity for learning the component distributions in such models.
arXiv Detail & Related papers (2022-03-28T23:53:48Z)
Data fission: splitting a single data point [27.500860533521713]
In this paper, we offer a more general methodology for achieving such a split in finite samples. We call our method data fission, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few applications, such as post-selection inference for trend filtering and other regression problems.
arXiv Detail & Related papers (2021-12-21T10:27:04Z)
Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling. Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features. We show that a model-based approach (resp.$$Q-learning) provably learns an $varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z)
Structured Logconcave Sampling with a Restricted Gaussian Oracle [23.781520510778716]
We give algorithms for sampling several structured logconcave families to high accuracy. We further develop a reduction framework, inspired by proximal point methods in convex optimization.
arXiv Detail & Related papers (2020-10-07T01:43:07Z)
Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. For sparse linear regression, suppose samples $(bf x,y)$ are generated where $y = bf xtopOmega* + mathcalN(0,1)$ and $(bf x, y)$ is seen only if $y$ belongs to a truncation set $S subseteq mathbbRd$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy. For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling" We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $Ngg n$ regime.
arXiv Detail & Related papers (2020-06-02T16:38:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.