Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses
- URL: http://arxiv.org/abs/2505.07124v1
- Date: Sun, 11 May 2025 21:26:44 GMT
- Title: Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses
- Authors: Francisco Andrade, Gabriel Peyré, Clarice Poon
- Abstract summary: Estimating parameters from samples of an optimal probability distribution is essential in applications ranging from socio-economic modeling to biological system analysis. Our approach relies on minimizing a new class of loss functions, called sharpened Fenchel-Young losses. We study the stability of this estimation method when only a finite number of samples is available.
- Score: 20.246040671823557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating parameters from samples of an optimal probability distribution is essential in applications ranging from socio-economic modeling to biological system analysis. In these settings, the probability distribution arises as the solution to an optimization problem that captures either static interactions among agents or the dynamic evolution of a system over time. Our approach relies on minimizing a new class of loss functions, called sharpened Fenchel-Young losses, which measure the sub-optimality gap of the optimization problem over the space of measures. We study the stability of this estimation method when only a finite number of samples is available. The parameters to be estimated typically correspond to a cost function in static problems and to a potential function in dynamic problems. To analyze stability, we introduce a general methodology that leverages the strong convexity of the loss function together with the sample complexity of the forward optimization problem. Our analysis emphasizes two specific settings in the context of optimal transport, where our method provides explicit stability guarantees: The first is inverse unbalanced optimal transport (iUOT) with entropic regularization, where the parameters to estimate are cost functions that govern transport computations; this method has applications such as link prediction in machine learning. The second is inverse gradient flow (iJKO), where the objective is to recover a potential function that drives the evolution of a probability distribution via the Jordan-Kinderlehrer-Otto (JKO) time-discretization scheme; this is particularly relevant for understanding cell population dynamics in single-cell genomics. Finally, we validate our approach through numerical experiments on Gaussian distributions, where closed-form solutions are available, to demonstrate the practical performance of our methods.
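To make the sub-optimality-gap idea concrete, the sketch below writes it out in generic notation; the symbols ($F_\theta$, $\hat\mu$, $V_\theta$, $\tau$) are illustrative assumptions for this note, not definitions taken from the paper.

```latex
% Illustrative sketch only (assumed notation, not the paper's definitions).
% Forward problem: a parameter theta selects an optimal probability measure.
\[
  \mu^\star(\theta) \;\in\; \operatorname*{argmin}_{\mu \in \mathcal{P}(\mathcal{X})} F_\theta(\mu).
\]
% Sub-optimality-gap loss of an observed measure \hat\mu (nonnegative, and zero
% exactly when \hat\mu solves the forward problem for theta):
\[
  \mathcal{L}(\theta;\hat\mu) \;=\; F_\theta(\hat\mu) \;-\; \min_{\mu \in \mathcal{P}(\mathcal{X})} F_\theta(\mu) \;\ge\; 0.
\]
% This mirrors the finite-dimensional Fenchel-Young loss
%   L_Omega(theta, y) = Omega(y) + Omega^*(theta) - <theta, y>,
% which is itself the sub-optimality gap of y for max_{y'} <theta, y'> - Omega(y').
% In the iJKO setting, one forward step could take the JKO form (time step tau,
% potential V_theta to be recovered; regularization terms omitted):
\[
  F_\theta(\mu) \;=\; \frac{1}{2\tau}\, W_2^2(\mu, \mu_{\mathrm{prev}}) \;+\; \int V_\theta \,\mathrm{d}\mu.
\]
```

Minimizing such a gap over $\theta$ is the inverse problem; as the abstract notes, strong convexity of the loss is what converts a small empirical gap into a quantitative bound on the parameter error.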
Related papers
- Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Semiparametric Counterfactual Regression [2.356908851188234]
We propose a doubly robust-style estimator for counterfactual regression within a generalizable framework. Our approach uses incremental interventions to enhance adaptability while maintaining compatibility with standard methods. Our analysis shows that the proposed estimators can achieve $\sqrt{n}$-consistency and asymptotic normality for a broad class of problems.
arXiv Detail & Related papers (2025-04-03T15:32:26Z) - On improving generalization in a class of learning problems with the method of small parameters for weakly-controlled optimal gradient systems [0.0]
We consider a variational problem for a weakly-controlled gradient system, whose control input enters into the system dynamics as a coefficient of a nonlinear term. Using perturbation theory, we provide results that allow us to solve a sequence of optimization problems. We also provide an estimate of the rate of convergence for such approximate optimal solutions.
arXiv Detail & Related papers (2024-12-11T20:50:29Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [61.580419063416734]
A recent stream of structured learning approaches has improved the practical state of the art for a range of optimization problems.
The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately.
In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and improves the generalization error.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - A framework for bilevel optimization that enables stochastic and global variance reduction algorithms [21.67411847762289]
Bilevel optimization is the problem of minimizing a value function which involves the arg-minimum of another function. We introduce a novel framework in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm to our framework, has an $O(1/T)$ convergence rate.
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - Integrated Conditional Estimation-Optimization [6.037383467521294]
Many real-world optimization problems involve uncertain parameters whose probability distributions can be estimated using contextual feature information.
In contrast to the standard approach of estimating the distribution of uncertain parameters, we propose an integrated conditional estimation-optimization (ICEO) approach.
We show that our ICEO approach is theoretically consistent under moderate conditions.
arXiv Detail & Related papers (2021-10-24T04:49:35Z) - Semi-Discrete Optimal Transport: Hardness, Regularization and Numerical Solution [8.465228064780748]
We prove that computing the Wasserstein distance between the Lebesgue measure on the standard hypercube and a discrete probability measure supported on two points is already #P-hard.
We introduce a distributionally robust dual optimal transport problem whose objective function is smoothed with the most adverse disturbance distributions.
We show that smoothing the dual objective function is equivalent to regularizing the primal objective function (a standard entropic special case of this equivalence is sketched after this list).
arXiv Detail & Related papers (2021-03-10T18:53:59Z) - Variational Transport: A Convergent Particle-BasedAlgorithm for Distributional Optimization [106.70006655990176]
Distributional optimization problems arise widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) together with smoothness conditions, variational transport converges linearly.
arXiv Detail & Related papers (2020-12-21T18:33:13Z)
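As flagged in the Semi-Discrete Optimal Transport entry above, here is the standard entropic special case of the "smooth the dual = regularize the primal" equivalence, written in the usual discrete Kantorovich notation; that paper works with a more general distributionally robust smoothing, so this is only a sketch of the familiar instance.

```latex
% Entropically regularized primal (cost matrix C, marginals mu and nu, epsilon > 0):
\[
  \min_{\pi \in \Pi(\mu,\nu)} \; \langle C, \pi \rangle
  \;+\; \varepsilon\, \mathrm{KL}\!\left(\pi \,\middle\|\, \mu \otimes \nu\right).
\]
% Its dual replaces the hard constraint f_i + g_j <= C_{ij} of unregularized OT
% with a smooth exponential penalty:
\[
  \max_{f,\,g} \;\; \langle f, \mu \rangle + \langle g, \nu \rangle
  \;-\; \varepsilon \sum_{i,j} \mu_i \nu_j
        \exp\!\left( \frac{f_i + g_j - C_{ij}}{\varepsilon} \right) \;+\; \varepsilon.
\]
% As epsilon -> 0 the penalty sharpens back into the constraint, recovering the
% nonsmooth dual; in this sense smoothing the dual and entropically regularizing
% the primal are two views of the same problem.
```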