Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging
- URL: http://arxiv.org/abs/2311.02557v2
- Date: Mon, 11 Mar 2024 10:44:54 GMT
- Title: Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging
- Authors: Chung-En Tsai and Hao-Chung Cheng and Yen-Huan Li
- Abstract summary: We propose a stochastic first-order algorithm named $B$-sample stochastic dual averaging with the logarithmic barrier.
For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\smash{\tilde{O}}(d^2/\varepsilon^2)$ time.
When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\smash{\tilde{O}}(d^3/\varepsilon^2)$ time.
- Score: 8.990961435218544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the problem of minimizing an expected logarithmic loss over either
the probability simplex or the set of quantum density matrices. This problem
includes tasks such as solving the Poisson inverse problem, computing the
maximum-likelihood estimate for quantum state tomography, and approximating
positive semi-definite matrix permanents with the currently tightest
approximation ratio. Although the optimization problem is convex, standard
iteration complexity guarantees for first-order methods do not directly apply
due to the absence of Lipschitz continuity and smoothness in the loss function.
In this work, we propose a stochastic first-order algorithm named $B$-sample
stochastic dual averaging with the logarithmic barrier. For the Poisson inverse
problem, our algorithm attains an $\varepsilon$-optimal solution in
$\smash{\tilde{O}}(d^2/\varepsilon^2)$ time, matching the state of the art,
where $d$ denotes the dimension. When computing the maximum-likelihood estimate
for quantum state tomography, our algorithm yields an $\varepsilon$-optimal
solution in $\smash{\tilde{O}}(d^3/\varepsilon^2)$ time. This improves on the
time complexities of existing stochastic first-order methods by a factor of
$d^{\omega-2}$ and those of batch methods by a factor of $d^2$, where $\omega$
denotes the matrix multiplication exponent. Numerical experiments demonstrate
that empirically, our algorithm outperforms existing methods with explicit
complexity guarantees.
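The abstract describes, but does not spell out, what dual averaging with a logarithmic barrier looks like over the probability simplex. Below is a minimal NumPy sketch of that general template for the loss $f(x) = \mathbb{E}[-\log\langle a, x\rangle]$. Everything here is an illustrative assumption rather than the paper's actual method: the function names, the constant gradient weight `eta`, the batch size, and the synthetic data are ours, and the paper's $B$-sample variant prescribes specific sample sizes and weights to reach the stated $\smash{\tilde{O}}(d^2/\varepsilon^2)$ guarantee. The one exact ingredient is the barrier step: minimizing $\langle z, x\rangle - \sum_i \log x_i$ over the simplex gives $x_i = 1/(z_i + \lambda)$, with $\lambda$ found by bisection.

```python
import numpy as np

def barrier_step(z, iters=100):
    """argmin over the simplex of <z, x> - sum_i log(x_i).

    Stationarity gives x_i = 1 / (z_i + lam), where lam solves
    sum_i 1/(z_i + lam) = 1 with every z_i + lam > 0; the sum is
    strictly decreasing in lam, so bisection finds the root.
    """
    d = z.size
    lo = -z.min() + 1e-12   # sum of 1/(z + lo) blows up, so it exceeds 1
    hi = -z.min() + d       # every term is <= 1/d here, so the sum is <= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (z + mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    x = 1.0 / (z + hi)      # hi-side root keeps all coordinates positive
    return x / x.sum()      # renormalize away the bisection residual

def sda_log_barrier(sample_batch, d, num_iters, eta=0.05):
    """Hypothetical dual-averaging loop for min E[-log <a, x>] over the
    simplex: accumulate stochastic gradients in the dual state z and map
    back through the logarithmic barrier at every step."""
    z = np.zeros(d)
    x = np.full(d, 1.0 / d)            # start at the barycenter
    x_avg = np.zeros(d)
    for _ in range(num_iters):
        A = sample_batch()             # (B, d) batch of nonnegative rows
        g = -(A / (A @ x)[:, None]).mean(axis=0)  # grad of -log<a, x>, averaged
        z += eta * g                   # constant weight: a placeholder schedule
        x = barrier_step(z)
        x_avg += x
    return x_avg / num_iters           # averaged iterate

# Toy Poisson-inverse-style instance with a synthetic nonnegative design.
rng = np.random.default_rng(0)
d, n, B = 20, 5000, 20
A_data = rng.exponential(size=(n, d))
x_hat = sda_log_barrier(lambda: A_data[rng.integers(n, size=B)], d, 3000)
print(-np.log(A_data @ x_hat).mean())  # empirical log loss at the output
```

The barrier is what keeps the step well defined even though the loss is neither Lipschitz nor smooth near the boundary of the simplex: the iterates stay strictly positive by construction.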
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems [9.30306458153248]
We consider the spectral tail condition number, $\kappa_\ell$, defined as the ratio between the $\ell$-th largest and the smallest singular value of the matrix representing the system (a minimal numerical illustration of $\kappa_\ell$ appears after this list).
Some of the implications of our result, and of the use of $\kappa_\ell$, include a direct improvement over a fine-grained analysis of the Conjugate Gradient method.
arXiv Detail & Related papers (2024-05-09T14:56:49Z)
- Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression [44.13655983242414]
We design the first sample near-optimal and almost linear-time algorithms with optimal error guarantees.
For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(\epsilon)$.
This is the first sample- and time-efficient algorithm achieving the optimal error guarantee, answering an open question in the literature.
arXiv Detail & Related papers (2023-12-04T00:31:16Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Deterministic Nonsmooth Nonconvex Optimization [94.01526844386977]
We show that randomization is necessary to obtain a dimension-free algorithm.
Our algorithm yields the first deterministic dimension-free algorithm for optimizing ReLU networks.
arXiv Detail & Related papers (2023-02-16T13:57:19Z)
- Faster Stochastic First-Order Method for Maximum-Likelihood Quantum State Tomography [10.758021887982782]
In quantum state tomography, the sample size and dimension grow exponentially with the number of qubits.
We propose a stochastic mirror descent algorithm with the Burg entropy.
To the best of our knowledge, this is currently the fastest first-order method for maximum-likelihood quantum state tomography.
arXiv Detail & Related papers (2022-11-23T11:35:47Z)
- Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
- A stochastic linearized proximal method of multipliers for convex stochastic optimization with expectation constraints [8.133190610747974]
We present a computable approximation-type algorithm, namely the stochastic linearized proximal method of multipliers.
Some preliminary numerical results demonstrate the performance of the proposed algorithm.
arXiv Detail & Related papers (2021-06-22T07:24:17Z)
- A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling stochastic bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
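As a small illustration of the spectral tail condition number $\kappa_\ell$ referenced in the "Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems" entry above, the definition translates directly into a few lines of NumPy; the function name here is ours, not the paper's.

```python
import numpy as np

def spectral_tail_condition_number(A, ell):
    """kappa_ell: ratio of the ell-th largest singular value of A
    to its smallest one (ell is 1-indexed; ell = 1 recovers the
    usual condition number sigma_max / sigma_min)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return s[ell - 1] / s[-1]

A = np.diag([100.0, 10.0, 1.0, 0.5])
print(spectral_tail_condition_number(A, 1))  # 200.0
print(spectral_tail_condition_number(A, 2))  # 20.0
```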