Related papers: Higher-order Derivatives of Weighted Finite-state Machines

Higher-order Derivatives of Weighted Finite-state Machines

URL: http://arxiv.org/abs/2106.00749v2
Date: Wed, 27 Sep 2023 18:02:34 GMT
Title: Higher-order Derivatives of Weighted Finite-state Machines
Authors: Ran Zmigrod, Tim Vieira, Ryan Cotterell
Abstract summary: This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. Our algorithm is significantly faster than prior algorithms.
Score: 68.43084108204741
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Weighted finite-state machines are a fundamental building block of NLP systems. They have withstood the test of time -- from their early use in noisy channel models in the 1990s up to modern-day neurally parameterized conditional random fields. This work examines the computation of higher-order derivatives with respect to the normalization constant for weighted finite-state machines. We provide a general algorithm for evaluating derivatives of all orders, which has not been previously described in the literature. In the case of second-order derivatives, our scheme runs in the optimal $\mathcal{O}(A^2 N^4)$ time where $A$ is the alphabet size and $N$ is the number of states. Our algorithm is significantly faster than prior algorithms. Additionally, our approach leads to a significantly faster algorithm for computing second-order expectations, such as covariance matrices and gradients of first-order expectations.

Related papers

Quantum algorithm for the gradient of a logarithm-determinant [0.0]
The inverse of a sparse-rank input operator may be determined efficiently. Measuring an expectation value of the quantum state--instead of all $N2$ elements of the input operator--can be accomplished in $O(ksigma)$ time. The algorithm is envisioned for fully error-corrected quantum computers but may be implementable on near-term machines.
arXiv Detail & Related papers (2025-01-16T09:39:31Z)
Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging [8.990961435218544]
We propose a first-order algorithm named $B$-sample dual averaging with the logarithmic barrier. For the Poisson inverse problem, our algorithm attains an $varepsilon$ solution in $smashtildeO(d3/varepsilon2)$ time. When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $varepsilon$-optimal solution in $smashtildeO(d3/varepsilon2)$ time.
arXiv Detail & Related papers (2023-11-05T03:33:44Z)
On adaptive low-depth quantum algorithms for robust multiple-phase estimation [11.678822620192438]
We present robust multiple-phase estimation (RMPE) algorithms with Heisenberg-limited scaling. These algorithms are particularly suitable for early fault-tolerant quantum computers.
arXiv Detail & Related papers (2023-03-14T17:38:01Z)
Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms. An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z)
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
Improved Rate of First Order Algorithms for Entropic Optimal Transport [2.1485350418225244]
This paper improves the state-of-the-art rate of a first-order algorithm for solving entropy regularized optimal transport. We propose an accelerated primal-dual mirror descent algorithm with variance reduction. Our algorithm may inspire more research to develop accelerated primal-dual algorithms that have rate $widetildeO(n2/epsilon)$ for solving OT.
arXiv Detail & Related papers (2023-01-23T19:13:25Z)
Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
General tools for designing efficient private estimation algorithms. First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z)
Quantum Algorithms for Sampling Log-Concave Distributions and Estimating Normalizing Constants [8.453228628258778]
We develop quantum algorithms for sampling logconcave distributions and for estimating their normalizing constants. We exploit quantum analogs of the Monte Carlo method and quantum walks. We also prove a $1/epsilon1-o(1)$ quantum lower bound for estimating normalizing constants.
arXiv Detail & Related papers (2022-10-12T19:10:43Z)
Scheduling with Speed Predictions [10.217102227210669]
We study the speed-robust scheduling problem where the speeds of the machines, instead of the processing times of the jobs, are unknown. Our main result is an algorithm that achieves a $mineta2 (1+epsilon)2 (1+epsilon)(2 + 2/alpha)$ approximation. When the predictions are accurate, this approximation improves over the previously best known approximation of $2-1/m$ for speed-robust scheduling.
arXiv Detail & Related papers (2022-05-02T23:39:41Z)
Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications. We propose two new algorithms for bilevel optimization. We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-critical optimization that finds an $O(epsilon)$epsilon point in the optimal product. We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.