Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs
- URL: http://arxiv.org/abs/2301.06862v2
- Date: Tue, 11 Jul 2023 09:08:33 GMT
- Title: Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs
- Authors: Anej Svete, Benjamin Dayan, Tim Vieira, Ryan Cotterell, Jason Eisner
- Abstract summary: We propose an algorithm for general acyclic WFSAs which runs in $Oleft(|E| + s |Sigma| |Q| T_textmax log|Sigma|right)$.
When the failure transition topology satisfies a condition exemplified by CRFs, the $T_textmax$ factor can be dropped.
In the latter case (ring-weighted acyclic WFSAs), we also give an alternative algorithm complexity with $style Oleft(|E| + |Sigma| |
- Score: 66.47284608209692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weighted finite-state automata (WSFAs) are commonly used in NLP. Failure
transitions are a useful extension for compactly representing backoffs or
interpolation in $n$-gram models and CRFs, which are special cases of WFSAs.
The pathsum in ordinary acyclic WFSAs is efficiently computed by the backward
algorithm in time $O(|E|)$, where $E$ is the set of transitions. However, this
does not allow failure transitions, and preprocessing the WFSA to eliminate
failure transitions could greatly increase $|E|$. We extend the backward
algorithm to handle failure transitions directly. Our approach is efficient
when the average state has outgoing arcs for only a small fraction $s \ll 1$ of
the alphabet $\Sigma$. We propose an algorithm for general acyclic WFSAs which
runs in $O{\left(|E| + s |\Sigma| |Q| T_\text{max} \log{|\Sigma|}\right)}$,
where $Q$ is the set of states and $T_\text{max}$ is the size of the largest
connected component of failure transitions. When the failure transition
topology satisfies a condition exemplified by CRFs, the $T_\text{max}$ factor
can be dropped, and when the weight semiring is a ring, the $\log{|\Sigma|}$
factor can be dropped. In the latter case (ring-weighted acyclic WFSAs), we
also give an alternative algorithm with complexity $\displaystyle O{\left(|E| +
|\Sigma| |Q| \min(1,s\pi_\text{max}) \right)}$, where $\pi_\text{max}$ is the
size of the longest failure path.
Related papers
- Cryptography from Lossy Reductions: Towards OWFs from ETH, and Beyond [1.0687104237121408]
One-way functions (OWFs) form the foundation of modern cryptography, yet their unconditional existence remains a major open question.<n>We show that either OWFs exist or any lossy reduction for any promise problem $Pi$ runs in time $2Omega(logtau_Pi / loglog n)$.
arXiv Detail & Related papers (2025-05-27T17:15:30Z) - More-efficient Quantum Multivariate Mean Value Estimator from Generalized Grover Gate [0.0]
We find an algorithm that uses $Oleft(n log fracddeltaright)$ samples to find a mean estimate that $vectildemu$.
Our result remains not exactly optimal due to a $log fracddelta$ term in our complexity.
arXiv Detail & Related papers (2025-04-09T14:48:23Z) - Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point [3.793609515750114]
We study the problem of robustly estimating the edge density of ErdHos-R'enyi random graphs $G(n, dcirc/n)$.
Our algorithm is based on the sum-of-squares hierarchy.
arXiv Detail & Related papers (2025-03-05T21:45:17Z) - Measuring quantum relative entropy with finite-size effect [53.64687146666141]
We study the estimation of relative entropy $D(rho|sigma)$ when $sigma$ is known.<n>Our estimator attains the Cram'er-Rao type bound when the dimension $d$ is fixed.
arXiv Detail & Related papers (2024-06-25T06:07:20Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
It is shown for the first time that this assumption is not required for convergence and finer results.<n>Rates of convergence are obtained for the standard algorithm and for estimates obtained via the averaging technique of Polyak and Ruppert.<n>Results from numerical experiments illustrate that $beta_theta$ may be large due to a combination of multiplicative noise and Markovian memory.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
In particular, our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
arXiv Detail & Related papers (2023-12-14T12:53:34Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Memory-Constrained Algorithms for Convex Optimization via Recursive
Cutting-Planes [23.94542304111204]
First class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $epsilonleq 1/sqrt d$.
In the regime $epsilon leq d-Omega(d)$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z) - Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces [2.687607197645453]
We show that even the special case of $k$-Center in dimension $Theta(log n)$ is $(sqrt3/2- o(1))$hard to approximate for FPT algorithms.
We also show that even the special case of $k$-Center in dimension $Theta(log n)$ is $(sqrt3/2- o(1))$hard to approximate for FPT algorithms.
arXiv Detail & Related papers (2023-05-12T08:43:28Z) - Eluder-based Regret for Stochastic Contextual MDPs [43.19667415823089]
We present the E-UC$3$RL algorithm for regret minimization in Contextual Markov Decision Processes (CMDPs)
Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $ widetildeO(H3 sqrtT |S| |A|d_mathrmE(mathcalP)$.
arXiv Detail & Related papers (2022-11-27T20:38:47Z) - Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning [54.806166861456035]
We study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches.
We design a computational efficient algorithm to achieve near-optimal regret of $tildeO(sqrtSAH3Kln (1/delta))$tildeO(cdot) hides logarithmic terms of $(S,A,H,K)$ in $K$ episodes.
Our technical contribution are two-fold: 1) a near-optimal design scheme to explore
arXiv Detail & Related papers (2022-10-15T09:22:22Z) - Faster Sampling from Log-Concave Distributions over Polytopes via a
Soft-Threshold Dikin Walk [28.431572772564518]
We consider the problem of sampling from a $d$-dimensional log-concave distribution $pi(theta) propto e-f(theta)$ constrained to a polytope $K$ defined by $m$ inequalities.
Our main result is a "soft-warm'' variant of the Dikin walk Markov chain that requires at most $O((md + d L2 R2) times MDomega-1) log(fracwdelta)$ arithmetic operations to sample from $pi$
arXiv Detail & Related papers (2022-06-19T11:33:07Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL of Agarwal et al. achieves this goal with a regret overhead of order $widetildeO(sqrtd S T)$ where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only emphlogarithmic dependence on $M$ as long
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Faster quantum-inspired algorithms for solving linear systems [0.05076419064097732]
We show that there is a classical algorithm that outputs a data structure for $x$ allowing sampling and querying to the entries.
This output can be viewed as a classical analogue to the output of quantum linear solvers.
arXiv Detail & Related papers (2021-03-18T15:12:44Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.