Efficient Sampling of Dependency Structures
- URL: http://arxiv.org/abs/2109.06521v1
- Date: Tue, 14 Sep 2021 08:33:12 GMT
- Title: Efficient Sampling of Dependency Structures
- Authors: Ran Zmigrod, Tim Vieira, Ryan Cotterell
- Abstract summary: We adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to a root constraint.
We build upon Colbourn's algorithm and present a novel extension that can sample $K$ trees without replacement in $\mathcal{O}(KN^3 + K^2N)$ time.
- Score: 39.5549669100436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probabilistic distributions over spanning trees in directed graphs are a
fundamental model of dependency structure in natural language processing;
syntactic dependency trees are a prime example. In NLP, dependency trees often have an additional
root constraint: only one edge may emanate from the root. However, no sampling
algorithm has been presented in the literature to account for this additional
constraint. In this paper, we adapt two spanning tree sampling algorithms to
faithfully sample dependency trees from a graph subject to the root constraint.
Wilson (1996)'s sampling algorithm has a running time of $\mathcal{O}(H)$ where
$H$ is the mean hitting time of the graph. Colbourn (1996)'s sampling algorithm
has a running time of $\mathcal{O}(N^3)$, which is often greater than the mean
hitting time of a directed graph. Additionally, we build upon Colbourn's
algorithm and present a novel extension that can sample $K$ trees without
replacement in $\mathcal{O}(K N^3 + K^2 N)$ time. To the best of our knowledge,
no algorithm has been given for sampling spanning trees without replacement
from a directed graph.
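As an illustration of the random-walk approach underlying Wilson's algorithm, here is a minimal sketch of loop-erased random-walk sampling of a uniform spanning tree on an undirected graph. This is a simplification for intuition only: the paper's adaptation additionally handles edge weights, edge direction, and the single-root constraint, none of which appear in this sketch.

```python
import random

def wilson_ust(adj, root):
    """Sample a uniform spanning tree of a connected undirected graph
    via Wilson's loop-erased random walks.

    adj: dict mapping each vertex to a list of its neighbours.
    Returns a dict mapping each non-root vertex to its parent
    (the tree is oriented toward `root`).
    """
    in_tree = {root}
    parent = {}
    for start in adj:
        if start in in_tree:
            continue
        # Random walk from `start` until it hits the current tree.
        # Recording only the *last* exit edge taken from each visited
        # vertex implicitly erases any loops in the walk.
        u = start
        while u not in in_tree:
            parent[u] = random.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return parent
```

The expected running time is governed by the mean hitting time of the walk, which matches the $\mathcal{O}(H)$ bound quoted above for the unconstrained case.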
Related papers
- On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances allow computing decision trees that are optimal for various measures such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - Detection-Recovery Gap for Planted Dense Cycles [72.4451045270967]
We consider a model where a dense cycle with expected bandwidth $n\tau$ and edge density $p$ is planted in an Erdős-Rényi graph $G(n,q)$.
We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree algorithms.
arXiv Detail & Related papers (2023-02-13T22:51:07Z) - Random graph matching at Otter's threshold via counting chandeliers [16.512416293014493]
We propose an efficient algorithm for graph matching based on similarity scores constructed from counting a certain family of weighted trees rooted at each vertex.
This is the first graph matching algorithm that succeeds at an explicit constant correlation and applies to both sparse and dense graphs.
arXiv Detail & Related papers (2022-09-25T20:00:28Z) - Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node.
Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution.
We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG).
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - On Finding the $K$-best Non-projective Dependency Trees [39.5549669100436]
We provide a simplification of the $K$-best spanning tree algorithm of Camerini et al. (1980).
We present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.
arXiv Detail & Related papers (2021-06-01T20:23:41Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured
Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z) - Please Mind the Root: Decoding Arborescences for Dependency Parsing [67.71280539312536]
We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank.
The worst constraint-violation rate we observe is 24%.
arXiv Detail & Related papers (2020-10-06T08:31:14Z)
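Since the last entry concerns violations of the single-root constraint, a minimal checker makes the constraint concrete. The head-array encoding and function names below are illustrative assumptions, not taken from any of the papers: each token stores the index of its head, with 0 denoting the ROOT node.

```python
def count_root_edges(heads):
    """heads[i] is the head index of token i+1; 0 denotes the ROOT node.
    Under the single-root constraint, a well-formed dependency tree has
    exactly one token attached to ROOT."""
    return sum(1 for h in heads if h == 0)

def satisfies_root_constraint(heads):
    """True iff exactly one edge emanates from ROOT."""
    return count_root_edges(heads) == 1
```

For example, `[0, 1, 1]` (token 1 attached to ROOT, tokens 2 and 3 attached to token 1) satisfies the constraint, while `[0, 0, 2]` attaches two tokens to ROOT and violates it.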
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.