Linear-time calculation of the expected sum of edge lengths in random
projective linearizations of trees
- URL: http://arxiv.org/abs/2107.03277v1
- Date: Wed, 7 Jul 2021 15:11:53 GMT
- Title: Linear-time calculation of the expected sum of edge lengths in random
projective linearizations of trees
- Authors: Lluís Alemany-Puig and Ramon Ferrer-i-Cancho
- Abstract summary: The sum of distances between syntactically related words has been in the limelight for the past decades.
Various random baselines have been defined to carry out related quantitative studies on languages.
Here we focus on a popular baseline: random projective permutations of the words of the sentence.
- Score: 1.2944868613449219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The syntactic structure of a sentence is often represented using syntactic
dependency trees. The sum of the distances between syntactically related words
has been in the limelight for the past decades. Research on dependency
distances led to the formulation of the principle of dependency distance
minimization whereby words in sentences are ordered so as to minimize that sum.
Numerous random baselines have been defined to carry out related quantitative
studies on languages. The simplest random baseline is the expected value of the
sum in unconstrained random permutations of the words in the sentence, namely
when all the shufflings of the words of a sentence are allowed and equally
likely. Here we focus on a popular baseline: random projective permutations of
the words of the sentence, that is, permutations where the syntactic dependency
structure is projective, a formal constraint that sentences satisfy often in
languages. Thus far, the expectation of the sum of dependency distances in
random projective shufflings of a sentence has been estimated approximately
with a Monte Carlo procedure whose cost is of the order of $Zn$, where $n$ is
the number of words of the sentence and $Z$ is the number of samples; the
larger $Z$, the lower the error of the estimation but the larger the time cost.
Here we present formulae to compute that expectation without error in time of
the order of $n$. Furthermore, we show that star trees maximize it, and devise
a dynamic programming algorithm to retrieve the trees that minimize it.
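As a reading aid, here is a minimal Python sketch of the Monte Carlo baseline the abstract describes (cost of the order of $Zn$), which the paper's exact $O(n)$ formulae replace. It assumes, as in prior work on this baseline, that a uniformly random projective linearization is obtained by shuffling each head together with the contiguous blocks of its children's subtrees, independently at every vertex; the function names and the example tree are illustrative, and the paper's formulae are not reproduced here.

```python
import random
from collections import defaultdict

def random_projective_linearization(n, root, children):
    """Sample one projective linearization of a rooted tree on vertices 0..n-1.

    Assumption: each vertex is shuffled uniformly at random together with the
    contiguous blocks occupied by its children's subtrees, independently at
    every vertex; the resulting word order is projective by construction.
    """
    position = [0] * n   # position[v] = index of vertex v in the linear order
    next_free = 0

    def place(v):
        nonlocal next_free
        blocks = [("head", v)] + [("subtree", c) for c in children[v]]
        random.shuffle(blocks)
        for kind, u in blocks:
            if kind == "head":
                position[u] = next_free
                next_free += 1
            else:
                place(u)   # a child's subtree occupies a contiguous block

    place(root)
    return position

def monte_carlo_expected_D(n, root, children, edges, samples=10_000):
    """Estimate E[D], the expected sum of edge lengths over random projective
    linearizations, at cost O(Z n) for Z = samples -- the procedure the paper
    replaces with an exact O(n) computation."""
    total = 0
    for _ in range(samples):
        pos = random_projective_linearization(n, root, children)
        total += sum(abs(pos[u] - pos[v]) for u, v in edges)
    return total / samples

if __name__ == "__main__":
    # Illustrative star tree with hub 0 and four leaves (n = 5).
    edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
    children = defaultdict(list)
    for head, dep in edges:
        children[head].append(dep)
    print(monte_carlo_expected_D(5, 0, children, edges))
    # For reference, the unconstrained (non-projective) baseline for any tree
    # with n vertices is (n**2 - 1) / 3, i.e. 8.0 for n = 5.
```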
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Revisiting the Optimality of Word Lengths [92.70590105707639]
Communicative cost can be operationalized in different ways.
Zipf (1935) posited that wordforms are optimized to minimize utterances' communicative costs.
arXiv Detail & Related papers (2023-12-06T20:41:47Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - The distribution of syntactic dependency distances [0.7614628596146599]
We contribute to the characterization of the actual distribution of syntactic dependency distances.
We propose a new double-exponential model in which decay in probability is allowed to change after a break-point.
We find that a two-regime model is the most likely one in all 20 languages we considered.
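A sketch of the two-regime idea (an assumption about the functional form, not the paper's exact parameterization): the probability of a dependency at distance $d$ decays as $p(d) \propto e^{-q_1 d}$ for $1 \le d \le d^{*}$ and as $p(d) \propto e^{-q_2 d}$ for $d > d^{*}$, with the break-point $d^{*}$ and the two decay rates $q_1, q_2$ estimated per language.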
arXiv Detail & Related papers (2022-11-26T17:31:25Z) - Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold.
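A minimal Python sketch of a generic truncation-sampling step, assuming only what the summary states (probabilities below a threshold are zeroed out and the rest renormalized); the entropy-dependent rule $\eta$-sampling uses to pick the threshold is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def truncation_sampling_step(probs, threshold, rng=np.random.default_rng()):
    """One truncation-sampling step: zero out words whose probability is below
    `threshold`, renormalize, and sample an index. In eta-sampling the
    threshold depends on the distribution's entropy (not shown here)."""
    probs = np.asarray(probs, dtype=float)
    keep = probs >= threshold
    if not keep.any():
        keep[np.argmax(probs)] = True   # always keep the most probable word
    truncated = np.where(keep, probs, 0.0)
    truncated /= truncated.sum()
    return rng.choice(len(probs), p=truncated)

# Example: with threshold 0.1, the 0.05-probability word can never be sampled.
print(truncation_sampling_step([0.5, 0.3, 0.15, 0.05], threshold=0.1))
```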
arXiv Detail & Related papers (2022-10-27T05:52:35Z) - The expected sum of edge lengths in planar linearizations of trees.
Theory and applications [0.16317061277456998]
We show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements.
We derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths.
We apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases.
arXiv Detail & Related papers (2022-07-12T14:35:07Z) - Shallow decision trees for explainable $k$-means clustering [1.2891210250935146]
We propose an efficient algorithm that takes into account metrics related to the depths of the leaves in a decision tree.
In experiments on 16 datasets, our algorithm yields better results than decision-tree clustering algorithms.
arXiv Detail & Related papers (2021-12-29T18:22:28Z) - Dependency distance minimization predicts compression [1.2944868613449219]
Dependency distance minimization (DDm) is a well-established principle of word order.
This is a second order prediction because it links a principle with another principle, rather than a principle and a manifestation as in a first order prediction.
We use a recently introduced score that has many mathematical and statistical advantages with respect to the widely used sum of dependency distances.
arXiv Detail & Related papers (2021-09-18T10:53:39Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Estimating decision tree learnability with polylogarithmic sample
complexity [16.832966312395126]
We show that top-down decision tree learning algorithms are amenable to highly efficient learnability estimation.
We show how the label $T(x^\star)$ of the decision tree hypothesis $T$ can be computed with polylogarithmically many labeled examples.
arXiv Detail & Related papers (2020-11-03T09:26:27Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows the corruption level $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive regret terms, respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.