The distribution of syntactic dependency distances
- URL: http://arxiv.org/abs/2211.14620v1
- Date: Sat, 26 Nov 2022 17:31:25 GMT
- Title: The distribution of syntactic dependency distances
- Authors: Sonia Petrini and Ramon Ferrer-i-Cancho
- Abstract summary: We contribute to the characterization of the actual distribution of syntactic dependency distances.
We propose a new double-exponential model in which decay in probability is allowed to change after a break-point.
We find that a two-regime model is the most likely one in all 20 languages we considered.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The syntactic structure of a sentence can be represented as a graph where
vertices are words and edges indicate syntactic dependencies between them. In
this setting, the distance between two syntactically linked words can be
defined as the difference between their positions. Here we want to contribute
to the characterization of the actual distribution of syntactic dependency
distances, and unveil its relationship with short-term memory limitations. We
propose a new double-exponential model in which decay in probability is allowed
to change after a break-point. This transition could mirror the transition from
the processing of words chunks to higher-level structures. We find that a
two-regime model -- where the first regime follows either an exponential or a
power-law decay -- is the most likely one in all 20 languages we considered,
independently of sentence length and annotation style. Moreover, the
break-point is fairly stable across languages, averaging 4-5 words, which
suggests that the number of words that can be processed simultaneously is
largely independent of the specific language. Finally, we give an
account of the relation between the best estimated model and the closeness of
syntactic dependencies, as measured by a recently introduced optimality score.
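To make the abstract's definitions concrete, here is a minimal sketch (not the authors' code) that computes dependency distances from a head-annotated sentence and evaluates a hypothetical two-regime exponential model whose decay rate changes at a break-point. The example sentence, the helper names, and the parameter values alpha1, alpha2 and d_star are illustrative assumptions; in the paper the first regime may instead follow a power-law decay.
```python
# Minimal sketch, assuming a CoNLL-U style head array; not the authors' code.
import math
from collections import Counter

def dependency_distances(heads):
    """heads[i] is the 1-based position of the head of word i+1 (0 marks the root).
    The distance of a dependency is the absolute difference between the
    positions of the two linked words."""
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]

def two_regime_pmf(d, alpha1, alpha2, d_star, d_max=100):
    """Hypothetical two-regime exponential decay: rate alpha1 up to the
    break-point d_star, rate alpha2 beyond it (continuous at d_star),
    normalized over distances 1..d_max."""
    def raw(x):
        if x <= d_star:
            return math.exp(-alpha1 * x)
        return math.exp(-alpha1 * d_star - alpha2 * (x - d_star))
    z = sum(raw(x) for x in range(1, d_max + 1))
    return raw(d) / z

# Illustrative sentence "She gave the student a book":
# word 1 "She" -> head 2, word 2 "gave" is the root, word 3 "the" -> head 4, ...
heads = [2, 0, 4, 2, 6, 2]
print(Counter(dependency_distances(heads)))      # Counter({1: 3, 2: 1, 4: 1})
print(two_regime_pmf(3, alpha1=0.9, alpha2=0.4, d_star=4))
```
Selecting between one-regime and two-regime variants would then proceed by comparing their likelihoods over the empirical distance counts, which is the kind of comparison the abstract reports.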
Related papers
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm that further explores the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Syntactic Substitutability as Unsupervised Dependency Syntax [31.488677474152794]
We model a more general property implicit in the definition of dependency relations, syntactic substitutability.
This property captures the fact that words at either end of a dependency can be substituted with words from the same category.
We show that increasing the number of substitutions used improves parsing accuracy on natural data.
arXiv Detail & Related papers (2022-11-29T09:01:37Z) - Universality and diversity in word patterns [0.0]
We present an analysis of lexical statistical connections for eleven major languages.
We find that the diverse manners that languages utilize to express word relations give rise to unique pattern distributions.
arXiv Detail & Related papers (2022-08-23T20:03:27Z) - Language Models Explain Word Reading Times Better Than Empirical Predictability [20.38397241720963]
The traditional approach in cognitive reading research assumes that word predictability from sentence context is best captured by cloze completion probability.
Probabilistic language models provide deeper explanations of syntactic and semantic effects than CCP.
N-gram and RNN probabilities of the current word predicted reading performance more consistently than topic models or CCP (a minimal surprisal sketch follows this list).
arXiv Detail & Related papers (2022-02-02T16:38:43Z) - NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [62.997667081978825]
We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has changed its meaning over time, relying only on raw text from two time-specific datasets.
We propose two models that represent the target words across the two periods and predict changed words using threshold and voting schemes.
arXiv Detail & Related papers (2020-11-07T11:27:18Z) - Boosting Continuous Sign Language Recognition via Cross Modality Augmentation [135.30357113518127]
Continuous sign language recognition deals with unaligned video-text pairs.
We propose a novel architecture with cross modality augmentation.
The proposed framework can be easily extended to other existing CTC based continuous SLR architectures.
arXiv Detail & Related papers (2020-10-11T15:07:50Z) - Multiplex Word Embeddings for Selectional Preference Acquisition [70.33531759861111]
We propose a multiplex word embedding model, which can be easily extended according to various relations among words.
Our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness.
arXiv Detail & Related papers (2020-01-09T04:47:14Z)
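As a companion to the reading-time entry above, here is a minimal sketch, under stated assumptions, of the surprisal idea: estimate -log2 P(word | context) from a toy add-one-smoothed bigram model and correlate it with hypothetical reading times. The toy corpus, the reading-time values, and the smoothing choice are all illustrative assumptions; the cited paper compares n-gram, RNN, and topic-model probabilities with cloze completion probability on reading measures.
```python
# Minimal sketch: bigram surprisal vs. hypothetical reading times; not from the paper.
import math
from collections import Counter

corpus = "the cat sat on the mat while the dog sat on the rug".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def surprisal(prev, word):
    """Add-one smoothed bigram surprisal -log2 P(word | prev), in bits."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return -math.log2(p)

sentence = "the dog sat on the mat".split()
surprisals = [surprisal(p, w) for p, w in zip(sentence, sentence[1:])]

# Purely illustrative per-word reading times (ms) for words 2..6 of the sentence.
reading_times = [310.0, 325.0, 290.0, 280.0, 330.0]

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(list(zip(sentence[1:], [round(s, 2) for s in surprisals])))
print("surprisal-reading-time correlation:", round(pearson(surprisals, reading_times), 3))
```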