Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
- URL: http://arxiv.org/abs/2402.11549v2
- Date: Thu, 28 Mar 2024 11:16:28 GMT
- Title: Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
- Authors: Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger,
- Abstract summary: The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependencys, including the widely used Stanford Core as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
- Score: 56.47832275431858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German for the different metrics explored: only 4% of cases we examine yield opposite conclusions regarding upwards and downtrends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To our best knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
Related papers
- Integrating Supertag Features into Neural Discontinuous Constituent Parsing [0.0]
Traditional views of constituency demand that constituents consist of adjacent words, common in languages like German.
Transition-based parsing produces trees given raw text input using supervised learning on large annotated corpora.
arXiv Detail & Related papers (2024-10-11T12:28:26Z) - Correlation Does Not Imply Compensation: Complexity and Irregularity in the Lexicon [48.00488140516432]
We find evidence of a positive relationship between morphological irregularity and phonotactic complexity within languages.
We also find weak evidence of a negative relationship between word length and morphological irregularity.
arXiv Detail & Related papers (2024-06-07T18:09:21Z) - Multipath parsing in the brain [4.605070569473395]
Humans understand sentences word-by-word, in the order that they hear them.
We investigate how humans process these syntactic ambiguities by correlating predictions from incremental dependencys with timecourse data from people undergoing functional neuroimaging while listening to an audiobook.
In both English and Chinese, we find evidence for multipath parsing. Brain regions associated with this multipath effect include bilateral superior temporal gyrus.
arXiv Detail & Related papers (2024-01-31T18:07:12Z) - DEMETR: Diagnosing Evaluation Metrics for Translation [21.25704103403547]
We release DEMETR, a diagnostic dataset with 31K English examples.
We find that learned metrics perform substantially better than string-based metrics on DEMETR.
arXiv Detail & Related papers (2022-10-25T03:25:44Z) - Do Not Fire the Linguist: Grammatical Profiles Help Language Models
Detect Semantic Change [6.7485485663645495]
We first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages.
Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages.
arXiv Detail & Related papers (2022-04-12T11:20:42Z) - A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - Linguistic dependencies and statistical dependence [76.89273585568084]
We use pretrained language models to estimate probabilities of words in context.
We find that maximum-CPMI trees correspond to linguistic dependencies more often than trees extracted from non-contextual PMI estimate.
arXiv Detail & Related papers (2021-04-18T02:43:37Z) - GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and
Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z) - A Tale of a Probe and a Parser [74.14046092181947]
Measuring what linguistic information is encoded in neural models of language has become popular in NLP.
Researchers approach this enterprise by training "probes" - supervised models designed to extract linguistic structure from another model's output.
One such probe is the structural probe, designed to quantify the extent to which syntactic information is encoded in contextualised word representations.
arXiv Detail & Related papers (2020-05-04T16:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.