Reliable Detection and Quantification of Selective Forces in Language Change
- URL: http://arxiv.org/abs/2305.15914v2
- Date: Mon, 21 Aug 2023 12:51:10 GMT
- Title: Reliable Detection and Quantification of Selective Forces in Language Change
- Authors: Juan Guerrero Montero, Andres Karjus, Kenny Smith, Richard A. Blythe
- Abstract summary: We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
- Score: 3.55026004901472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language change is a cultural evolutionary process in which variants of
linguistic variables change in frequency through processes analogous to
mutation, selection and genetic drift. In this work, we apply a
recently-introduced method to corpus data to quantify the strength of selection
in specific instances of historical language change. We first demonstrate, in
the context of English irregular verbs, that this method is more reliable and
interpretable than similar methods that have previously been applied. We
further extend this study to demonstrate that a bias towards phonological
simplicity overrides that favouring grammatical simplicity when these are in
conflict. Finally, with reference to Spanish spelling reforms, we show that the
method can also detect points in time at which selection strengths change, a
feature that is generically expected for socially-motivated language change.
Together, these results indicate how hypotheses for mechanisms of language
change can be tested quantitatively using historical corpus data.
Related papers
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties [3.3536302616846734]
We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits.
We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
arXiv Detail & Related papers (2022-09-15T21:19:31Z)
- Contextualized language models for semantic change detection: lessons learned [4.436724861363513]
We present a qualitative analysis of the outputs of contextualized embedding-based methods for detecting diachronic semantic change.
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift.
Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance.
arXiv Detail & Related papers (2022-08-31T23:35:24Z)
- Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change [6.7485485663645495]
We first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages.
Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages.
arXiv Detail & Related papers (2022-04-12T11:20:42Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- How individuals change language [1.2437226707039446]
We introduce a very general mathematical model that encompasses a wide variety of individual-level linguistic behaviours.
We compare the likelihood of empirically-attested changes in definite and indefinite articles in multiple languages under different assumptions.
We find that accounts of language change that appeal primarily to errors in childhood language acquisition are very weakly supported by the historical data.
arXiv Detail & Related papers (2021-04-20T19:02:49Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- A Probabilistic Approach in Historical Linguistics Word Order Change in Infinitival Clauses: from Latin to Old French [0.0]
This thesis investigates word order change in infinitival clauses in the history of Latin and Old French.
I examine a synchronic word order variation in each stage of language change, from which I infer the character, periodization and constraints of diachronic variation.
I present a three-stage probabilistic model of word order change, which also conforms to traditional language change patterns.
arXiv Detail & Related papers (2020-11-16T20:30:31Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)