Letters From the Past: Modeling Historical Sound Change Through
Diachronic Character Embeddings
- URL: http://arxiv.org/abs/2205.08256v1
- Date: Tue, 17 May 2022 11:57:17 GMT
- Title: Letters From the Past: Modeling Historical Sound Change Through
Diachronic Character Embeddings
- Authors: Sidsel Boldsen and Patrizia Paggio
- Abstract summary: We address the detection of sound change through historical spelling.
We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved using PPMI character embeddings.
We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While a great deal of work has been done on NLP approaches to lexical
semantic change detection, other aspects of language change have received less
attention from the NLP community. In this paper, we address the detection of
sound change through historical spelling. We propose that a sound change can be
captured by comparing the relative distance through time between the
distributions of the characters involved using PPMI character embeddings. We verify this hypothesis in
synthetic data and then test the method's ability to trace the well-known
historical change of lenition of plosives in Danish historical sources. We show
that the models are able to identify several of the changes under consideration
and to uncover meaningful contexts in which they appeared. The methodology has
the potential to contribute to the study of open questions such as the relative
chronology of sound shifts and their geographical distribution.
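The idea of comparing character distributions across periods can be illustrated with a minimal sketch. This is not the authors' implementation: the context window of one character, the `#` word-boundary symbol, the cosine distance, and the two toy word lists are all assumptions made for illustration, and the strings are invented rather than attested spellings. The sketch builds PPMI vectors for characters from their neighbouring characters, then reports the distance between `p` and `b` separately for an "early" and a "late" word list.

```python
import math
from collections import Counter


def ppmi_embeddings(words, window=1):
    """Map each character to a sparse PPMI vector over its context characters."""
    pair_counts = Counter()
    for word in words:
        padded = "#" + word + "#"  # '#' marks word boundaries (an assumption)
        for i, ch in enumerate(padded):
            lo = max(0, i - window)
            hi = min(len(padded), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(ch, padded[j])] += 1

    total = sum(pair_counts.values())
    char_counts = Counter()
    ctx_counts = Counter()
    for (ch, ctx), n in pair_counts.items():
        char_counts[ch] += n
        ctx_counts[ctx] += n

    vectors = {}
    for (ch, ctx), n in pair_counts.items():
        pmi = math.log2(n * total / (char_counts[ch] * ctx_counts[ctx]))
        if pmi > 0:  # keep only positive PMI values
            vectors.setdefault(ch, {})[ctx] = pmi
    return vectors


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def char_distance(words, a, b, window=1):
    """Distance between characters a and b within one time period."""
    vectors = ppmi_embeddings(words, window)
    return cosine_distance(vectors.get(a, {}), vectors.get(b, {}))


# Invented toy data: 'p' after vowels in the early list, 'b' in the late list.
early = ["apa", "opa", "epo", "ba", "bo", "pa"]
late = ["aba", "oba", "ebo", "ba", "bo", "pa"]

for name, words in (("early", early), ("late", late)):
    print(name, round(char_distance(words, "p", "b"), 3))
```

Tracking how such a distance moves from one period to the next, across many periods and character pairs, is the kind of signal the abstract describes; a real study would need a corpus per period and a principled choice of context window.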
Related papers
- Exploring Sound Change Over Time: A Review of Computational and Human Perception [2.8908326904081334]
We provide a pioneering review contrasting computational with human perception from the perspectives of methods and tasks.
Overall, computational approaches rely on computer-driven models to perceive historical sound changes on etymological datasets.
Human approaches use listener-driven models to perceive ongoing sound changes on recording corpora.
arXiv Detail & Related papers (2024-07-06T14:44:59Z)
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z)
- Reliable Detection and Quantification of Selective Forces in Language Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z)
- Measuring Intersectional Biases in Historical Documents [37.03904311548859]
We investigate the continuities and transformations of bias in historical newspapers published in the Caribbean during the colonial era (18th to 19th centuries).
Our analyses are performed along the axes of gender, race, and their intersection.
We find that there is a trade-off between the stability of the word embeddings and their compatibility with the historical dataset.
arXiv Detail & Related papers (2023-05-21T07:10:31Z)
- Stability of Syntactic Dialect Classification Over Space and Time [0.0]
This paper constructs a test set for 12 dialects of English that spans three years at monthly intervals with a fixed spatial distribution across 1,120 cities.
The decay rate of classification performance for each dialect over time allows us to identify regions undergoing syntactic change.
And the distribution of classification accuracy within dialect regions allows us to identify the degree to which the grammar of a dialect is internally heterogeneous.
arXiv Detail & Related papers (2022-09-11T23:14:59Z)
- Contextualized language models for semantic change detection: lessons learned [4.436724861363513]
We present a qualitative analysis of the outputs of contextualized embedding-based methods for detecting diachronic semantic change.
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift.
Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance.
arXiv Detail & Related papers (2022-08-31T23:35:24Z)
- Combating Temporal Drift in Crisis with Adapted Embeddings [58.4558720264897]
Language usage changes over time, and this can impact the effectiveness of NLP systems.
This work investigates methods for adapting to changing discourse during crisis events.
arXiv Detail & Related papers (2021-04-17T13:11:41Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.