Are Sounds Sound for Phylogenetic Reconstruction?
- URL: http://arxiv.org/abs/2402.02807v3
- Date: Tue, 14 May 2024 07:38:26 GMT
- Title: Are Sounds Sound for Phylogenetic Reconstruction?
- Authors: Luise Häuser, Gerhard Jäger, Taraka Rama, Johann-Mattis List, Alexandros Stamatakis,
- Abstract summary: We test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction.
Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average.
- Score: 41.85920785319125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.
Related papers
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependencys, including the widely used Stanford Core as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z) - Patterns of Persistence and Diffusibility across the World's Languages [3.7055269158186874]
Colexification is a type of similarity where a single lexical form is used to convey multiple meanings.
We shed light on the linguistic causes of cross-lingual similarity in colexification and phonology.
We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages.
arXiv Detail & Related papers (2024-01-03T12:05:38Z) - Deep Neural Convolutive Matrix Factorization for Articulatory
Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z) - Evolution and trade-off dynamics of functional load [0.0]
We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia.
We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel.
arXiv Detail & Related papers (2021-12-22T20:57:50Z) - Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z) - Do Acoustic Word Embeddings Capture Phonological Similarity? An
Empirical Study [12.210797811981173]
In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity?
We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity.
Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity.
arXiv Detail & Related papers (2021-06-16T10:47:56Z) - Decomposing lexical and compositional syntax and semantics with deep
language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, and encompass the bilateral temporal, parietal and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z) - Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that those models that make use of abstract representations of preceding linguistic context best predict the changes made by people in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z) - Mechanisms for Handling Nested Dependencies in Neural-Network Language
Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z) - Phylogenetic signal in phonotactics [0.0]
We show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data.
We extract phonotactic data from 111 Pama-Nyungan and apply tests for phylogenetic signal.
Results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.
arXiv Detail & Related papers (2020-02-03T01:23:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.