Related papers: Phylogenetic signal in phonotactics

URL: http://arxiv.org/abs/2002.00527v2
Date: Tue, 14 Jul 2020 04:50:02 GMT
Title: Phylogenetic signal in phonotactics
Authors: Jayden L. Macklin-Cordes, Claire Bowern and Erich R. Round
Abstract summary: We show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal. Results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Phylogenetic methods have broad potential in linguistics beyond tree inference. Here, we show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data, in this instance statistical phonotactics. We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying the degree to which the data reflect phylogenetic history. We test three datasets: (1) binary variables recording the presence or absence of biphones (two-segment sequences) in a lexicon, (2) frequencies of transitions between segments, and (3) frequencies of transitions between natural sound classes. Australian languages have been characterized as having a high degree of phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all datasets. Phylogenetic signal is greater in finer-grained frequency data than in binary data, and greatest in natural-class-based data. These results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.
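
Illustration (not from the paper): the three datasets named in the abstract can be sketched for a toy lexicon as below. This is a minimal sketch under stated assumptions, not the authors' pipeline. The function name `biphone_tables`, the toy word list, and the segment-to-class mapping are all hypothetical, and "frequency of transitions" is assumed here to mean the relative frequency of a segment following a given segment; the paper may define it differently.

```python
from collections import Counter
from itertools import product


def biphone_tables(lexicon, class_map=None):
    """Return (presence, transitions) for a lexicon of segmented words.

    lexicon   -- iterable of words, each a sequence of segment labels
    class_map -- optional dict mapping segments to natural-class labels
                 (hypothetical; supply your own feature system)
    """
    if class_map is not None:
        # Recode every segment as its natural class before counting.
        lexicon = [[class_map[seg] for seg in word] for word in lexicon]
    else:
        lexicon = [list(word) for word in lexicon]

    pair_counts = Counter()   # how often each biphone (a, b) occurs
    first_counts = Counter()  # how often a occurs as the first member
    for word in lexicon:
        for a, b in zip(word, word[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1

    units = sorted({seg for word in lexicon for seg in word})
    # Dataset 1: binary presence/absence over all ordered pairs of units.
    presence = {pair: int(pair in pair_counts)
                for pair in product(units, repeat=2)}
    # Datasets 2 and 3: relative frequency of b following a
    # (assumed forward-transition definition).
    transitions = {(a, b): n / first_counts[a]
                   for (a, b), n in pair_counts.items()}
    return presence, transitions


# Toy data, invented for illustration only.
toy_lexicon = [["k", "a", "t", "a"], ["t", "a", "k"]]
toy_classes = {"k": "stop", "t": "stop", "a": "vowel"}

presence, seg_transitions = biphone_tables(toy_lexicon)
_, class_transitions = biphone_tables(toy_lexicon, toy_classes)
```

Each language would contribute one such vector of values, and a test for phylogenetic signal would then be run per variable against a reference tree; that step is outside this sketch.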

Related papers

Beyond cognacy [0.21756081703275998]
Two fully automated methods are compared to extract phylogenetic signal directly from lexical data. Results show that MSA-based inference yields trees more consistent with linguistic classifications, better predicts typological variation, and provides a clearer phylogenetic signal.
arXiv Detail & Related papers (2025-07-02T06:47:34Z)
The Cognate Data Bottleneck in Language Phylogenetics [49.1574468325115]
Phylogenetic data analysis approaches that require larger datasets cannot be applied to cognate data. It remains an open question how, and if, these computational approaches can be applied in historical linguistics.
arXiv Detail & Related papers (2025-07-01T16:14:20Z)
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization. To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z)
Are Sounds Sound for Phylogenetic Reconstruction? [41.85920785319125]
We test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average.
arXiv Detail & Related papers (2024-02-05T08:35:33Z)
PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification. We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining. It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
arXiv Detail & Related papers (2023-08-09T14:22:18Z)
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets. We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts. Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)
Evolution and trade-off dynamics of functional load [0.0]
We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia. We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel.
arXiv Detail & Related papers (2021-12-22T20:57:50Z)
Phylogenetic typology [0.913755431537592]
We propose a novel method to estimate the frequency distribution of linguistic variables. Unlike previous approaches, our technique uses all available data. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
arXiv Detail & Related papers (2021-03-18T12:03:49Z)
Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently. Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces. The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
PhyAAt: Physiology of Auditory Attention to Speech Dataset [0.5976833843615385]
Auditory attention to natural speech is a complex brain process. We present a dataset of physiological signals collected from an experiment on auditory attention to natural speech.
arXiv Detail & Related papers (2020-05-23T17:55:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.