Phylogenetic signal in phonotactics
- URL: http://arxiv.org/abs/2002.00527v2
- Date: Tue, 14 Jul 2020 04:50:02 GMT
- Title: Phylogenetic signal in phonotactics
- Authors: Jayden L. Macklin-Cordes, Claire Bowern and Erich R. Round
- Abstract summary: We show how a phylogenetic approach opens the possibility of gaining historical insights from entirely new kinds of linguistic data.
We extract phonotactic data from 111 Pama-Nyungan vocabularies and apply tests for phylogenetic signal.
Results demonstrate the viability of employing a new source of readily extractable data in historical and comparative linguistics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phylogenetic methods have broad potential in linguistics beyond tree
inference. Here, we show how a phylogenetic approach opens the possibility of
gaining historical insights from entirely new kinds of linguistic data--in this
instance, statistical phonotactics. We extract phonotactic data from 111
Pama-Nyungan vocabularies and apply tests for phylogenetic signal, quantifying
the degree to which the data reflect phylogenetic history. We test three
datasets: (1) binary variables recording the presence or absence of biphones
(two-segment sequences) in a lexicon, (2) frequencies of transitions between
segments, and (3) frequencies of transitions between natural sound classes.
Australian languages have been characterized as having a high degree of
phonotactic homogeneity. Nevertheless, we detect phylogenetic signal in all
datasets. Phylogenetic signal is greater in finer-grained frequency data than
in binary data, and greatest in natural-class-based data. These results
demonstrate the viability of employing a new source of readily extractable data
in historical and comparative linguistics.
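The three datasets described in the abstract can be illustrated with a minimal sketch. It assumes each vocabulary is already segmented into lists of phonemes, and it treats a "transition frequency" as the relative frequency of a following segment given the preceding one, without word-boundary symbols; the paper's exact definitions may differ. All function names, the toy words, and the sound-class mapping are hypothetical and are not taken from the Pama-Nyungan data.

```python
from collections import Counter, defaultdict


def biphones(word):
    """Return the adjacent segment pairs of a word given as a list of segments."""
    return list(zip(word, word[1:]))


def biphone_presence(lexicon, inventory):
    """Dataset (1): 1 if a biphone occurs anywhere in the lexicon, else 0."""
    seen = {bp for word in lexicon for bp in biphones(word)}
    return {(a, b): int((a, b) in seen) for a in inventory for b in inventory}


def transition_frequencies(lexicon):
    """Dataset (2): relative frequency of each next segment given the current one.

    Assumption: forward transitions only, no word-boundary symbols.
    """
    counts = defaultdict(Counter)
    for word in lexicon:
        for a, b in biphones(word):
            counts[a][b] += 1
    return {
        a: {b: n / sum(following.values()) for b, n in following.items()}
        for a, following in counts.items()
    }


def class_transition_frequencies(lexicon, classes):
    """Dataset (3): as dataset (2), after mapping segments to sound classes."""
    mapped = [[classes[segment] for segment in word] for word in lexicon]
    return transition_frequencies(mapped)


if __name__ == "__main__":
    # Toy, invented lexicon (not real language data).
    toy_lexicon = [["m", "a", "n", "a"], ["n", "a", "m", "a"], ["m", "a", "n"]]
    toy_classes = {"m": "nasal", "n": "nasal", "a": "vowel"}  # hypothetical mapping

    print(biphone_presence(toy_lexicon, ["m", "a", "n"]))
    print(transition_frequencies(toy_lexicon))
    print(class_transition_frequencies(toy_lexicon, toy_classes))
```

Each language then contributes one vector of values (one per biphone or transition), and it is these per-language values that a test for phylogenetic signal compares against the reference tree.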
Related papers
- Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization.
To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z)
- Are Sounds Sound for Phylogenetic Reconstruction? [41.85920785319125]
We test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction.
Our results show that phylogenies reconstructed from lexical cognates are topologically closer to the gold standard phylogenies than phylogenies reconstructed from sound correspondences, by approximately one third on average with respect to the generalized quartet distance.
arXiv Detail & Related papers (2024-02-05T08:35:33Z)
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
arXiv Detail & Related papers (2023-08-09T14:22:18Z)
- Fast and Functional Structured Data Generators Rooted in Out-of-Equilibrium Physics [62.997667081978825]
We address the challenge of using energy-based models to produce high-quality, label-specific data in structured datasets.
Traditional training methods encounter difficulties due to inefficient Markov chain Monte Carlo mixing.
We use a novel training algorithm that exploits non-equilibrium effects.
arXiv Detail & Related papers (2023-07-13T15:08:44Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Evolution and trade-off dynamics of functional load [0.0]
We apply phylogenetic methods to examine the diachronic evolution of functional load (FL) across 90 languages of the Pama-Nyungan (PN) family of Australia.
We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel.
arXiv Detail & Related papers (2021-12-22T20:57:50Z)
- Phylogenetic typology [0.913755431537592]
We propose a novel method to estimate the frequency distribution of linguistic variables.
Unlike previous approaches, our technique uses all available data.
As a case study, we investigate a series of potential word-order correlations across the languages of the world.
arXiv Detail & Related papers (2021-03-18T12:03:49Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- PhyAAt: Physiology of Auditory Attention to Speech Dataset [0.5976833843615385]
Auditory attention to natural speech is a complex brain process.
We present a dataset of physiological signals collected from an experiment on auditory attention to natural speech.
arXiv Detail & Related papers (2020-05-23T17:55:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.