Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study
- URL: http://arxiv.org/abs/2402.01582v1
- Date: Fri, 2 Feb 2024 17:20:16 GMT
- Title: Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study
- Authors: Kalvin Chang, Nathaniel R. Robinson, Anna Cai, Ting Chen, Annie Zhang,
David R. Mortensen
- Abstract summary: We train a neural network on sound change data to predict intermediate sound change steps between historical protoforms and their modern descendants.
In our best experiments on Tukanoan languages, this method produces trees with a Generalized Quartet Distance of 0.12 from a tree that used expert annotations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We describe a set of new methods to partially automate linguistic
phylogenetic inference given (1) cognate sets with their respective protoforms
and sound laws, (2) a mapping from phones to their articulatory features and
(3) a typological database of sound changes. We train a neural network on these
sound change data to weight articulatory distances between phones and predict
intermediate sound change steps between historical protoforms and their modern
descendants, replacing a linguistic expert in part of a parsimony-based
phylogenetic inference algorithm. In our best experiments on Tukanoan
languages, this method produces trees with a Generalized Quartet Distance of
0.12 from a tree that used expert annotations, a significant improvement over
other semi-automated baselines. We discuss potential benefits and drawbacks to
our neural approach and parsimony-based tree prediction. We also experiment
with a minimal generalization learner for automatic sound law induction,
finding it comparably effective to sound laws from expert annotation. Our code
is publicly available at https://github.com/cmu-llab/aiscp.
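To make the core mechanism concrete, below is a minimal sketch of a weighted articulatory distance between phones. The 5-dimensional feature vectors and the weights are made-up placeholders, not the paper's actual features or parameters (the authors' implementation lives in the repository above); real features would come from a phone-to-feature database such as PanPhon, and in the paper the weights are learned by a neural network trained on typological sound change data rather than set by hand.

```python
import numpy as np

# Hypothetical 5-dimensional articulatory feature vectors; a real system
# would map phones to features with a database such as PanPhon.
PHONE_FEATURES = {
    "p": np.array([1.0, 0.0, 0.0, 1.0, 0.0]),  # voiceless bilabial stop
    "b": np.array([1.0, 0.0, 0.0, 1.0, 1.0]),  # voiced bilabial stop
    "f": np.array([1.0, 0.0, 1.0, 1.0, 0.0]),  # voiceless labiodental fricative
}

def weighted_distance(p1: str, p2: str, weights: np.ndarray) -> float:
    """Weighted distance between two phones' feature vectors.

    In the paper, the weights come from a neural network trained on
    typological sound change data, so feature changes that are common
    in attested sound changes (e.g. voicing) cost less than rare ones.
    """
    return float(weights @ np.abs(PHONE_FEATURES[p1] - PHONE_FEATURES[p2]))

uniform = np.ones(5)
learned = np.array([1.0, 1.0, 1.5, 1.0, 0.5])  # made-up: voicing is cheap

print(weighted_distance("p", "b", uniform), weighted_distance("p", "f", uniform))  # 1.0 1.0
print(weighted_distance("p", "b", learned), weighted_distance("p", "f", learned))  # 0.5 1.5
```

Under the made-up learned weights, the common change p > b (voicing) becomes cheaper than p > f, which is the kind of asymmetry the learned weighting is meant to induce when scoring intermediate sound change steps.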
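The minimal generalization learner mentioned in the abstract follows, broadly, the Albright-and-Hayes-style recipe: collapse observed change instances that share the same structural change into a rule whose context keeps only what the instances have in common. The sketch below is a toy reading of that idea with a hypothetical Rule representation, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    source: str   # segment that changes
    target: str   # what it becomes
    left: str     # left context string
    right: str    # right context string

def shared_suffix(a: str, b: str) -> str:
    """Longest common suffix of two context strings."""
    i = 0
    while i < min(len(a), len(b)) and a[-1 - i] == b[-1 - i]:
        i += 1
    return a[len(a) - i:]

def shared_prefix(a: str, b: str) -> str:
    """Longest common prefix of two context strings."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def minimal_generalization(r1: Rule, r2: Rule) -> Rule | None:
    """Generalize two rules with the same change, keeping only shared context."""
    if (r1.source, r1.target) != (r2.source, r2.target):
        return None
    # The left context touches the change site at its right edge, so the
    # shared material is a common suffix; symmetrically for the right context.
    return Rule(r1.source, r1.target,
                shared_suffix(r1.left, r2.left),
                shared_prefix(r1.right, r2.right))

# Two observed voicing instances collapse into one more general rule:
r = minimal_generalization(Rule("p", "b", "a", "a"), Rule("p", "b", "o", "a"))
print(r)  # Rule(source='p', target='b', left='', right='a')
```

Iterated pairwise over a set of observed changes, this collapses, for example, p > b / a_a and p > b / o_a into the single rule p > b / _a.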
Related papers
- We Augmented Whisper With kNN and You Won't Believe What Came Next
We show that Whisper, a transformer end-to-end speech model, benefits from kNN.
We discuss implications for speaker adaptation, and analyze improvements by gender, accent, and age.
arXiv Detail & Related papers (2024-10-24T15:32:52Z)
- Human-like Linguistic Biases in Neural Speech Models: Phonetic Categorization and Phonotactic Constraints in Wav2Vec2.0
We study how Wav2Vec2 resolves phonotactic constraints.
We synthesize sounds on an acoustic continuum between /l/ and /r/ and embed them in controlled contexts.
Like humans, Wav2Vec2 models show a bias towards the phonotactically admissible category in processing such ambiguous sounds.
arXiv Detail & Related papers (2024-07-03T11:04:31Z)
- Co-training for Low Resource Scientific Natural Language Inference
We propose a novel co-training method that assigns weights to the distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z)
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
This paper investigates how to learn directly from unpaired phone sequences and speech utterances.
GAN training is adopted in the first stage to learn the mapping between unpaired speech and phone sequences.
In the second stage, an HMM is introduced and trained on the generator's output, which boosts the performance.
arXiv Detail & Related papers (2022-07-29T09:29:28Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text.
We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody.
arXiv Detail & Related papers (2021-06-15T18:03:48Z)
- Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change
This paper proposes a framework for modeling sound change that combines deep learning and iterative learning.
It argues that several properties of sound change emerge from the proposed architecture.
arXiv Detail & Related papers (2020-11-10T23:49:09Z)
- Syllabification of the Divine Comedy
We provide a syllabification algorithm for the Divine Comedy using techniques from probabilistic and constraint programming.
We particularly focus on the synalephe, addressed in terms of the "propensity" of a word to take part in a synalephe with adjacent words.
We jointly provide an online vocabulary containing, for each word, information about its syllabification, the location of the tonic accent, and the aforementioned synalephe propensity.
arXiv Detail & Related papers (2020-10-26T12:14:14Z)
- One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
We introduce an approach to multilingual speech synthesis which uses the meta-learning concept of contextual parameter generation.
Our model is shown to effectively share information across languages and, according to a subjective evaluation, produces more natural and accurate code-switching speech than the baselines.
arXiv Detail & Related papers (2020-08-03T10:43:30Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)