Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change
- URL: http://arxiv.org/abs/2011.05463v2
- Date: Wed, 22 Sep 2021 04:59:52 GMT
- Title: Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change
- Authors: Gašper Beguš
- Abstract summary: This paper proposes a framework for modeling sound change that combines deep learning and iterative learning.
It argues that several properties of sound change emerge from the proposed architecture.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a framework for modeling sound change that combines deep
learning and iterative learning. Acquisition and transmission of speech is
modeled by training generations of Generative Adversarial Networks (GANs) on
unannotated raw speech data. The paper argues that several properties of sound
change emerge from the proposed architecture. GANs (Goodfellow et al. 2014
arXiv:1406.2661, Donahue et al. 2019 arXiv:1705.07904) are uniquely appropriate
for modeling language change because the networks are trained on raw
unsupervised acoustic data, contain no language-specific features and, as
argued in Beguš (2020 arXiv:2006.03965), encode phonetic and phonological
representations in their latent space and generate linguistically informative
innovative data. The first generation of networks is trained on the relevant
sequences in human speech from TIMIT. The subsequent generations are not
trained on TIMIT, but on generated outputs from the previous generation and
thus start learning from each other in an iterative learning task. The initial
allophonic distribution is progressively lost with each generation,
likely due to pressures from the global distribution of aspiration in the
training data. The networks show signs of a gradual shift in phonetic targets
characteristic of a gradual phonetic sound change. At endpoints, the outputs
superficially resemble a phonological change -- rule loss.
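The transmission model described above reduces to a short loop: train one GAN on real speech, then train each subsequent generation only on audio sampled from its predecessor. Below is a minimal sketch of that loop; the tiny stand-in networks, placeholder data, and hyperparameters are illustrative assumptions, not the paper's WaveGAN-based implementation.

```python
# Minimal sketch of iterated GAN learning: generation 0 trains on real
# speech snippets; each later generation trains only on audio sampled
# from the previous generation's generator. WaveGAN-style models are
# stood in by tiny nets; all hyperparameters are placeholders.
import torch
import torch.nn as nn

LATENT, WAVE = 64, 4096  # latent size and waveform length (placeholders)

def make_generator():
    return nn.Sequential(nn.Linear(LATENT, WAVE), nn.Tanh())  # stand-in G

def make_discriminator():
    return nn.Sequential(nn.Linear(WAVE, 1))  # stand-in D

def train_gan(real_batch_fn, steps=200):
    """Train one generation of G/D on batches drawn from real_batch_fn."""
    G, D = make_generator(), make_discriminator()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        real = real_batch_fn()                      # (B, WAVE)
        fake = G(torch.randn(real.size(0), LATENT))
        # discriminator step: real vs. generated audio
        opt_d.zero_grad()
        loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
                 bce(D(fake.detach()), torch.zeros(real.size(0), 1))
        loss_d.backward(); opt_d.step()
        # generator step: fool the discriminator
        opt_g.zero_grad()
        loss_g = bce(D(G(torch.randn(real.size(0), LATENT))),
                     torch.ones(real.size(0), 1))
        loss_g.backward(); opt_g.step()
    return G

# Generation 0: trained on (stand-in) TIMIT snippets.
timit = torch.randn(512, WAVE)  # placeholder for the real TIMIT sequences
gen = train_gan(lambda: timit[torch.randint(512, (32,))])

# Generations 1..n: trained only on the previous generation's outputs,
# so the networks learn from each other in an iterated-learning chain.
for generation in range(1, 4):
    with torch.no_grad():
        corpus = gen(torch.randn(512, LATENT))      # previous G's speech
    gen = train_gan(lambda: corpus[torch.randint(512, (32,))])
```

The only coupling between generations is the generated corpus itself, which is what lets distributional pressures (such as the global distribution of aspiration) accumulate across the chain.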
Related papers
- Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer [39.31849739010572]
We introduce the Generative Pre-trained Speech Transformer (GPST).
GPST is a hierarchical transformer designed for efficient speech language modeling.
arXiv Detail & Related papers (2024-06-03T04:16:30Z)
- SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z)
- Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases [24.954896926774627]
We present a pipeline for generating phonologically informed tongue twisters from large language models (LLMs).
We show the results of automatic and human evaluation of smaller models trained on our generated dataset.
We introduce a phoneme-aware constrained decoding module (PACD) that can be integrated into an autoregressive language model.
arXiv Detail & Related papers (2024-03-20T18:13:17Z)
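PACD itself is only named in the summary above; the sketch below shows the generic shape of phoneme-aware constrained decoding it implies: at each autoregressive step, mask out vocabulary items whose pronunciation violates a phonological constraint. The word list, the first-phoneme lookup, and the initial-phoneme constraint are hypothetical stand-ins, not the paper's module.

```python
# Generic sketch of phoneme-aware constrained decoding: mask the logits
# of words whose pronunciation violates a constraint (here: every word
# must begin with a target phoneme, the core pressure behind tongue
# twisters). Lookup table and constraint are illustrative assumptions.
import math

VOCAB = ["sea", "she", "sells", "shells", "shore", "the"]
# Hypothetical first phonemes (ARPABET-style) for each vocabulary item.
FIRST_PHONEME = {"sea": "S", "she": "SH", "sells": "S",
                 "shells": "SH", "shore": "SH", "the": "DH"}

def constrained_step(logits, allowed_phonemes):
    """Set logits of constraint-violating words to -inf."""
    return [logit if FIRST_PHONEME[word] in allowed_phonemes else -math.inf
            for word, logit in zip(VOCAB, logits)]

def greedy_decode(step_logits_fn, length, allowed_phonemes):
    seq = []
    for _ in range(length):
        logits = constrained_step(step_logits_fn(seq), allowed_phonemes)
        seq.append(VOCAB[max(range(len(VOCAB)), key=logits.__getitem__)])
    return seq

# Toy "language model": uniform logits; the constraint does all the work.
print(greedy_decode(lambda seq: [0.0] * len(VOCAB), 4, {"S", "SH"}))
```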
- SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation [56.913182262166316]
Chain-of-Information Generation (CoIG) is a method for decoupling semantic and perceptual information in large-scale speech generation.
SpeechGPT-Gen is efficient in semantic and perceptual information modeling.
It markedly excels in zero-shot text-to-speech, zero-shot voice conversion, and speech-to-speech dialogue.
arXiv Detail & Related papers (2024-01-24T15:25:01Z)
- Articulation GAN: Unsupervised modeling of articulatory learning [6.118463549086599]
We introduce the Articulatory Generator to the Generative Adversarial Network paradigm.
A separate pre-trained physical model transforms the generated EMA representations to speech waveforms.
Articulatory analysis of the generated EMA representations suggests that the network learns to control the articulators in a manner that closely follows human articulatory movements during speech production.
arXiv Detail & Related papers (2022-10-27T05:07:04Z)
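The Articulation GAN summary above describes a two-stage pipeline: the generator emits articulatory (EMA) trajectories, and a separate frozen physical model renders them as audio before the discriminator sees them. A minimal sketch, with every module replaced by a toy stand-in:

```python
# Sketch of the Articulation GAN pipeline: G outputs EMA-style
# articulatory trajectories rather than audio; a frozen, pretrained
# "physical model" maps trajectories to a waveform for D. All modules
# and dimensions here are placeholders.
import torch
import torch.nn as nn

LATENT, T, EMA_CH, WAVE = 64, 100, 12, 4096  # placeholder dimensions

articulatory_G = nn.Sequential(nn.Linear(LATENT, T * EMA_CH))
physical_model = nn.Sequential(nn.Linear(T * EMA_CH, WAVE), nn.Tanh())
for p in physical_model.parameters():
    p.requires_grad = False  # pretrained and frozen; only G and D learn

discriminator = nn.Sequential(nn.Linear(WAVE, 1))

z = torch.randn(8, LATENT)
ema = articulatory_G(z)            # (8, T*EMA_CH) articulator trajectories
wave = physical_model(ema)         # EMA -> speech waveform
score = discriminator(wave)        # D is trained on audio, as usual
print(ema.shape, wave.shape, score.shape)
```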
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework [114.63823178097402]
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
Specifically, we propose to use generative learning approaches to capture fine-grained information at small time scales and use discriminative learning approaches to distill coarse-grained or semantic information at large time scales.
arXiv Detail & Related papers (2021-02-03T08:13:21Z)
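A minimal sketch of the MGF pairing described above: a generative (reconstruction) loss on short frames for fine-grained detail, plus a discriminative (contrastive) loss over long segments for coarse, semantic structure. The encoders, window sizes, and loss weighting are illustrative assumptions.

```python
# Sketch of a multi-granularity objective: generative at small time
# scales, discriminative (InfoNCE) at large time scales. All modules
# and dimensions are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Linear(160, 32)          # frame-level encoder (short windows)
dec = nn.Linear(32, 160)          # decoder for the generative branch
seg_enc = nn.Linear(32 * 50, 64)  # segment-level encoder (50 frames)

frames = torch.randn(16, 50, 160)            # batch of 16 segments
h = enc(frames)                              # (16, 50, 32)

# Generative branch: reconstruct each frame (fine time scale).
loss_gen = F.mse_loss(dec(h), frames)

# Discriminative branch: InfoNCE over segments (coarse time scale);
# each segment should match its own perturbed "view".
seg = seg_enc(h.reshape(16, -1))
seg2 = seg_enc((h + 0.01 * torch.randn_like(h)).reshape(16, -1))
sim = F.normalize(seg, dim=1) @ F.normalize(seg2, dim=1).T   # (16, 16)
loss_disc = F.cross_entropy(sim / 0.1, torch.arange(16))

loss = loss_gen + loss_disc   # relative weighting is a free choice
print(float(loss))
```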
- Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks [0.0]
Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations.
This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture.
We propose a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties.
arXiv Detail & Related papers (2020-06-06T20:31:23Z)
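The probing methodology referenced above can be illustrated by sweeping a single latent dimension while holding the rest of z fixed, and tracking an acoustic measurement of each generated output. The generator and the RMS measurement below are placeholders (a stand-in for a real phonetic measure such as duration of aspiration).

```python
# Sketch of latent-space probing: sweep one latent dimension, generate
# audio, and track an acoustic measurement. A dimension whose sweep
# systematically shifts the measurement is a candidate internal
# representation of that phonetic property. G and the RMS measure are
# stand-ins.
import torch
import torch.nn as nn

LATENT, WAVE = 64, 4096
G = nn.Sequential(nn.Linear(LATENT, WAVE), nn.Tanh())  # stand-in generator

z = torch.randn(1, LATENT)
dim = 7  # latent variable under investigation (arbitrary choice)
with torch.no_grad():
    for value in torch.linspace(-4, 4, 9):   # sweep beyond training range
        z_probe = z.clone()
        z_probe[0, dim] = value
        audio = G(z_probe)
        rms = audio.pow(2).mean().sqrt()     # placeholder acoustic measure
        print(f"z[{dim}]={float(value):+.1f} -> RMS={float(rms):.4f}")
```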
- CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks [0.0]
Lexical learning is modeled as emergent from an architecture that forces a deep neural network to output data.
Networks trained on lexical items from TIMIT learn to encode unique information corresponding to lexical items in the form of categorical variables in their latent space.
We show that phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech.
arXiv Detail & Related papers (2020-06-04T15:33:55Z)
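The ciwGAN setup summarized above can be sketched as an InfoGAN-style objective: part of the latent is a one-hot lexical code, and an auxiliary network must recover that code from the generated audio alone, which forces the generator to make lexical identity retrievable from acoustics. Network sizes and the single training step below are placeholder assumptions.

```python
# Sketch of the ciwGAN idea: a one-hot "lexical" code in the latent,
# plus an auxiliary classifier (Q) that must recover the code from the
# generated audio. The adversarial D loss of the main GAN objective is
# omitted; all modules are stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_WORDS, NOISE, WAVE = 8, 56, 4096
G = nn.Sequential(nn.Linear(N_WORDS + NOISE, WAVE), nn.Tanh())
Q = nn.Sequential(nn.Linear(WAVE, N_WORDS))  # recovers the lexical code

opt = torch.optim.Adam([*G.parameters(), *Q.parameters()], lr=1e-4)

labels = torch.randint(N_WORDS, (32,))
code = F.one_hot(labels, N_WORDS).float()
z = torch.cat([code, torch.randn(32, NOISE)], dim=1)
audio = G(z)
# Mutual-information term: Q must classify the word from audio alone.
loss_q = F.cross_entropy(Q(audio), labels)
loss_q.backward(); opt.step()
```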
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.