Generative Adversarial Phonology: Modeling unsupervised phonetic and
phonological learning with neural networks
- URL: http://arxiv.org/abs/2006.03965v1
- Date: Sat, 6 Jun 2020 20:31:23 GMT
- Title: Generative Adversarial Phonology: Modeling unsupervised phonetic and
phonological learning with neural networks
- Authors: Gašper Beguš
- Abstract summary: Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations.
This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture.
We propose a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks on well-understood dependencies in speech data
can provide new insights into how they learn internal representations. This
paper argues that acquisition of speech can be modeled as a dependency between
random space and generated speech data in the Generative Adversarial Network
architecture and proposes a methodology to uncover the network's internal
representations that correspond to phonetic and phonological properties. The
Generative Adversarial architecture is uniquely appropriate for modeling
phonetic and phonological learning because the network is trained on
unannotated raw acoustic data and learning is unsupervised without any
language-specific assumptions or pre-assumed levels of abstraction. A
Generative Adversarial Network was trained on an allophonic distribution in
English. The network successfully learns the allophonic alternation: the
network's generated speech signal contains the conditional distribution of
aspiration duration. The paper proposes a technique for establishing the
network's internal representations that identifies latent variables that
correspond to, for example, presence of [s] and its spectral properties. By
manipulating these variables, we actively control the presence of [s] and its
frication amplitude in the generated outputs. This suggests that the network
learns to use latent variables as an approximation of phonetic and phonological
representations. Crucially, we observe that the dependencies learned in
training extend beyond the training interval, which allows for additional
exploration of learning representations. The paper also discusses how the
network's architecture and innovative outputs resemble and differ from
linguistic behavior in language acquisition, speech disorders, and speech
errors, and how well-understood dependencies in speech data can help us
interpret how neural networks learn their representations.
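The probing technique described in the abstract lends itself to a compact illustration. The following is a minimal sketch, not the paper's released code: it assumes a WaveGAN-style setup, and the names ToyGenerator, probe_latent, hf_energy, Z_DIM, and all hyperparameters are hypothetical stand-ins. A 1-D convolutional generator maps latent vectors to raw waveforms; one latent dimension is then held at fixed values, including values outside the uniform training interval [-1, 1], while a crude high-frequency energy measure stands in for frication amplitude.

```python
# Minimal sketch (assumptions throughout, not the paper's implementation) of
# the latent-variable probing technique: hold one latent dimension at fixed
# values -- including values beyond the training interval of the uniform
# prior -- and inspect the acoustic consequences in the generated outputs.
import torch
import torch.nn as nn

Z_DIM = 100        # latent dimensionality, as in DCGAN/WaveGAN-style models
AUDIO_LEN = 16384  # samples per generated clip (~1 s at 16 kHz)

class ToyGenerator(nn.Module):
    """Upsampling 1-D conv stack mapping z to a waveform (WaveGAN-like in spirit)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(Z_DIM, 256 * 16)
        def up(cin, cout):  # each block upsamples the signal 4x
            return [nn.ConvTranspose1d(cin, cout, kernel_size=25, stride=4,
                                       padding=11, output_padding=1), nn.ReLU()]
        self.net = nn.Sequential(
            *up(256, 128), *up(128, 64), *up(64, 32), *up(32, 16),
            nn.ConvTranspose1d(16, 1, kernel_size=25, stride=4, padding=11,
                               output_padding=1),
            nn.Tanh(),  # waveform in [-1, 1]; length 16 * 4**5 = 16384 samples
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 16))

@torch.no_grad()
def probe_latent(gen, dim, values, n=8):
    """Fix latent dimension `dim` at each value while sampling the rest from
    the training-time prior U(-1, 1); values outside [-1, 1] test whether the
    learned dependency extends beyond the training interval."""
    out = {}
    for v in values:
        z = torch.rand(n, Z_DIM) * 2 - 1
        z[:, dim] = v
        out[v] = gen(z)
    return out

def hf_energy(wav):
    """Crude proxy for frication amplitude: RMS of the first difference,
    which emphasizes the high frequencies where [s] noise concentrates."""
    return (wav[..., 1:] - wav[..., :-1]).pow(2).mean(dim=-1).sqrt()

if __name__ == "__main__":
    gen = ToyGenerator()  # untrained here; the paper probes a trained network
    probes = probe_latent(gen, dim=5, values=[-2.0, -1.0, 0.0, 1.0, 2.0])
    for v, audio in probes.items():
        print(f"z[5] = {v:+.1f}  mean high-freq energy = {hf_energy(audio).mean():.4f}")
```

In the paper's setting the generator is first trained adversarially on unannotated raw speech; the point of the sketch is only the probing loop, which pairs each forced latent value with a measurable acoustic property of the generated output.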
Related papers
- Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification [2.4472308031704073]
This study investigates discriminative patterns learned by neural networks for accurate speech classification.
By examining the activations and features of neural networks for vowel classification, we gain insights into what the networks "see" in spectrograms.
arXiv Detail & Related papers (2024-07-10T07:37:18Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Color Overmodification Emerges from Data-Driven Learning and Pragmatic Reasoning [53.088796874029974]
We show that speakers' referential expressions depart from communicative ideals in ways that help illuminate the nature of pragmatic language use.
By adopting neural networks as learning agents, we show that overmodification is more likely with environmental features that are infrequent or salient.
arXiv Detail & Related papers (2022-05-18T18:42:43Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
- Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change [0.0]
This paper proposes a framework for modeling sound change that combines deep learning and iterative learning.
It argues that several properties of sound change emerge from the proposed architecture.
arXiv Detail & Related papers (2020-11-10T23:49:09Z)
- Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks [0.0]
This paper argues that training GANs on local and non-local dependencies in speech data offers insights into how deep neural networks discretize continuous data.
arXiv Detail & Related papers (2020-09-27T00:02:34Z)
- CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks [0.0]
Lexical learning is modeled as emergent from an architecture that forces a deep neural network to output data such that unique information is retrievable from its acoustic outputs.
Networks trained on lexical items from TIMIT learn to encode unique information corresponding to lexical items in the form of categorical variables in their latent space.
We show that phonetic and phonological representations learned by the network can be productively recombined and directly paralleled to productivity in human speech.
arXiv Detail & Related papers (2020-06-04T15:33:55Z)
- Untangling in Invariant Speech Recognition [17.996356271398295]
We study how information is untangled within neural networks trained to recognize speech.
We observe speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties are untangled in later layers.
We find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation.
arXiv Detail & Related papers (2020-03-03T20:48:43Z)