Minimal Effective Theory for Phonotactic Memory: Capturing Local
Correlations due to Errors in Speech
- URL: http://arxiv.org/abs/2309.02466v1
- Date: Mon, 4 Sep 2023 22:11:26 GMT
- Authors: Paul Myles Eugenio
- Abstract summary: Local phonetic correlations in spoken words facilitate learning by reducing the words' information content.
We capture this by constructing a locally-connected tensor-network model, inspired by similar variational models used in many-body physics.
The model is therefore a minimal model of phonetic memory, in which "learning to pronounce" and "learning a word" are one and the same.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spoken language evolves constrained by the economy of speech, which depends
on factors such as the structure of the human mouth. This gives rise to local
phonetic correlations in spoken words. Here we demonstrate that these local
correlations facilitate the learning of spoken words by reducing their
information content. We do this by constructing a locally-connected
tensor-network model, inspired by similar variational models used for many-body
physics, which exploits these local phonetic correlations to facilitate the
learning of spoken words. The model is therefore a minimal model of phonetic
memory, where "learning to pronounce" and "learning a word" are one and the
same. One consequence is the learned ability to produce new words that are
phonetically plausible for the target language, as well as a hierarchy of the
most likely errors that could arise during the act of speech. We test our model
on Latin and Turkish words. (The code is available on GitHub.)
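The abstract describes a locally-connected tensor-network model of phoneme sequences. A minimal sketch of that idea is a matrix-product-state "Born machine" over a phoneme alphabet, where a word's probability is the squared amplitude of a chain contraction. Everything below (alphabet size, word length, bond dimension, random initialization) is an illustrative assumption, not the paper's actual architecture or training procedure:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
V, L, D = 3, 4, 2  # phoneme alphabet size, word length, bond dimension

# One tensor per position in the word; boundary tensors have bond dimension 1.
shapes = [(1 if i == 0 else D, V, 1 if i == L - 1 else D) for i in range(L)]
mps = [rng.normal(size=s) for s in shapes]

def amplitude(word, mps):
    """Contract the chain along the fixed phoneme indices of one word."""
    m = mps[0][:, word[0], :]
    for i, s in enumerate(word[1:], start=1):
        m = m @ mps[i][:, s, :]
    return m[0, 0]

def partition_function(mps):
    """Z = sum over all words of amplitude^2, via transfer-matrix contraction."""
    E = None
    for A in mps:
        # Transfer operator: sum over the physical (phoneme) index.
        T = np.einsum('avb,cvd->acbd', A, A)
        T = T.reshape(A.shape[0] ** 2, A.shape[2] ** 2)
        E = T if E is None else E @ T
    return E[0, 0]

Z = partition_function(mps)

def prob(word):
    """Born-rule probability of a phoneme string."""
    return amplitude(word, mps) ** 2 / Z

# Probabilities over all V**L possible words sum to 1 by construction.
total = sum(prob(w) for w in product(range(V), repeat=L))
```

In such a model, local (nearest-neighbor) phonetic correlations are captured by the shared bond indices between adjacent tensors; training the tensors on a corpus would raise the probability of attested words and of phonetically similar neighbors alike.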
Related papers
- Sylber: Syllabic Embedding Representation of Speech from Raw Audio [25.703703711031178]
We propose a new model, Sylber, that produces speech representations with clean and robust syllabic structure.
Specifically, we propose a self-supervised model that regresses features on syllabic segments distilled from a teacher model which is an exponential moving average of the model in training.
This results in a highly structured representation of speech features, offering three key benefits: 1) a fast, linear-time syllable segmentation algorithm, 2) efficient syllabic tokenization with an average of 4.27 tokens per second, and 3) syllabic units better suited for lexical and syntactic understanding.
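The Sylber summary describes a teacher that is an exponential moving average (EMA) of the model in training. A minimal sketch of such an update, with hypothetical plain-float parameter dicts standing in for real model weights:

```python
import copy

def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter a small step toward the student's value."""
    for name, s_val in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * s_val

# Toy "parameters" as plain floats.
student = {"w": 1.0}
teacher = copy.deepcopy(student)

student["w"] = 2.0            # one optimisation step moves the student
ema_update(teacher, student)  # the teacher trails it smoothly
```

The high decay value means the teacher changes slowly, which is what makes its features a stable distillation target for the student.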
arXiv Detail & Related papers (2024-10-09T17:59:04Z) - SpeechAlign: Aligning Speech Generation to Human Preferences [51.684183257809075]
We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.
We show that SpeechAlign can bridge the distribution gap and facilitate continuous self-improvement of the speech language model.
arXiv Detail & Related papers (2024-04-08T15:21:17Z) - Speech language models lack important brain-relevant semantics [6.626540321463248]
Recent work has shown that text-based language models predict both text-evoked and speech-evoked brain activity to an impressive degree.
This poses the question of what types of information language models truly predict in the brain.
arXiv Detail & Related papers (2023-11-08T13:11:48Z) - Neural approaches to spoken content embedding [1.3706331473063877]
We contribute new discriminative acoustic word embedding (AWE) and acoustically grounded word embedding (AGWE) approaches based on recurrent neural networks (RNNs).
We apply our embedding models, both monolingual and multilingual, to the downstream tasks of query-by-example speech search and automatic speech recognition.
arXiv Detail & Related papers (2023-08-28T21:16:08Z) - From `Snippet-lects' to Doculects and Dialects: Leveraging Neural
Representations of Speech for Placing Audio Signals in a Language Landscape [3.96673286245683]
XLSR-53, a multilingual model of speech, builds a vector representation from audio.
We use max-pooling to aggregate the neural representations from a "snippet-lect" to a "doculect".
Similarity measurements between the 11 corpora show the greatest closeness between those known to be dialects of the same language.
arXiv Detail & Related papers (2023-05-29T20:37:06Z) - Towards Language Modelling in the Speech Domain Using Sub-word
Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
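The summary above links LSTM forget gating to power-law memory decay. As a rough back-of-the-envelope illustration (our own, not the paper's derivation): a unit whose forget-gate activation sits near f retains information as f^t, giving an effective timescale tau = -1/log(f), and a spread of f values across units mixes many exponential decays:

```python
import numpy as np

# Forget-gate activations f in (0, 1); a unit's memory decays as f^t,
# so its effective timescale is tau = -1 / log(f), roughly 1 / (1 - f).
f = np.array([0.5, 0.9, 0.99, 0.999])
tau = -1.0 / np.log(f)

# Averaging the exponential decays of units with different timescales
# produces a memory curve with a much heavier tail than any single unit.
t = np.arange(1, 200)
mixture = np.mean([fi ** t for fi in f], axis=0)
```

Under this picture, learning a power-law memory curve amounts to learning an appropriate distribution of forget-gate biases across units.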
arXiv Detail & Related papers (2020-09-27T02:13:38Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z) - Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z) - Lexical Sememe Prediction using Dictionary Definitions by Capturing
Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.