Related papers: Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases

Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases

URL: http://arxiv.org/abs/2403.13901v1
Date: Wed, 20 Mar 2024 18:13:17 GMT
Title: Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases
Authors: Tyler Loakman, Chen Tang, Chenghua Lin,
Abstract summary: We present a pipeline for generating phonologically informed tongue-twisters from Large Language Models (LLMs) We also present the results of automatic and human evaluation of smaller models trained on our generated dataset. We introduce a Phoneme-Aware Constrained Decoding module (PACD) that can be integrated into any causal language model.
Score: 24.954896926774627
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Previous work in phonologically and phonetically grounded language generation has mainly focused on domains such as puns and poetry. In this article, we present new work on the generation of tongue-twisters - a form of language that is required to be conditioned on a phoneme level to maximize sound overlap, whilst maintaining semantic consistency with an input topic and still being grammatically correct. We present TwisterLister, a pipeline for generating phonologically informed tongue-twisters from Large Language Models (LLMs) that we use to generate TwistList 2.0, the largest annotated dataset of tongue-twisters to date, consisting of 17K+ examples from a combination of human and LLM authors. Our generation pipeline involves the use of a phonologically constrained vocabulary alongside LLM prompting to generate novel, non-derivative tongue-twister examples. We additionally present the results of automatic and human evaluation of smaller models trained on our generated dataset to demonstrate the extent to which phonologically motivated language types can be generated without explicit injection of phonological knowledge. Additionally, we introduce a Phoneme-Aware Constrained Decoding module (PACD) that can be integrated into any causal language model and demonstrate that this method generates good quality tongue-twisters both with and without fine-tuning the underlying language model. We also design and implement a range of automatic metrics for the task of tongue-twister generation that is phonologically motivated and captures the unique essence of tongue-twisters based on Phonemic Edit Distance (PED).

Related papers

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation [18.89091877062589]
LanStyleTTS is a non-autoregressive, language-aware style adaptive TTS framework. It supports a unified multilingual TTS model capable of producing accurate and high-quality speech without the need to train language-specific models.
arXiv Detail & Related papers (2025-04-11T06:12:57Z)
Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism. Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z)
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer [39.31849739010572]
We introduce textbfGenerative textbfPre-trained textbfSpeech textbfTransformer (GPST) GPST is a hierarchical transformer designed for efficient speech language modeling.
arXiv Detail & Related papers (2024-06-03T04:16:30Z)
Language Generation from Brain Recordings [68.97414452707103]
We propose a generative language BCI that utilizes the capacity of a large language model and a semantic brain decoder. The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.
arXiv Detail & Related papers (2023-11-16T13:37:21Z)
Generative Spoken Language Model based on continuous word-sized audio tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-size continuous-valued audio embeddings. The resulting model is the first generative language model based on word-size continuous embeddings.
arXiv Detail & Related papers (2023-10-08T16:46:14Z)
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias [71.94109664001952]
Mega-TTS is a novel zero-shot TTS system that is trained with large-scale wild data. We show that Mega-TTS surpasses state-of-the-art TTS systems on zero-shot TTS speech editing, and cross-lingual TTS tasks.
arXiv Detail & Related papers (2023-06-06T08:54:49Z)
TwistList: Resources and Baselines for Tongue Twister Generation [17.317550526263183]
We present work on the generation of tongue twisters, a form of language that is required to be phonetically conditioned to maximise sound overlap. We present textbfTwistList, a large annotated dataset of tongue twisters, consisting of 2.1K+ human-authored examples. We additionally present several benchmark systems for the proposed task of tongue twister generation, including models that both do and do not require training on in-domain data.
arXiv Detail & Related papers (2023-06-06T07:20:51Z)
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling [92.55131711064935]
We propose a cross-lingual neural language model, VALL-E X, for cross-lingual speech synthesis. VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks. It can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker's voice, emotion, and acoustic environment.
arXiv Detail & Related papers (2023-03-07T14:31:55Z)
PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically [20.159562278326764]
We propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters automatically. We leverage phoneme representations to capture the notion of phonetic difficulty. We show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.
arXiv Detail & Related papers (2022-09-13T19:46:15Z)
Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes. With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech. We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change [0.0]
This paper proposes a framework for modeling sound change that combines deep learning and iterative learning. It argues that several properties of sound change emerge from the proposed architecture.
arXiv Detail & Related papers (2020-11-10T23:49:09Z)
Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models. We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.