PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters
Automatically
- URL: http://arxiv.org/abs/2209.06275v1
- Date: Tue, 13 Sep 2022 19:46:15 GMT
- Title: PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters
Automatically
- Authors: Sedrick Scott Keh, Steven Y. Feng, Varun Gangal, Malihe Alikhani,
Eduard Hovy
- Abstract summary: We propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically.
We leverage phoneme representations to capture the notion of phonetic difficulty.
We show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.
- Score: 20.159562278326764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tongue twisters are meaningful sentences that are difficult to pronounce. The
process of automatically generating tongue twisters is challenging since the
generated utterance must satisfy two conditions at once: phonetic difficulty
and semantic meaning. Furthermore, phonetic difficulty is itself hard to
characterize and is expressed in natural tongue twisters through a
heterogeneous mix of phenomena such as alliteration and homophony. In this
paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue
Twisters Automatically. We leverage phoneme representations to capture the
notion of phonetic difficulty, and we train language models to generate
original tongue twisters on two proposed task settings. To do this, we curate a
dataset called PANCETTA, consisting of existing English tongue twisters.
Through automatic and human evaluation, as well as qualitative analysis, we
show that PANCETTA generates novel, phonetically difficult, fluent, and
semantically meaningful tongue twisters.
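As a rough illustration of why phoneme-level representations help with phonetic difficulty, the sketch below scores a sentence by how often its phonemes repeat. This is not the method used in the paper: the toy lexicon (`TOY_LEXICON`), the helper names, and the repetition score itself are assumptions made purely for illustration. A real system would use a full pronunciation dictionary or a grapheme-to-phoneme model.

```python
# Hypothetical toy lexicon with ARPAbet-style phonemes, hand-written for this
# sketch; it is NOT taken from the PANCETTA dataset.
TOY_LEXICON = {
    "she": ["SH", "IY"],
    "sells": ["S", "EH", "L", "Z"],
    "sea": ["S", "IY"],
    "shells": ["SH", "EH", "L", "Z"],
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "down": ["D", "AW", "N"],
}


def to_phonemes(sentence):
    """Flatten a sentence into one phoneme sequence using the toy lexicon."""
    phonemes = []
    for word in sentence.lower().split():
        phonemes.extend(TOY_LEXICON[word])  # raises KeyError for unknown words
    return phonemes


def phoneme_repetition(sentence):
    """Crude proxy for phonetic difficulty: 1 - (unique phonemes / total phonemes).

    Higher values mean the same sounds recur more often, as with alliteration.
    """
    phonemes = to_phonemes(sentence)
    return 1 - len(set(phonemes)) / len(phonemes)


twister_score = phoneme_repetition("she sells sea shells")
plain_score = phoneme_repetition("the cat sat down")
print(f"tongue twister: {twister_score:.2f}, plain sentence: {plain_score:.2f}")
```

The tongue twister reuses a small set of similar sounds, so it scores higher than the plain sentence. A score like this only captures repetition; it says nothing about fluency or meaning, which the abstract names as the second condition.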
Related papers
- TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z)
- Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases [24.954896926774627]
We present a pipeline for generating phonologically informed tongue twisters from large language models (LLMs).
We show the results of automatic and human evaluation of smaller models trained on our generated dataset.
We introduce a phoneme-aware constrained decoding module (PACD) that can be integrated into an autoregressive language model.
arXiv Detail & Related papers (2024-03-20T18:13:17Z)
- Thread of Thought Unraveling Chaotic Contexts [133.24935874034782]
The "Thread of Thought" (ThoT) strategy draws inspiration from human cognitive processes.
In experiments, ThoT significantly improves reasoning performance compared to other prompting techniques.
arXiv Detail & Related papers (2023-11-15T06:54:44Z)
- The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language [7.0944623704102625]
We show that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages.
We propose CLAP-IPA, a multi-lingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between arbitrary speech signals and phonemic sequences.
arXiv Detail & Related papers (2023-11-14T17:09:07Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- TwistList: Resources and Baselines for Tongue Twister Generation [17.317550526263183]
We present work on the generation of tongue twisters, a form of language that is required to be phonetically conditioned to maximise sound overlap.
We present TwistList, a large annotated dataset of tongue twisters, consisting of 2.1K+ human-authored examples.
We additionally present several benchmark systems for the proposed task of tongue twister generation, including models that both do and do not require training on in-domain data.
arXiv Detail & Related papers (2023-06-06T07:20:51Z)
- Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech [88.22544315633687]
Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech systems.
We propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary.
Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy.
arXiv Detail & Related papers (2022-06-05T10:50:34Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z)
- English-to-Chinese Transliteration with Phonetic Back-transliteration [0.9281671380673306]
Transliteration is the task of translating named entities from one language to another based on phonetic similarity.
In this work, we incorporate phonetic information into neural networks in two ways: we synthesize extra data using forward and back-translation but in a phonetic manner.
Our experiments include three language pairs and six directions, namely English to and from Chinese, Hebrew and Thai.
arXiv Detail & Related papers (2021-12-20T03:29:28Z)
- AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style [111.89762723159677]
We develop AdaSpeech 3, an adaptive TTS system that fine-tunes a well-trained reading-style TTS model for spontaneous-style speech.
AdaSpeech 3 synthesizes speech with natural filled pauses (FP) and rhythms in spontaneous styles, and achieves much better MOS and SMOS scores than previous adaptive TTS systems.
arXiv Detail & Related papers (2021-07-06T10:40:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.