Representing `how you say' with `what you say': English corpus of
focused speech and text reflecting corresponding implications
- URL: http://arxiv.org/abs/2203.15483v1
- Date: Tue, 29 Mar 2022 12:29:22 GMT
- Title: Representing `how you say' with `what you say': English corpus of
focused speech and text reflecting corresponding implications
- Authors: Naoaki Suzuki, Satoshi Nakamura
- Abstract summary: In speech communication, how something is said (paralinguistic information) is as crucial as what is said (linguistic information).
Current speech translation systems return the same translations if the utterances are linguistically identical.
We propose mapping paralinguistic information into the linguistic domain within the source language using lexical and grammatical devices.
- Score: 10.103202030679844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In speech communication, how something is said (paralinguistic information)
is as crucial as what is said (linguistic information). As a type of
paralinguistic information, English speech uses sentence stress, the heaviest
prominence within a sentence, to convey emphasis. While different placements of
sentence stress communicate different emphatic implications, current speech
translation systems return the same translations if the utterances are
linguistically identical, losing paralinguistic information. Concentrating on
focus, a type of emphasis, we propose mapping paralinguistic information into
the linguistic domain within the source language using lexical and grammatical
devices. This method enables us to translate the paraphrased text
representations instead of the transcription of the original speech and obtain
translations that preserve paralinguistic information. As a first step, we
present the collection of an English corpus containing speech that differed in
the placement of focus along with the corresponding text, which was designed to
reflect the implied meaning of the speech. Also, analyses of our corpus
demonstrated that mapping of focus from the paralinguistic domain into the
linguistic domain involved various lexical and grammatical methods. The data
and insights from our analysis will further advance research into
paralinguistic translation. The corpus will be published via LDC.
Related papers
- Assessing the Role of Lexical Semantics in Cross-lingual Transfer through Controlled Manipulations [15.194196775504613]
We analyze how differences between English and a target language influence the capacity to align the language with an English pretrained representation space.
We show that while properties such as the script or word order only have a limited impact on alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it.
arXiv Detail & Related papers (2024-08-14T14:59:20Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Enhancing expressivity transfer in textless speech-to-speech translation [0.0]
Existing state-of-the-art systems fall short when it comes to capturing and transferring expressivity accurately across different languages.
This study presents a novel method that operates at the discrete speech unit level and leverages multilingual emotion embeddings.
We demonstrate how these embeddings can be used to effectively predict the pitch and duration of speech units in the target language.
arXiv Detail & Related papers (2023-10-11T08:07:22Z)
- Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP).
Our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH) and intonational phrase (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z)
- Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data [0.0]
We propose a method for speech-to-speech emotion translation that operates at the level of discrete speech units.
We show that this embedding can be used to predict the pitch and duration of speech units in a target language.
We evaluate our approach on English and French speech signals and show that it outperforms a baseline method.
arXiv Detail & Related papers (2023-06-29T08:06:54Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features [13.44542301438426]
We propose a direct speech-to-speech translation model which can be trained without any textual annotation or content information.
Experiments on Mandarin-Cantonese speech translation demonstrate the feasibility of the proposed approach.
arXiv Detail & Related papers (2022-12-12T10:03:10Z)
- Unified Speech-Text Pre-training for Speech Translation and Recognition [113.31415771943162]
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition.
The proposed method incorporates four self-supervised and supervised subtasks for cross modality learning.
It achieves between 1.7 and 2.3 BLEU improvement above the state of the art on the MuST-C speech translation dataset.
arXiv Detail & Related papers (2022-04-11T20:59:51Z)
- Bridging the Modality Gap for Speech-to-Text Translation [57.47099674461832]
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Most existing methods employ an encoder-decoder structure with a single encoder to learn acoustic representation and semantic information simultaneously.
We propose a Speech-to-Text Adaptation for Speech Translation model which aims to improve the end-to-end model performance by bridging the modality gap between speech and text.
arXiv Detail & Related papers (2020-10-28T12:33:04Z)
- Pragmatic information in translation: a corpus-based study of tense and mood in English and German [70.3497683558609]
Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
arXiv Detail & Related papers (2020-07-10T08:15:59Z)