Challenges in Translation of Emotions in Multilingual User-Generated
Content: Twitter as a Case Study
- URL: http://arxiv.org/abs/2106.10719v1
- Date: Sun, 20 Jun 2021 16:12:48 GMT
- Title: Challenges in Translation of Emotions in Multilingual User-Generated
Content: Twitter as a Case Study
- Authors: Hadeel Saadany, Constantin Orasan, Rocio Caro Quintana, Felix do
Carmo, Leonardo Zilio
- Abstract summary: We show that there are linguistic phenomena specific to Twitter data that pose a challenge in translation of emotions in different languages.
We also assess the capacity of commonly used methods for evaluating the performance of an MT system with respect to the preservation of emotion in the source text.
- Score: 1.3999481573773072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although emotions are universal concepts, transferring the different shades
of emotion from one language to another may not always be straightforward for
human translators, let alone for machine translation systems. Moreover, the
cognitive states are established by verbal explanations of experience which is
shaped by both the verbal and cultural contexts. There are a number of verbal
contexts where expression of emotions constitutes the pivotal component of the
message. This is particularly true for User-Generated Content (UGC) which can
be in the form of a review of a product or a service, a tweet, or a social
media post. Recently, it has become common practice for multilingual websites
such as Twitter to provide an automatic translation of UGC to reach out to
their linguistically diverse users. In such scenarios, the process of
translating the user's emotion is entirely automatic, with no human
intervention for either post-editing or accuracy checking. In this
research, we assess whether automatic translation tools can be a successful
real-life utility in transferring emotion in user-generated multilingual data
such as tweets. We show that there are linguistic phenomena specific to Twitter
data that pose a challenge in translation of emotions in different languages.
We summarise these challenges in a list of linguistic features and show how
frequent these features are in different language pairs. We also assess the
capacity of commonly used methods for evaluating the performance of an MT
system with respect to the preservation of emotion in the source text.
Related papers
- Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models [44.015651538470856]
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules.
No method to automatically classify different emotion regulation strategies in a cross-user scenario exists.
We make use of the recently introduced Deep corpus for modeling the social display of the emotion shame.
A fine-tuned Llama2-7B model is able to classify the utilized emotion regulation strategy with high accuracy.
arXiv Detail & Related papers (2024-08-08T12:47:10Z)
- MASIVE: Open-Ended Affective State Identification in English and Spanish [10.41502827362741]
In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling.
We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each.
On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states.
arXiv Detail & Related papers (2024-07-16T21:43:47Z)
- Usefulness of Emotional Prosody in Neural Machine Translation [1.0205541448656992]
We propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice.
This work is motivated by the assumption that each emotion is associated with a specific lexicon that can overlap between emotions.
We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
arXiv Detail & Related papers (2024-04-27T18:04:28Z)
- Sociolinguistically Informed Interpretability: A Case Study on Hinglish
Emotion Classification [8.010713141364752]
We study the effect of language on emotion prediction across 3 PLMs on a Hinglish emotion classification dataset.
We find that models do learn these associations between language choice and emotional expression.
Having code-mixed data present in the pre-training can augment that learning when task-specific data is scarce.
arXiv Detail & Related papers (2024-02-05T16:05:32Z)
- BERTuit: Understanding Spanish language in Twitter through a native
transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used in applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete
Representations [49.55101900501656]
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
arXiv Detail & Related papers (2021-11-14T18:16:42Z)
- Seen and Unseen emotional style transfer for voice conversion with a new
emotional speech dataset [84.53659233967225]
Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.
We propose a novel framework based on variational auto-encoding Wasserstein generative adversarial network (VAW-GAN).
We show that the proposed framework achieves remarkable performance by consistently outperforming the baseline framework.
arXiv Detail & Related papers (2020-10-28T07:16:18Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary
Tale [52.663117551150954]
A few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community for more careful consideration of how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
- Improving Sentiment Analysis over non-English Tweets using Multilingual
Transformers and Automatic Translation for Data-Augmentation [77.69102711230248]
We propose the use of a multilingual transformer model that we pre-train over English tweets, and we apply data augmentation using automatic translation to adapt the model to non-English languages.
Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z)
- Annotation of Emotion Carriers in Personal Narratives [69.07034604580214]
We are interested in the problem of understanding personal narratives (PN) - spoken or written - recollections of facts, events, and thoughts.
In PN, emotion carriers are the speech or text segments that best explain the emotional state of the user.
This work proposes and evaluates an annotation model for identifying emotion carriers in spoken personal narratives.
arXiv Detail & Related papers (2020-02-27T15:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.