Cross-lingual Emotion Intensity Prediction
- URL: http://arxiv.org/abs/2004.04103v2
- Date: Tue, 24 Nov 2020 18:48:50 GMT
- Title: Cross-lingual Emotion Intensity Prediction
- Authors: Irean Navas Alejo, Toni Badia, and Jeremy Barnes
- Abstract summary: Cross-lingual transfer approaches for fine-grained emotion detection in Spanish and Catalan tweets.
We compare six cross-lingual approaches, e.g., machine translation and cross-lingual embeddings, which have varying requirements for parallel data.
The results show that methods with low parallel-data requirements perform surprisingly better than methods that use more parallel data.
- Score: 13.305282275999778
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Emotion intensity prediction determines the degree or intensity of an emotion
that the author expresses in a text, extending previous categorical approaches
to emotion detection. While most previous work on this topic has concentrated
on English texts, other languages would also benefit from fine-grained emotion
classification, preferably without having to recreate the amount of annotated
data available in English in each new language. Consequently, we explore
cross-lingual transfer approaches for fine-grained emotion detection in Spanish
and Catalan tweets. To this end we annotate a test set of Spanish and Catalan
tweets using Best-Worst scaling. We compare six cross-lingual approaches, e.g.,
machine translation and cross-lingual embeddings, which have varying
requirements for parallel data -- from millions of parallel sentences to
completely unsupervised. The results show that on this data, methods with low
parallel-data requirements perform surprisingly better than methods that use
more parallel data, which we explain through an in-depth error analysis. We
make the dataset and the code available at
\url{https://github.com/jerbarnes/fine-grained_cross-lingual_emotion}
Related papers
- Improving Multi-lingual Alignment Through Soft Contrastive Learning [9.454626745893798]
We propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model.
Given translation sentence pairs, we train a multi-lingual model in a way that the similarity between cross-lingual embeddings follows the similarity of sentences measured at the mono-lingual teacher model.
arXiv Detail & Related papers (2024-05-25T09:46:07Z) - Question Translation Training for Better Multilingual Reasoning [108.10066378240879]
Large language models show compelling performance on reasoning tasks but they tend to perform much worse in languages other than English.
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
In this paper we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English parallel question data.
arXiv Detail & Related papers (2024-01-15T16:39:10Z) - T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Training Effective Neural Sentence Encoders from Automatically Mined
Paraphrases [0.0]
We propose a method for training effective language-specific sentence encoders without manually labeled data.
Our approach is to automatically construct a dataset of paraphrase pairs from sentence-aligned bilingual text corpora.
Our sentence encoder can be trained in less than a day on a single graphics card, achieving high performance on a diverse set of sentence-level tasks.
arXiv Detail & Related papers (2022-07-26T09:08:56Z) - Cross-lingual Intermediate Fine-tuning improves Dialogue State Tracking [84.50302759362698]
We enhance the transfer learning process by intermediate fine-tuning of pretrained multilingual models.
We use parallel and conversational movie subtitles datasets to design cross-lingual intermediate tasks.
We achieve impressive improvements (> 20% on goal accuracy) on the parallel MultiWoZ dataset and Multilingual WoZ dataset.
arXiv Detail & Related papers (2021-09-28T11:22:38Z) - Cross-lingual Emotion Detection [6.767035411834297]
We consider English as the source language with Arabic and Spanish as target languages.
Our BERT-based monolingual models that are trained on target language data surpass state-of-the-art (SOTA) by 4% and 5% absolute Jaccard score for Arabic and Spanish respectively.
Next, we show that using cross-lingual approaches with English data alone, we can achieve more than 90% and 80% relative effectiveness of the Arabic and Spanish BERT models respectively.
arXiv Detail & Related papers (2021-06-10T19:52:06Z) - Semi-automatic Generation of Multilingual Datasets for Stance Detection
in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter.
We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Improving Sentiment Analysis over non-English Tweets using Multilingual
Transformers and Automatic Translation for Data-Augmentation [77.69102711230248]
We propose the use of a multilingual transformer model, that we pre-train over English tweets and apply data-augmentation using automatic translation to adapt the model to non-English languages.
Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z) - NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection
with Cross-lingual Transfer [10.007363787391952]
This paper describes our approach to the task of identifying offensive languages in a multilingual setting.
We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection.
Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
arXiv Detail & Related papers (2020-08-04T06:20:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.