Cross-lingual Opinions and Emotions Mining in Comparable Documents
- URL: http://arxiv.org/abs/2508.03112v1
- Date: Tue, 05 Aug 2025 05:44:28 GMT
- Title: Cross-lingual Opinions and Emotions Mining in Comparable Documents
- Authors: Motaz Saad, David Langlois, Kamel Smaili
- Abstract summary: This research studies differences in sentiments and emotions across English-Arabic comparable documents. We manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparable texts are topic-aligned documents in multiple languages that are not direct translations. They are valuable for understanding how a topic is discussed across languages. This research studies differences in sentiments and emotions across English-Arabic comparable documents. First, texts are annotated with sentiment and emotion labels. We apply a cross-lingual method to label documents with opinion classes (subjective/objective), avoiding reliance on machine translation. To annotate with emotions (anger, disgust, fear, joy, sadness, surprise), we manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. We then apply a statistical measure to assess the agreement of sentiments and emotions in each source-target document pair. This comparison is especially relevant when the documents originate from different sources. To our knowledge, this aspect has not been explored in prior literature. Our study includes English-Arabic document pairs from Euronews, BBC, and Al-Jazeera (JSC). Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones. The proposed method is language-independent and generalizable to other language pairs.
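The labeling and agreement steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the lexicon entries are toy examples and are not taken from WNA, tokenization is a plain whitespace split, each document receives only its most frequent emotion, and Cohen's kappa stands in for the "statistical measure", which the abstract does not name. The function names `label_emotion` and `cohen_kappa` are hypothetical.

```python
from collections import Counter

EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Toy lexicon entries for illustration only; these are NOT taken from WNA.
LEXICON_EN = {"angry": "anger", "afraid": "fear", "happy": "joy", "sad": "sadness"}
LEXICON_AR = {"غاضب": "anger", "خائف": "fear", "سعيد": "joy", "حزين": "sadness"}


def label_emotion(tokens, lexicon):
    """Return the most frequent lexicon emotion among tokens, or "none"."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not counts:
        return "none"
    # Ties resolve to the emotion listed first in EMOTIONS.
    return max(EMOTIONS, key=lambda e: counts[e])


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences.

    Assumption: kappa is used here as one possible agreement measure.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sides use a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical aligned document pairs (same index = same topic).
    english_docs = ["the crowd was angry", "people were happy", "officials gave a statement"]
    arabic_docs = ["الحشد غاضب", "الرجل حزين", "بيان رسمي"]
    en_labels = [label_emotion(d.lower().split(), LEXICON_EN) for d in english_docs]
    ar_labels = [label_emotion(d.split(), LEXICON_AR) for d in arabic_docs]
    print(en_labels, ar_labels, cohen_kappa(en_labels, ar_labels))
```

Because each language is labeled with its own lexicon, no machine translation is involved. The agreement score is computed over the aligned label sequences, so it can be compared between same-agency and cross-agency document pairs.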
Related papers
- Building and Aligning Comparable Corpora [0.0]
A comparable corpus is a set of topic-aligned documents in multiple languages. We present a method to build comparable corpora from the Wikipedia encyclopedia and the EURONEWS website in English, French, and Arabic. We also experiment with a method to automatically align comparable documents using cross-lingual similarity measures.
arXiv Detail & Related papers (2025-08-04T16:05:36Z) - SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection [76.18321723846616]
The task covers more than 30 languages from seven distinct language families. Data instances are multi-labeled with six emotional classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) multilabel emotion detection, (b) emotion intensity score detection, and (c) cross-lingual emotion detection.
arXiv Detail & Related papers (2025-03-10T12:49:31Z) - BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages [93.92804151830744]
We present BRIGHTER, a collection of multi-labeled, emotion-annotated datasets in 28 different languages. We highlight the challenges related to the data collection and annotation processes. We show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.
arXiv Detail & Related papers (2025-02-17T15:39:50Z) - You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools [74.98850427240464]
We show that sentiment analysis tools disagree on the same dataset.
We show that the sentiment tool used for sentiment annotation can even be predicted from its outcome.
arXiv Detail & Related papers (2024-10-18T17:27:38Z) - MELD-ST: An Emotion-aware Speech Translation Dataset [29.650945917540316]
We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs.
Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset.
Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings.
arXiv Detail & Related papers (2024-05-21T22:40:38Z) - What is Sentiment Meant to Mean to Language Models? [0.0]
"sentiment" entails a wide variety of concepts depending on the domain and tools used.
"sentiment" has been used to mean emotion, opinions, market movements, or simply a general good-bad'' dimension.
arXiv Detail & Related papers (2024-05-03T19:37:37Z) - English Prompts are Better for NLI-based Zero-Shot Emotion
Classification than Target-Language Prompts [17.099269597133265]
Our experiments with natural language inference-based language models show that it is consistently better to use English prompts even if the data is in a different language.
arXiv Detail & Related papers (2024-02-05T17:36:19Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z) - Comparing Biases and the Impact of Multilingual Training across Multiple
Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z) - Multilingual Contextual Affective Analysis of LGBT People Portrayals in
Wikipedia [34.183132688084534]
Specific lexical choices in narrative text both reflect the writer's attitudes towards people in the narrative and influence the audience's reactions.
We show how word connotations differ across languages and cultures, highlighting the difficulty of generalizing existing English datasets.
We then demonstrate the usefulness of our method by analyzing Wikipedia biography pages of members of the LGBT community across three languages.
arXiv Detail & Related papers (2020-10-21T08:27:36Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)