Emotion Recognition for Low-Resource Turkish: Fine-Tuning BERTurk on TREMO and Testing on Xenophobic Political Discourse
- URL: http://arxiv.org/abs/2505.12160v1
- Date: Sat, 17 May 2025 22:38:18 GMT
- Title: Emotion Recognition for Low-Resource Turkish: Fine-Tuning BERTurk on TREMO and Testing on Xenophobic Political Discourse
- Authors: Darmawan Wicaksono, Hasri Akbar Awal Rozaq, Nevfel Boz
- Abstract summary: This study examines the term Sessiz Istila (Silent Invasion) on Turkish social media, highlighting the rise of anti-refugee sentiment amidst the Syrian refugee influx. Using BERTurk and the TREMO dataset, we developed an advanced Emotion Recognition Model (ERM) tailored for Turkish.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media platforms like X (formerly Twitter) play a crucial role in shaping public discourse and societal norms. This study examines the term Sessiz Istila (Silent Invasion) on Turkish social media, highlighting the rise of anti-refugee sentiment amidst the Syrian refugee influx. Using BERTurk and the TREMO dataset, we developed an advanced Emotion Recognition Model (ERM) tailored for Turkish, achieving 92.62% accuracy in categorizing emotions such as happiness, fear, anger, sadness, disgust, and surprise. By applying this model to large-scale X data, the study uncovers emotional nuances in Turkish discourse, contributing to computational social science by advancing sentiment analysis in underrepresented languages and enhancing our understanding of global digital discourse and the unique linguistic challenges of Turkish. The findings underscore the transformative potential of localized NLP tools, with our ERM model offering practical applications for real-time sentiment analysis in Turkish-language contexts. By addressing critical areas, including marketing, public relations, and crisis management, these models facilitate improved decision-making through timely and accurate sentiment tracking. This highlights the significance of advancing research that accounts for regional and linguistic nuances.
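The abstract reports two kinds of output: a single accuracy figure (92.62%) for six-way emotion classification, and emotion patterns found by running the model over X posts. As a minimal sketch of how such figures are usually derived from predicted labels, the plain-Python code below computes accuracy against gold labels and the share of each emotion among predictions. The label spellings, the function names `accuracy` and `emotion_distribution`, and the toy data are hypothetical; the paper's own evaluation code, its encoding of the TREMO labels, and the BERTurk fine-tuning step are not shown.

```python
from collections import Counter

# The six emotion classes named in the abstract (label spellings are assumed).
EMOTIONS = ("happiness", "fear", "anger", "sadness", "disgust", "surprise")


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def emotion_distribution(pred):
    """Share of each emotion among predicted labels (e.g. over a corpus of posts)."""
    if not pred:
        raise ValueError("need at least one prediction")
    counts = Counter(pred)
    unknown = set(counts) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(pred)
    # Emotions that never occur get an explicit share of 0.0.
    return {e: counts[e] / total for e in EMOTIONS}


if __name__ == "__main__":
    # Hypothetical toy data, not taken from TREMO or the paper's X corpus.
    gold = ["fear", "anger", "anger", "happiness"]
    pred = ["fear", "anger", "disgust", "happiness"]
    print(f"{accuracy(gold, pred):.2%}")  # prints 75.00%
    print(emotion_distribution(pred))
```

Accuracy only needs gold labels on a held-out split, whereas the distribution can be computed on unlabeled posts, which is why the two are kept as separate functions.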
Related papers
- MASim: Multilingual Agent-Based Simulation for Social Science
Multi-agent role-playing has recently shown promise for studying social behavior with language agents. Existing simulations are mostly monolingual and fail to model cross-lingual interaction. We introduce MASim, the first multilingual agent-based simulation framework.
(arXiv, 2025-12-08)
- Developing a Comprehensive Framework for Sentiment Analysis in Turkish
This thesis can be considered the most detailed and comprehensive study of sentiment analysis in Turkish as of July 2020. We developed a comprehensive framework for sentiment analysis that takes its many aspects into account, mainly for Turkish. We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English.
(arXiv, 2025-11-29)
- Ensembling Multilingual Transformers for Robust Sentiment Analysis of Tweets
We present a transformer ensemble model and a large language model (LLM) that employs sentiment analysis of other languages. Sentiment was then assessed for sentences using an ensemble of pre-trained sentiment analysis models: bert-base-multilingual-uncased-sentiment and XLM-R. Our experimental results indicated that sentiment analysis performance exceeded 86% using the proposed method.
(arXiv, 2025-09-28)
- BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
We present BRIGHTER, a collection of multi-labeled, emotion-annotated datasets in 28 different languages. We highlight the challenges related to the data collection and annotation processes. We show that the BRIGHTER datasets represent a meaningful step towards addressing the gap in text-based emotion recognition.
(arXiv, 2025-02-17)
- Enhancing Aspect-based Sentiment Analysis with ParsBERT in Persian Language
This paper aims to amplify the efficiency of language models tailored to the Persian language. The study centers on sentiment analysis of user opinions extracted from the Persian website 'Digikala'.
(arXiv, 2025-02-03)
- TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU.
TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula.
We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, MT5), closed-source (GPT-4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models.
(arXiv, 2024-07-17)
- CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
The "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs).
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
(arXiv, 2024-05-22)
- Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
Large Language Models (LLMs) are becoming crucial across various fields, emphasizing the urgency for high-quality models in underrepresented languages.
This study explores the unique challenges faced by low-resource languages, such as data scarcity, model selection, evaluation, and computational limitations.
(arXiv, 2024-05-07)
- The Call for Socially Aware Language Technologies
We argue that many of these issues share a common core: a lack of awareness of the factors, context, and implications of the social environment in which NLP operates. We argue that substantial challenges remain for NLP to develop social awareness and that we are just at the beginning of a new era for the field.
(arXiv, 2024-05-03)
- Cross-Lingual Learning vs. Low-Resource Fine-Tuning: A Case Study with Fact-Checking in Turkish
We introduce the FCTR dataset, consisting of 3238 real-world claims.
This dataset spans multiple domains and incorporates evidence collected from three Turkish fact-checking organizations.
(arXiv, 2024-03-01)
- TurkishBERTweet: Fast and Reliable Large Language Model for Social Media Analysis
We introduce TurkishBERTweet, the first large-scale pre-trained language model for Turkish social media, built using almost 900 million tweets.
The model shares the architecture of the base BERT model but uses a shorter maximum input length, making TurkishBERTweet lighter than BERTurk.
We demonstrate that TurkishBERTweet outperforms the other available alternatives in generalizability, and its lower inference time gives it a significant advantage when processing large-scale datasets.
(arXiv, 2023-11-29)
- HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning
We present HuBERT-TR, a speech representation model for Turkish based on HuBERT.
HuBERT-TR achieves state-of-the-art results on several Turkish ASR datasets.
(arXiv, 2022-10-13)
- Towards Understanding and Mitigating Social Biases in Language Models
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
(arXiv, 2021-06-24)
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
(arXiv, 2021-04-03)