Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
- URL: http://arxiv.org/abs/2509.25752v1
- Date: Tue, 30 Sep 2025 04:16:28 GMT
- Title: Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
- Authors: T. O. Abiola, K. D. Abiodun, O. E. Olumide, O. O. Adebanji, O. Hiram Calvo, Grigori Sidorov
- Abstract summary: We present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Urdu, and Spanish. We leverage transformer-based models, specifically XLM-RoBERTa, to detect and categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. Our proposed methodology is evaluated on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The detection of hopeful speech in social media has emerged as a critical task for promoting positive discourse and well-being. In this paper, we present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Urdu, and Spanish. We leverage transformer-based models, specifically XLM-RoBERTa, to detect and categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. Our proposed methodology is evaluated on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages. We compare our results with existing models, demonstrating that our approach significantly outperforms prior state-of-the-art techniques in terms of macro F1 scores. We also discuss the challenges in detecting hope speech in low-resource languages and the potential for improving generalization. This work contributes to the development of multilingual, fine-grained hope speech detection models, which can be applied to enhance positive content moderation and foster supportive online communities.
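The abstract reports results in terms of macro F1, the unweighted mean of per-class F1 scores, which weights the three hope classes equally regardless of their frequency. As a minimal, dependency-free sketch (the toy gold/predicted labels below are illustrative, not from the paper), macro F1 over the paper's three classes can be computed as:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute per-class F1, then take the unweighted mean."""
    f1_scores = []
    for c in labels:
        # Per-class counts: true positives, false positives, false negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# The three classes defined in the paper; the example labels are hypothetical.
labels = ["Generalized Hope", "Realistic Hope", "Unrealistic Hope"]
gold = ["Generalized Hope", "Realistic Hope", "Unrealistic Hope", "Generalized Hope"]
pred = ["Generalized Hope", "Realistic Hope", "Generalized Hope", "Generalized Hope"]
print(round(macro_f1(gold, pred, labels), 3))
```

Because rare classes count as much as common ones, macro F1 penalizes a model that ignores a minority class (here, the missed "Unrealistic Hope" instance drags the average down), which is why shared tasks on imbalanced data typically report it instead of accuracy.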
Related papers
- GHaLIB: A Multilingual Framework for Hope Speech Detection in Low-Resource Languages [0.4915744683251149]
This paper presents a multilingual framework for hope speech detection with a focus on Urdu. Using pretrained transformer models such as XLM-RoBERTa, mBERT, EuroBERT, and UrduBERT, we apply simple preprocessing and train classifiers for improved results. Evaluations on the PolyHope-M 2025 benchmark demonstrate strong performance, achieving F1-scores of 95.2% for Urdu binary classification and 65.2% for Urdu multi-class classification, with similarly competitive results in Spanish, German, and English.
arXiv Detail & Related papers (2025-12-27T21:23:17Z) - Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback [13.065059683491958]
We aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.
arXiv Detail & Related papers (2025-10-17T23:06:21Z) - AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training [0.0]
We investigate how candy speech can be reliably detected in a 46k-comment German YouTube corpus. We find that a multilingual XLM-RoBERTa-Large model trained to detect candy speech at the span level outperforms other approaches. We speculate that span-based training, multilingual capabilities, and emoji-aware tokenizers improved detection performance.
arXiv Detail & Related papers (2025-09-09T07:29:14Z) - CODEOFCONDUCT at Multilingual Counterspeech Generation: A Context-Aware Model for Robust Counterspeech Generation in Low-Resource Languages [1.9263811967110864]
This paper introduces a context-aware model for robust counterspeech generation, which achieved significant success in the MCG-COLING-2025 shared task. By leveraging a simulated annealing algorithm fine-tuned on multilingual datasets, the model generates factually accurate responses to hate speech. We demonstrate state-of-the-art performance across four languages, with our system ranking first for Basque, second for Italian, and third for both English and Spanish.
arXiv Detail & Related papers (2025-01-01T03:36:31Z) - PolyHope: Two-Level Hope Speech Detection from Tweets [68.8204255655161]
Despite its importance, hope has rarely been studied as a social media analysis task.
This paper presents a hope speech dataset that classifies each tweet first into "Hope" and "Not Hope" categories.
English tweets in the first half of 2022 were collected to build this dataset.
arXiv Detail & Related papers (2022-10-25T16:34:03Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.