Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
- URL: http://arxiv.org/abs/2509.25752v1
- Date: Tue, 30 Sep 2025 04:16:28 GMT
- Title: Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
- Authors: T. O. Abiola, K. D. Abiodun, O. E. Olumide, O. O. Adebanji, O. Hiram Calvo, Grigori Sidorov
- Abstract summary: We present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Urdu, and Spanish. We leverage transformer-based models, specifically XLM-RoBERTa, to detect and categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. Our proposed methodology is evaluated on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The detection of hopeful speech in social media has emerged as a critical task for promoting positive discourse and well-being. In this paper, we present a machine learning approach to multiclass hope speech detection across multiple languages, including English, Urdu, and Spanish. We leverage transformer-based models, specifically XLM-RoBERTa, to detect and categorize hope speech into three distinct classes: Generalized Hope, Realistic Hope, and Unrealistic Hope. Our proposed methodology is evaluated on the PolyHope dataset for the PolyHope-M 2025 shared task, achieving competitive performance across all languages. We compare our results with existing models, demonstrating that our approach significantly outperforms prior state-of-the-art techniques in terms of macro F1 scores. We also discuss the challenges in detecting hope speech in low-resource languages and the potential for improving generalization. This work contributes to the development of multilingual, fine-grained hope speech detection models, which can be applied to enhance positive content moderation and foster supportive online communities.
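The abstract reports results in terms of macro F1, the unweighted mean of per-class F1 scores, which weights the three hope classes equally regardless of their frequency. As a minimal, dependency-free sketch (the toy gold/predicted labels below are illustrative, not from the paper), macro F1 over the paper's three classes can be computed as:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute per-class F1, then take the unweighted mean."""
    f1_scores = []
    for c in labels:
        # Per-class counts: true positives, false positives, false negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# The three classes defined in the paper; the example labels are hypothetical.
labels = ["Generalized Hope", "Realistic Hope", "Unrealistic Hope"]
gold = ["Generalized Hope", "Realistic Hope", "Unrealistic Hope", "Generalized Hope"]
pred = ["Generalized Hope", "Realistic Hope", "Generalized Hope", "Generalized Hope"]
print(round(macro_f1(gold, pred, labels), 3))
```

Because rare classes count as much as common ones, macro F1 penalizes a model that ignores a minority class (here, the missed "Unrealistic Hope" instance drags the average down), which is why shared tasks on imbalanced data typically report it instead of accuracy.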
Related papers
- GHaLIB: A Multilingual Framework for Hope Speech Detection in Low-Resource Languages [0.4915744683251149]
This paper presents a multilingual framework for hope speech detection with a focus on Urdu. Using pretrained transformer models such as XLM-RoBERTa, mBERT, EuroBERT, and UrduBERT, we apply simple preprocessing and train classifiers for improved results. Evaluations on the PolyHope-M 2025 benchmark demonstrate strong performance, achieving F1-scores of 95.2% for Urdu binary classification and 65.2% for Urdu multi-class classification, with similarly competitive results in Spanish, German, and English.
arXiv Detail & Related papers (2025-12-27T21:23:17Z) - Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback [13.065059683491958]
We aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.
arXiv Detail & Related papers (2025-10-17T23:06:21Z) - AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training [0.0]
We investigate how candy speech can be reliably detected in a 46k-comment German YouTube corpus. We find that a multilingual XLM-RoBERTa-Large model trained to detect candy speech at the span level outperforms other approaches. We speculate that span-based training, multilingual capabilities, and emoji-aware tokenizers improved detection performance.
arXiv Detail & Related papers (2025-09-09T07:29:14Z) - CODEOFCONDUCT at Multilingual Counterspeech Generation: A Context-Aware Model for Robust Counterspeech Generation in Low-Resource Languages [1.9263811967110864]
This paper introduces a context-aware model for robust counterspeech generation, which achieved significant success in the MCG-COLING-2025 shared task. By leveraging a simulated annealing algorithm fine-tuned on multilingual datasets, the model generates factually accurate responses to hate speech. We demonstrate state-of-the-art performance across four languages, with our system ranking first for Basque, second for Italian, and third for both English and Spanish.
arXiv Detail & Related papers (2025-01-01T03:36:31Z) - PolyHope: Two-Level Hope Speech Detection from Tweets [68.8204255655161]
Despite its importance, hope has rarely been studied as a social media analysis task.
This paper presents a hope speech dataset that classifies each tweet first into "Hope" and "Not Hope" categories.
English tweets in the first half of 2022 were collected to build this dataset.
arXiv Detail & Related papers (2022-10-25T16:34:03Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z) - IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z) - Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.