Hope Speech detection in under-resourced Kannada language
- URL: http://arxiv.org/abs/2108.04616v1
- Date: Tue, 10 Aug 2021 11:59:42 GMT
- Title: Hope Speech detection in under-resourced Kannada language
- Authors: Adeep Hande, Ruba Priyadharshini, Anbukkarasi Sampath, Kingston Pal Thamburaj, Prabakaran Chandran, Bharathi Raja Chakravarthi
- Abstract summary: We propose creating an English-Kannada Hope speech dataset, KanHope.
The dataset consists of 6,176 user-generated comments in code-mixed Kannada scraped from YouTube.
We introduce DC-BERT4HOPE, a dual-channel model that uses the English translation of KanHope for additional training to improve hope speech detection.
- Score: 0.1759008116536278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Numerous methods have been developed in recent years to monitor the
spread of negativity by removing vulgar, offensive, and hostile comments from
social media platforms. However, comparatively few studies focus on embracing
positivity and reinforcing supportive and reassuring content in online forums.
Consequently, we propose creating an English-Kannada hope speech dataset,
KanHope, and conduct several experiments to benchmark the dataset. The dataset
consists of 6,176 user-generated comments in code-mixed Kannada scraped from
YouTube and manually annotated as hope speech or not-hope speech. In addition,
we introduce DC-BERT4HOPE, a dual-channel model that uses the English
translation of KanHope for additional training to improve hope speech
detection. The approach achieves a weighted F1-score of 0.756, outperforming
the other models. KanHope thus aims to stimulate research in Kannada while
broadly encouraging researchers to take a pragmatic approach towards online
content that is encouraging, positive, and supportive.
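The dual-channel idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch/Hugging Face Transformers rendering of such a model, not the authors' released implementation: one encoder reads the code-mixed Kannada comment, a second reads its English machine translation, and the two pooled representations are concatenated before a binary hope/not-hope classifier. The encoder checkpoints, the example comment, and its translation are assumptions for illustration only.

```python
# Minimal sketch of a dual-channel classifier in the spirit of DC-BERT4HOPE
# (assumed architecture; not the authors' released implementation).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class DualChannelHopeClassifier(nn.Module):
    def __init__(self,
                 kannada_encoder="bert-base-multilingual-cased",  # channel 1: code-mixed Kannada
                 english_encoder="bert-base-cased",               # channel 2: English translation
                 num_labels=2):
        super().__init__()
        self.enc_kn = AutoModel.from_pretrained(kannada_encoder)
        self.enc_en = AutoModel.from_pretrained(english_encoder)
        hidden = self.enc_kn.config.hidden_size + self.enc_en.config.hidden_size
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(hidden, num_labels))

    def forward(self, kn_inputs, en_inputs):
        # Take the [CLS] representation from each channel and fuse by concatenation.
        h_kn = self.enc_kn(**kn_inputs).last_hidden_state[:, 0]
        h_en = self.enc_en(**en_inputs).last_hidden_state[:, 0]
        return self.classifier(torch.cat([h_kn, h_en], dim=-1))

# Usage: tokenize the code-mixed comment and its English translation separately.
tok_kn = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
tok_en = AutoTokenizer.from_pretrained("bert-base-cased")
model = DualChannelHopeClassifier()
comment = "nimma video tumba chennagide, keep going!"   # hypothetical code-mixed comment
translation = "Your video is very nice, keep going!"    # hypothetical machine translation
kn_inputs = tok_kn(comment, return_tensors="pt", truncation=True)
en_inputs = tok_en(translation, return_tensors="pt", truncation=True)
logits = model(kn_inputs, en_inputs)                    # shape: (1, 2) -> hope vs. not-hope
```

Training would pair each KanHope comment with its English translation and minimize cross-entropy over the two labels; the weighted F1-score reported above can then be computed with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`.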
Related papers
- Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects [72.18753241750964]
Yorùbá is an African language with roughly 47 million speakers.
Recent efforts to develop NLP technologies for African languages have focused on their standard dialects.
We take steps towards bridging this gap by introducing a new high-quality parallel text and speech corpus.
arXiv Detail & Related papers (2024-06-27T22:38:04Z) - TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation [97.54885207518946]
We introduce a novel model framework TransVIP that leverages diverse datasets in a cascade fashion.
We propose two separated encoders to preserve the speaker's voice characteristics and isochrony from the source speech during the translation process.
Our experiments on the French-English language pair demonstrate that our model outperforms the current state-of-the-art speech-to-speech translation model.
arXiv Detail & Related papers (2024-05-28T04:11:37Z) - Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A
Battle of LSTM and Transformers [0.0]
We conduct a comparative analysis of hate speech classification across five distinct languages: Bengali, Assamese, Bodo, Sinhala, and Gujarati.
BERT Base Multilingual Cased emerges as a strong performer across languages, achieving an F1 score of 0.67027 for Bengali and 0.70525 for Assamese.
In Sinhala, XLM-R stands out with an F1 score of 0.83493, whereas for Gujarati, a custom LSTM-based model leads with an F1 score of 0.76601.
arXiv Detail & Related papers (2023-12-09T20:24:00Z) - Beyond Negativity: Re-Analysis and Follow-Up Experiments on Hope Speech
Detection [0.0]
Hope speech refers to comments, posts and other social media messages that offer support, reassurance, suggestions, inspiration, and insight.
Our study aims to find efficient yet comparable/superior methods for hope speech detection.
arXiv Detail & Related papers (2023-05-10T18:38:48Z) - Hope Speech Detection on Social Media Platforms [1.2561455657923906]
This paper discusses various machine learning approaches to identify a sentence as Hope Speech, Non-Hope Speech, or a Neutral sentence.
The dataset used in the study contains English YouTube comments.
arXiv Detail & Related papers (2022-11-14T10:58:22Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - PolyHope: Two-Level Hope Speech Detection from Tweets [68.8204255655161]
Despite its importance, hope has rarely been studied as a social media analysis task.
This paper presents a hope speech dataset that first classifies each tweet as "Hope" or "Not Hope".
English tweets in the first half of 2022 were collected to build this dataset.
arXiv Detail & Related papers (2022-10-25T16:34:03Z) - Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance in hate speech datasets, since the high ratio of non-hate to hate examples often leads to low model performance (a minimal class-weighting sketch, under assumed counts, follows this list).
arXiv Detail & Related papers (2022-01-15T20:48:14Z) - Unsupervised Cross-lingual Representation Learning for Speech
Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z) - Classification Benchmarks for Under-resourced Bengali Language based on
Multichannel Convolutional-LSTM Network [3.0168410626760034]
We build the largest Bengali word embedding models to date based on 250 million articles, which we call BengFastText.
We incorporate word embeddings into a Multichannel Convolutional-LSTM network for predicting different types of hate speech, document classification, and sentiment analysis.
arXiv Detail & Related papers (2020-04-11T22:17:04Z)
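The label-imbalance issue raised in the cross-lingual hate speech entry above also affects hope speech corpora such as KanHope, where not-hope comments typically outnumber hope comments. A common remedy is to weight the classification loss inversely to class frequency. The snippet below is a minimal PyTorch sketch of that idea; the class counts are made up for illustration and do not come from any dataset listed here.

```python
# Minimal sketch: class-weighted cross-entropy for an imbalanced
# hope / not-hope task. The class counts below are illustrative only.
import torch
import torch.nn as nn

class_counts = torch.tensor([5000.0, 1000.0])    # [not-hope, hope], hypothetical counts
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)  # errors on the rarer class cost more

logits = torch.randn(8, 2)                       # stand-in model outputs for a batch of 8
labels = torch.randint(0, 2, (8,))
loss = criterion(logits, labels)
print(weights, loss.item())
```

An alternative with the same intent is to oversample the minority class with `torch.utils.data.WeightedRandomSampler`; in either case, the weighted F1-score used above remains a reasonable headline metric because it accounts for the skewed label distribution.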