IITR-CIOL@NLU of Devanagari Script Languages 2025: Multilingual Hate Speech Detection and Target Identification in Devanagari-Scripted Languages
- URL: http://arxiv.org/abs/2412.17947v2
- Date: Sat, 28 Dec 2024 21:27:17 GMT
- Title: IITR-CIOL@NLU of Devanagari Script Languages 2025: Multilingual Hate Speech Detection and Target Identification in Devanagari-Scripted Languages
- Authors: Siddhant Gupta, Siddh Singhal, Azmine Toushik Wasi
- Abstract summary: This work focuses on two subtasks related to hate speech detection and target identification in Devanagari-scripted languages.
Subtask B involves detecting hate speech in online text, while Subtask C requires identifying the specific targets of hate speech.
We propose the MultilingualRobertaClass model, a deep neural network built on the pretrained multilingual transformer model ia-multilingual-transliterated-roberta.
- Abstract: This work focuses on two subtasks related to hate speech detection and target identification in Devanagari-scripted languages, specifically Hindi, Marathi, Nepali, Bhojpuri, and Sanskrit. Subtask B involves detecting hate speech in online text, while Subtask C requires identifying the specific targets of hate speech, such as individuals, organizations, or communities. We propose the MultilingualRobertaClass model, a deep neural network built on the pretrained multilingual transformer model ia-multilingual-transliterated-roberta, optimized for classification tasks in multilingual and transliterated contexts. The model leverages contextualized embeddings to handle linguistic diversity, with a classifier head for binary classification. We achieved 88.40% accuracy on Subtask B and 66.11% accuracy on Subtask C on the test set.
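The abstract describes the architecture only at a high level. Below is a minimal sketch of how such a model could be assembled, assuming the encoder is the Hugging Face checkpoint ibm/ia-multilingual-transliterated-roberta and a single linear head over the first token; the authors' actual pooling, dropout, and head layout may differ.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Minimal sketch of the described architecture. The checkpoint name and the
# single-linear-head layout are assumptions, not the authors' released code.
MODEL_NAME = "ibm/ia-multilingual-transliterated-roberta"

class MultilingualRobertaClass(nn.Module):
    def __init__(self, num_labels: int = 2, dropout: float = 0.3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token ("[CLS]"-style) pooling
        return self.classifier(self.dropout(cls))

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
batch = tokenizer(["नमस्ते दुनिया"], return_tensors="pt", padding=True)
logits = MultilingualRobertaClass()(batch["input_ids"], batch["attention_mask"])
```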
Related papers
- LLMsAgainstHate @ NLU of Devanagari Script Languages 2025: Hate Speech Detection and Target Identification in Devanagari Languages via Parameter Efficient Fine-Tuning of LLMs [9.234570108225187]
We propose a Parameter-Efficient Fine-Tuning (PEFT) based solution for hate speech detection and target identification.
We evaluate multiple LLMs on the Devanagari dataset provided by Thapa et al. (2025).
Results demonstrate the efficacy of our approach in handling Devanagari-scripted content.
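The summary names PEFT but not the specific method. The sketch below uses LoRA via the peft library as one common instantiation; the base model, rank, and target modules are chosen purely for illustration.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

# Illustrative LoRA setup; the paper's exact PEFT method, base model,
# and hyperparameters are not specified here, so these are assumptions.
base = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)
config = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # keeps the classification head trainable
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["query", "value"],   # attention projections in XLM-R
)
model = get_peft_model(base, config)
model.print_trainable_parameters()       # typically ~1% of full fine-tuning
```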
arXiv Detail & Related papers (2024-12-22T18:38:24Z)
- NLPineers@ NLU of Devanagari Script Languages 2025: Hate Speech Detection using Ensembling of BERT-based models [0.9974630621313314]
This paper addresses hate speech detection in Devanagari-scripted languages, focusing on Hindi and Nepali.
Using a range of transformer-based models, we examine their effectiveness in navigating the nuanced boundary between hate speech and free expression.
This work emphasizes the need for hate speech detection in Devanagari-scripted languages and presents a foundation for further research.
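As a rough illustration of ensembling BERT-based classifiers, the snippet below soft-votes by averaging class probabilities; the checkpoint names are placeholders standing in for the paper's fine-tuned models.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Soft-voting ensemble sketch: average class probabilities from several
# fine-tuned checkpoints. The names below are placeholders.
CHECKPOINTS = ["bert-base-multilingual-cased", "xlm-roberta-base"]

@torch.no_grad()
def ensemble_predict(text: str) -> int:
    probs = []
    for name in CHECKPOINTS:
        tok = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=2)
        inputs = tok(text, return_tensors="pt", truncation=True)
        probs.append(model(**inputs).logits.softmax(dim=-1))
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1))
```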
arXiv Detail & Related papers (2024-12-11T07:37:26Z)
- 1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs [0.0]
This paper presents a detailed system description of our entry for the CHiPSAL 2025 shared task.
We focus on language detection, hate speech identification, and target detection in Devanagari script languages.
arXiv Detail & Related papers (2024-11-11T10:34:36Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
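The general composite pattern, not ComSL's exact design (which is not detailed here), can be sketched as a pretrained speech encoder bridged into a pretrained text model by a small trainable projection:

```python
import torch.nn as nn
from transformers import Wav2Vec2Model, MBartForConditionalGeneration

# Generic composite speech-language pattern; the specific encoder, decoder,
# and bridging layer are assumptions for illustration only.
class CompositeSpeechText(nn.Module):
    def __init__(self):
        super().__init__()
        self.speech = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.text = MBartForConditionalGeneration.from_pretrained(
            "facebook/mbart-large-50")
        self.bridge = nn.Linear(self.speech.config.hidden_size,
                                self.text.config.d_model)

    def forward(self, input_values, labels):
        feats = self.speech(input_values).last_hidden_state  # speech frames
        return self.text(inputs_embeds=self.bridge(feats), labels=labels)
```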
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
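A toy version of a code-switching "corrupt then restore" objective: swap words through a bilingual lexicon and train the model to recover the original sentence. The lexicon and corruption rate below are invented for illustration.

```python
import random

# Toy code-switching corruption; the restore task trains a model to map
# the corrupted sequence back to the original one.
LEXICON = {"cat": "chat", "dog": "chien", "house": "maison"}  # en -> fr

def code_switch(tokens, rate=0.3, seed=0):
    rng = random.Random(seed)
    return [LEXICON[t] if t in LEXICON and rng.random() < rate else t
            for t in tokens]

src = "the cat sat in the house".split()
corrupted = code_switch(src)   # input to the restore task
target = src                   # the model learns to restore the original
print(corrupted, "->", target)
```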
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Highly Generalizable Models for Multilingual Hate Speech Detection [0.0]
Hate speech detection has become an important research topic within the past decade.
We compile a dataset of 11 languages and reconcile their differing label taxonomies by analyzing the combined data with binary labels: hate speech or not hate speech.
We conduct three types of experiments for a binary hate speech classification task: Multilingual-Train Monolingual-Test, Monolingual-Train Monolingual-Test, and Language-Family-Train Monolingual-Test scenarios.
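The three scenarios amount to different training splits over one combined, binary-labeled corpus; the column names and languages below are placeholders for illustration.

```python
import pandas as pd

# Toy illustration of the three train/test scenarios; languages and
# column names are invented here.
df = pd.DataFrame({
    "text":  ["...", "...", "...", "...", "..."],
    "label": [1, 0, 1, 0, 1],                    # 1 = hate, 0 = not hate
    "lang":  ["hi", "ne", "mr", "hi", "en"],
})

test_lang, family = "hi", ["hi", "mr", "ne"]     # e.g. an Indo-Aryan group
test = df[df.lang == test_lang]                  # Monolingual-Test, always

multilingual_train = df[df.lang != test_lang]    # Multilingual-Train
monolingual_train = df[df.lang == test_lang]     # Monolingual-Train (use a split disjoint from `test` in practice)
family_train = df[df.lang.isin(family)]          # Language-Family-Train
```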
arXiv Detail & Related papers (2022-01-27T03:09:38Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
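One standard remedy for such label imbalance, shown here only as an illustration and not necessarily the paper's exact method, is to weight the loss inversely to class frequency:

```python
import torch
import torch.nn as nn

# Inverse-frequency class weighting: errors on the minority (hate) class
# cost more. The counts below are made-up example numbers.
counts = torch.tensor([9000.0, 1000.0])          # non-hate vs. hate examples
weights = counts.sum() / (len(counts) * counts)  # higher weight for rare class
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 2)
labels = torch.tensor([0, 1, 1, 0])
loss = criterion(logits, labels)
```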
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
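A generic teacher-student objective of the kind this line of work relies on (the paper's exact loss for speech-to-intent transfer may differ) blends a softened KL term against the teacher with the usual hard-label cross-entropy:

```python
import torch.nn.functional as F

# Standard knowledge-distillation loss; T softens the distributions and
# alpha balances soft (teacher) vs. hard (label) supervision.
def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```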
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- Cross-lingual Capsule Network for Hate Speech Detection in Social Media [6.531659195805749]
We investigate the cross-lingual hate speech detection task, tackling the problem by adapting the hate speech resources from one language to another.
We propose a cross-lingual capsule network learning model coupled with extra domain-specific lexical semantics for hate speech.
Our model achieves state-of-the-art performance on benchmark datasets from AMI@Evalita 2018 and AMI@Ibereval 2018.
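For reference, capsule networks build on the "squash" non-linearity from Sabour et al. (2017); the snippet below shows only that mechanism, not the paper's full cross-lingual architecture.

```python
import torch

# Capsule "squash": scales a vector so its norm lies in (0, 1) while
# preserving its direction.
def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    norm_sq = (s * s).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (norm_sq.sqrt() + eps)
```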
arXiv Detail & Related papers (2021-08-06T12:53:41Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
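The kind of probe AM2iCo targets can be approximated by comparing a word's contextual embeddings across sentences; the naive first-subword matching below is an illustration, not the benchmark's official protocol.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Compare a target word's contextual embeddings in two contexts.
tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
enc = AutoModel.from_pretrained("xlm-roberta-base")

@torch.no_grad()
def word_vec(sentence: str, word: str) -> torch.Tensor:
    inputs = tok(sentence, return_tensors="pt")
    hidden = enc(**inputs).last_hidden_state[0]
    first_piece = tok(word, add_special_tokens=False)["input_ids"][0]
    mask = inputs["input_ids"][0] == first_piece
    return hidden[mask].mean(dim=0)  # average over matching positions

sim = torch.cosine_similarity(
    word_vec("She sat by the river bank.", "bank"),
    word_vec("The bank approved the loan.", "bank"), dim=0)
```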
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
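One simple way to amalgamate several language-branch teachers into a single student, shown here as a simplification of the paper's distillation scheme, is to match the student against the teachers' averaged soft targets:

```python
import torch
import torch.nn.functional as F

# Multi-teacher distillation sketch: the student matches the mean of the
# teachers' softened output distributions.
def multi_teacher_loss(student_logits, teacher_logits_list, T=2.0):
    avg_teacher = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    avg_teacher, reduction="batchmean") * (T * T)
```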
arXiv Detail & Related papers (2020-10-27T13:12:17Z)