Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language
Models for Violence Inciting Text Detection
- URL: http://arxiv.org/abs/2311.18778v1
- Date: Thu, 30 Nov 2023 18:23:38 GMT
- Title: Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language
Models for Violence Inciting Text Detection
- Authors: Saurabh Page, Sudeep Mangalvedhekar, Kshitij Deshpande, Tanmay Chavan
and Sheetal Sonawane
- Abstract summary: Social media has accelerated the propagation of hate and violence-inciting speech in society.
The problem of detecting violence-inciting texts is further exacerbated in low-resource settings due to sparse research and less data.
This paper presents our work for the Violence Inciting Text Detection shared task in the First Workshop on Bangla Language Processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our work for the Violence Inciting Text Detection shared
task in the First Workshop on Bangla Language Processing. Social media has
accelerated the propagation of hate and violence-inciting speech in society. It
is essential to develop efficient mechanisms to detect and curb the propagation
of such texts. The problem of detecting violence-inciting texts is further
exacerbated in low-resource settings due to sparse research and less data. The
data provided in the shared task consists of texts in the Bangla language,
where each example is classified into one of the three categories defined based
on the types of violence-inciting texts. We try and evaluate several BERT-based
models, and then use an ensemble of the models as our final submission. Our
submission is ranked 10th in the final leaderboard of the shared task with a
macro F1 score of 0.737.
Related papers
- BanTH: A Multi-label Hate Speech Detection Dataset for Transliterated Bangla [0.0]
We introduce BanTH, the first multi-label transliterated Bangla hate speech dataset comprising 37.3k samples.
The samples are sourced from YouTube comments, where each instance is labeled with one or more target groups.
Experiments reveal that our further pre-trained encoders are achieving state-of-the-art performance on the BanTH dataset.
arXiv Detail & Related papers (2024-10-17T07:15:15Z) - nlpBDpatriots at BLP-2023 Task 1: A Two-Step Classification for Violence
Inciting Text Detection in Bangla [7.3481279783709805]
In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD)
The aim of this task is to identify and classify the violent threats, that provoke further unlawful violent acts.
Our best-performing approach for the task is two-step classification using back translation and multilinguality which ranked 6th out of 27 teams with a macro F1 score of 0.74.
arXiv Detail & Related papers (2023-11-25T13:47:34Z) - Understanding writing style in social media with a supervised
contrastively pre-trained transformer [57.48690310135374]
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 x 106 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z) - BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models
for Violence Inciting Text Detection in Bengali [0.46040036610482665]
This paper presents the system that we have developed while solving this shared task on violence inciting text detection in Bangla.
We explain both the traditional and the recent approaches that we used to make our models learn.
Our proposed system helps to classify if the given text contains any threat.
arXiv Detail & Related papers (2023-10-16T19:35:04Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is a report of the organizers on the first competition of argumentation analysis systems dealing with Russian language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won the first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z) - A study of text representations in Hate Speech Detection [0.0]
Current EU and US legislation against hateful language has led to automatic tools being a necessary component of the Hate Speech detection task and pipeline.
In this study, we examine the performance of several, diverse text representation techniques paired with multiple classification algorithms, on the automatic Hate Speech detection task.
arXiv Detail & Related papers (2021-02-08T20:39:17Z) - Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z) - OSACT4 Shared Task on Offensive Language Detection: Intensive
Preprocessing-Based Approach [0.0]
This study aims at investigating the impact of the preprocessing phase on text classification for Arabic text.
The Arabic language used in social media is informal and written using Arabic dialects, which makes the text classification task very complex.
An intensive preprocessing-based approach demonstrates its significant impact on offensive language detection and hate speech detection.
arXiv Detail & Related papers (2020-05-14T23:46:10Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.