BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali
- URL: http://arxiv.org/abs/2310.10781v1
- Date: Mon, 16 Oct 2023 19:35:04 GMT
- Title: BanglaNLP at BLP-2023 Task 1: Benchmarking different Transformer Models for Violence Inciting Text Detection in Bengali
- Authors: Saumajit Saha and Albert Nanda
- Abstract summary: This paper presents the system we developed for the shared task on violence-inciting text detection in Bangla. We describe both the traditional and the more recent approaches we used to train our models. Our proposed system classifies whether a given text contains a violent threat.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the system we developed for the shared task on violence-inciting text detection in Bangla. We describe both the traditional and the more recent approaches we used to train our models. Our proposed system classifies whether a given text contains a violent threat. We also studied the impact of data augmentation when only a limited dataset is available. Our quantitative results show that fine-tuning a multilingual-e5-base model performed best on this task compared with other transformer-based architectures. We obtained a macro F1 of 68.11% on the test set, ranking 23rd on the leaderboard.
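For concreteness, the sketch below shows the kind of pipeline the abstract describes: fine-tuning the public intfloat/multilingual-e5-base checkpoint for sequence classification and scoring with macro F1. The CSV file names, the "text"/"label" column names, the three-class label set, and the hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch: fine-tune intfloat/multilingual-e5-base for violence-inciting
# text classification and report macro F1. File names, label count, and
# hyperparameters are illustrative assumptions, not the authors' exact setup.
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "intfloat/multilingual-e5-base"  # checkpoint named in the abstract
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# The shared task distinguishes a small set of threat classes; 3 is assumed.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def tokenize(batch):
    # E5 checkpoints were pretrained with a "query: " prefix; reusing it when
    # fine-tuning is a common convention, though the paper may have differed.
    return tokenizer(["query: " + t for t in batch["text"]],
                     truncation=True, max_length=256)

def macro_f1(eval_pred):
    logits, labels = eval_pred
    return {"macro_f1": f1_score(labels, logits.argmax(-1), average="macro")}

# Assumes CSVs with "text" and "label" columns.
data = load_dataset("csv", data_files={"train": "train.csv", "dev": "dev.csv"})
data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="vitd-e5", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["dev"],
    compute_metrics=macro_f1,
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())  # reports macro F1 on the dev split
```

The 68.11% test macro F1 reported above came from the authors' own configuration; this sketch only illustrates the moving parts.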
Related papers
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language Models for Violence Inciting Text Detection [0.0]
Social media has accelerated the propagation of hate and violence-inciting speech in society.
The problem of detecting violence-inciting texts is further exacerbated in low-resource settings due to sparse research and limited data.
This paper presents our work for the Violence Inciting Text Detection shared task at the First Workshop on Bangla Language Processing (a generic ensemble sketch follows this entry).
arXiv Detail & Related papers (2023-11-30T18:23:38Z)
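The Mavericks system is ensemble-based. As a generic illustration of that idea (not their exact recipe), the fragment below soft-votes by averaging per-class probabilities from several fine-tuned checkpoints; the checkpoint paths are hypothetical placeholders.

```python
# Hypothetical soft-voting ensemble over fine-tuned classifiers; the
# checkpoint paths are placeholders, not the actual Mavericks models.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = ["ckpt-banglabert", "ckpt-xlm-roberta", "ckpt-muril"]

def ensemble_predict(text: str) -> int:
    probs = []
    for path in CHECKPOINTS:
        tok = AutoTokenizer.from_pretrained(path)
        model = AutoModelForSequenceClassification.from_pretrained(path).eval()
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs.append(model(**inputs).logits.softmax(-1))
    # Average the class distributions across models, then pick the argmax.
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1))
```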
- nlpBDpatriots at BLP-2023 Task 1: A Two-Step Classification for Violence Inciting Text Detection in Bangla [7.3481279783709805]
In this paper, we discuss the nlpBDpatriots entry to the shared task on Violence Inciting Text Detection (VITD).
The aim of this task is to identify and classify violent threats that provoke further unlawful violent acts.
Our best-performing approach for the task is two-step classification using back translation (sketched after this entry) and multilinguality, which ranked 6th out of 27 teams with a macro F1 score of 0.74.
arXiv Detail & Related papers (2023-11-25T13:47:34Z)
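The back-translation step referenced above (and the data-augmentation study in the main abstract) can be sketched by round-tripping Bangla text through English with an off-the-shelf translator; the NLLB checkpoint here is an assumption for illustration, not necessarily what either team used.

```python
# Hedged back-translation sketch for Bangla data augmentation: translate each
# example to English and back, keeping round-trips that differ as paraphrases.
# The NLLB checkpoint is an illustrative assumption, not the paper's choice.
from transformers import pipeline

MODEL = "facebook/nllb-200-distilled-600M"
bn2en = pipeline("translation", model=MODEL,
                 src_lang="ben_Beng", tgt_lang="eng_Latn")
en2bn = pipeline("translation", model=MODEL,
                 src_lang="eng_Latn", tgt_lang="ben_Beng")

def back_translate(texts: list[str]) -> list[str]:
    english = [out["translation_text"] for out in bn2en(texts)]
    round_trip = [out["translation_text"] for out in en2bn(english)]
    # Keep only paraphrases that actually differ from the source text.
    return [bt for src, bt in zip(texts, round_trip) if bt != src]
```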
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and scarcity of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- BanglaNLP at BLP-2023 Task 2: Benchmarking different Transformer Models for Sentiment Analysis of Bangla Social Media Posts [0.46040036610482665]
This paper presents our submission to Task 2 (Sentiment Analysis of Bangla Social Media Posts) of the BLP Workshop.
Our quantitative results show that transfer learning markedly improves model performance in this low-resource language scenario.
We obtain a micro F1 of 67.02% on the test set, ranking 21st on the leaderboard.
arXiv Detail & Related papers (2023-10-13T16:46:38Z)
- UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text using Transformer Ensembles [0.5324802812881543]
This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023.
Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
arXiv Detail & Related papers (2023-08-02T20:08:59Z)
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, the mBERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Bangla Text Classification using Transformers [2.3475904942266697]
Text classification has been one of the earliest problems in NLP.
In this work, we fine-tune multilingual Transformer models for Bangla text classification tasks.
We obtain state-of-the-art results on six benchmark datasets, improving on previous results by 5-29% accuracy across tasks.
arXiv Detail & Related papers (2020-11-09T14:12:07Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system that combines multi-task learning with BERT-based models (a minimal sketch follows this entry).
Our model achieves a 91.51% F1 score on English Sub-task A, comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
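The Kungfupanda entry pairs one shared BERT encoder with a classification head per OffensEval sub-task. Below is a minimal PyTorch sketch of that multi-task pattern, using the standard sub-task label counts (A: offensive/not, B: targeted/untargeted, C: individual/group/other); the head wiring is an illustrative assumption, not the team's exact architecture.

```python
# Minimal multi-task sketch in the spirit of the Kungfupanda entry: a shared
# BERT encoder feeds one classification head per OffensEval sub-task.
# Everything beyond that pattern is an illustrative assumption.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskBert(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased",
                 task_labels: tuple = (2, 2, 3)):  # sub-tasks A, B, C
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in task_labels)

    def forward(self, input_ids, attention_mask):
        # Share the [CLS] representation across all task heads.
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return [head(cls) for head in self.heads]
```

Training such a model would typically sum a cross-entropy loss per head, optionally with per-task weights.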