How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text?
- URL: http://arxiv.org/abs/2506.19831v1
- Date: Tue, 24 Jun 2025 17:48:49 GMT
- Title: How Effectively Can BERT Models Interpret Context and Detect Bengali Communal Violent Text?
- Authors: Abdullah Khondoker, Enam Ahmed Taufik, Md. Iftekhar Islam Tashik, S M Ishtiak Mahmud, Farig Sadeque
- Abstract summary: Communal violent text remains an underexplored area in existing research. We introduce a fine-tuned BanglaBERT model tailored for this task, achieving a macro F1 score of 0.60. This ensemble model demonstrated an improved performance, achieving a macro F1 score of 0.63, thus highlighting its effectiveness in this domain.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The spread of cyber hatred has led to communal violence, fueling aggression and conflicts between various religious, ethnic, and social groups, posing a significant threat to social harmony. Despite its critical importance, the classification of communal violent text remains an underexplored area in existing research. This study aims to enhance the accuracy of detecting text that incites communal violence, focusing specifically on Bengali textual data sourced from social media platforms. We introduce a fine-tuned BanglaBERT model tailored for this task, achieving a macro F1 score of 0.60. To address the issue of data imbalance, our dataset was expanded by adding 1,794 instances, which facilitated the development and evaluation of a fine-tuned ensemble model. The ensemble demonstrated improved performance, achieving a macro F1 score of 0.63, highlighting its effectiveness in this domain. In addition to quantitative performance metrics, qualitative analysis revealed instances where the models struggled with context understanding, leading to occasional misclassifications, even when predictions were made with high confidence. By analyzing the cosine similarity between words, we identified certain limitations in the pre-trained BanglaBERT models, particularly in their ability to distinguish between closely related communal and non-communal terms. To further interpret the model's decisions, we applied LIME, which helped uncover specific areas where the model struggled to understand context, contributing to classification errors. These findings highlight the promise of NLP and interpretability tools in reducing online communal violence. Our work contributes to the growing body of research in communal violence detection and offers a foundation for future studies aiming to refine these techniques for better accuracy and societal impact.
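For context on the metrics and analysis the abstract reports, here is a minimal sketch of how macro F1 (the headline score) and word-level cosine similarity (used to probe closely related communal vs. non-communal terms) are typically computed. The labels and embedding vectors below are hypothetical illustrations, not the paper's data:

```python
import math

def macro_f1(y_true, y_pred, labels):
    """Macro F1: per-class F1 averaged with equal weight per class,
    regardless of class frequency (the metric reported in the paper)."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors; values near 1.0
    indicate the model represents the two words almost identically."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy illustration with hypothetical labels and vectors:
truth = ["communal", "communal", "non-communal", "non-communal"]
preds = ["communal", "non-communal", "non-communal", "non-communal"]
print(round(macro_f1(truth, preds, ["communal", "non-communal"]), 2))  # → 0.73
print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))            # → 0.707
```

High cosine similarity between a communal and a non-communal term's embeddings is exactly the failure mode the paper attributes to pre-trained BanglaBERT: the model cannot separate words it represents near-identically.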
Related papers
- CL-ISR: A Contrastive Learning and Implicit Stance Reasoning Framework for Misleading Text Detection on Social Media [0.5999777817331317]
This paper proposes a novel framework, CL-ISR (Contrastive Learning and Implicit Stance Reasoning), to improve the detection accuracy of misleading texts on social media. First, we use the contrastive learning algorithm to improve the model's learning ability of semantic differences between truthful and misleading texts. Second, we introduce the implicit stance reasoning module to explore the potential stance tendencies in the text and their relationships with related topics.
arXiv Detail & Related papers (2025-06-05T14:52:28Z)
- Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models. We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness. The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z)
- Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives [52.863024096759816]
Misaligned research objectives have hindered progress in adversarial robustness research over the past decade. We argue that realigned objectives are necessary for meaningful progress in adversarial alignment.
arXiv Detail & Related papers (2025-02-17T15:28:40Z)
- Unpacking the Resilience of SNLI Contradiction Examples to Attacks [0.38366697175402226]
We apply the Universal Adversarial Attack to examine the model's vulnerabilities. Our analysis revealed substantial drops in accuracy for the entailment and neutral classes. Fine-tuning the model on an augmented dataset with adversarial examples restored its performance to near-baseline levels.
arXiv Detail & Related papers (2024-12-15T12:47:28Z)
- Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z)
- Prompting or Fine-tuning? Exploring Large Language Models for Causal Graph Validation [0.0]
This study explores the capability of Large Language Models to evaluate causality in causal graphs. Our study compares two approaches: (1) a prompting-based method for zero-shot and few-shot causal inference, and (2) fine-tuning language models for the causal relation prediction task.
arXiv Detail & Related papers (2024-05-29T09:06:18Z)
- Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions [1.2618555186247333]
We have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content.
Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists.
By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text.
arXiv Detail & Related papers (2024-04-17T21:09:13Z)
- Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language Models for Violence Inciting Text Detection [0.0]
Social media has accelerated the propagation of hate and violence-inciting speech in society.
The problem of detecting violence-inciting texts is further exacerbated in low-resource settings due to sparse research and less data.
This paper presents our work for the Violence Inciting Text Detection shared task in the First Workshop on Bangla Language Processing.
arXiv Detail & Related papers (2023-11-30T18:23:38Z)
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains underexplored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existing social network, along with graph-based propagation to capture social dynamics.
Our method surpasses the existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- CausalDialogue: Modeling Utterance-level Causality in Conversations [83.03604651485327]
We compiled and expanded a new dataset called CausalDialogue through crowd-sourcing.
This dataset includes multiple cause-effect pairs within a directed acyclic graph (DAG) structure.
We propose a causality-enhanced method called Exponential Average Treatment Effect (ExMATE) to enhance the impact of causality at the utterance level in training neural conversation models.
arXiv Detail & Related papers (2022-12-20T18:31:50Z)
- Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis [56.84237932819403]
This paper aims to estimate and mitigate the adverse effect of textual modality for strong OOD generalization.
Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis.
arXiv Detail & Related papers (2022-07-24T03:57:40Z)