Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions
- URL: http://arxiv.org/abs/2404.11752v1
- Date: Wed, 17 Apr 2024 21:09:13 GMT
- Title: Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions
- Authors: Nazia Tasnim, Sujan Sen Gupta, Md. Istiak Hossain Shihab, Fatiha Islam Juee, Arunima Tahsin, Pritom Ghum, Kanij Fatema, Marshia Haque, Wasema Farzana, Prionti Nasir, Ashique KhudaBukhsh, Farig Sadeque, Asif Sushmit,
- Abstract summary: We have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content.
Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists.
By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text.
- Score: 1.2618555186247333
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies exhibit a phenomenon characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content accompanying the largest collection (13K raw sentences) of social media interactions that fall under the definition of four major violence class and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance using this dataset, we have determined that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we have substantiated the effectiveness of fine-tuning language models in identifying violent comments by conducting preliminary benchmarking on the state-of-the-art Bangla deep learning model.
Related papers
- Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues [66.69453609603875]
Sociocultural norms serve as guiding principles for personal conduct in social interactions.
We propose a scalable approach for constructing a Sociocultural Norm (SCN) Base using Large Language Models (LLMs)
We construct a comprehensive and publicly accessible Chinese Sociocultural NormBase.
arXiv Detail & Related papers (2024-10-04T00:08:46Z) - Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation [0.0]
This study focuses on identifying toxic comments in the Bengali language targeting three specific groups: transgender people, indigenous people, and migrant people.
The methodology involves creating a dataset, manual annotation, and employing pre-trained transformer models like Bangla-BERT, bangla-bert-base, distil-BERT, and Bert-base-multilingual-cased for classification.
The experimental findings reveal that Bangla-BERT surpasses alternative models, achieving an F1-score of 0.8903.
arXiv Detail & Related papers (2024-09-25T17:48:59Z) - Fear and Loathing on the Frontline: Decoding the Language of Othering by Russia-Ukraine War Bloggers [6.632254395574994]
Othering, the act of portraying outgroups as fundamentally different from the ingroup, often escalates into framing them as existential threats.
These dynamics are alarmingly pervasive, spanning from the extreme historical examples of genocides against minorities in Germany and Rwanda to the ongoing violence and rhetoric targeting migrants in the US and Europe.
Our framework, designed to offer deeper insights into othering dynamics, combines with a rapid adaptation process to provide essential tools for mitigating othering's adverse impacts on social cohesion.
arXiv Detail & Related papers (2024-09-19T19:56:03Z) - Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse [16.99659597567309]
We present an extensive benchmark dataset for Amharic, comprising 8,258 tweets annotated for three distinct tasks.
Our study highlights that a considerable majority of tweets belong to the less offensive and less hate intensity levels.
The prevalence of ethnic and political hatred targets, with significant overlaps in our dataset, emphasizes the complex relationships within Ethiopia's sociopolitical landscape.
arXiv Detail & Related papers (2024-04-18T09:52:50Z) - Into the crossfire: evaluating the use of a language model to
crowdsource gun violence reports [0.21485350418225244]
We propose a fine-tuned BERT-based model trained on Twitter texts to distinguish gun violence reports from ordinary Portuguese texts.
We study and interview Brazilian analysts who continuously fact-check social media texts to identify new gun violence events.
arXiv Detail & Related papers (2024-01-16T14:40:54Z) - Mavericks at BLP-2023 Task 1: Ensemble-based Approach Using Language
Models for Violence Inciting Text Detection [0.0]
Social media has accelerated the propagation of hate and violence-inciting speech in society.
The problem of detecting violence-inciting texts is further exacerbated in low-resource settings due to sparse research and less data.
This paper presents our work for the Violence Inciting Text Detection shared task in the First Workshop on Bangla Language Processing.
arXiv Detail & Related papers (2023-11-30T18:23:38Z) - Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z) - Decoding the Silent Majority: Inducing Belief Augmented Social Graph
with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z) - Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and
Benchmarks [95.29345070102045]
In this paper, we focus our investigation on social bias detection of dialog safety problems.
We first propose a novel Dial-Bias Frame for analyzing the social bias in conversations pragmatically.
We introduce CDail-Bias dataset that is the first well-annotated Chinese social bias dialog dataset.
arXiv Detail & Related papers (2022-02-16T11:59:29Z) - Fragments of the Past: Curating Peer Support with Perpetrators of
Domestic Violence [88.37416552778178]
We report on a ten-month study where we worked with six support workers and eighteen perpetrators in the design and deployment of Fragments of the Past.
We share how crafting digitally-augmented artefacts - 'fragments' - of experiences of desisting from violence can translate messages for motivation and rapport between peers.
These insights provide the basis for practical considerations for future network design with challenging populations.
arXiv Detail & Related papers (2021-07-09T22:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.