Textual Toxicity in Social Media: Understanding the Bangla Toxic
Language Expressed in Facebook Comment
- URL: http://arxiv.org/abs/2312.05467v1
- Date: Sat, 9 Dec 2023 05:04:34 GMT
- Title: Textual Toxicity in Social Media: Understanding the Bangla Toxic
Language Expressed in Facebook Comment
- Authors: Mohammad Mamun Or Rashid
- Abstract summary: Toxic language used by the Bengali community for cyberbullying, hate speech, and moral policing has become a major trend in the social media culture of Bangladesh and West Bengal.
This analysis is expected to strengthen the detection of toxic Bangla language used in social media and thus help cure this virtual disease.
- Score: 0.6798775532273751
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social media is a repository of digital literature, including user-generated
content. Social media users express their opinions through diverse mediums such as
text, emojis, memes, and other visual and textual forms. A major portion of these
media elements can be harmful to others, and such content is known by many names,
including cyberbullying and toxic language. The goal of this research paper is to
analyze a curated and value-added dataset of toxic language titled ToxLex_bn. It is
an exhaustive wordlist that can be used as classifier material to detect toxicity in
social media. Toxic language used by the Bengali community for cyberbullying, hate
speech, and moral policing has become a major trend in the social media culture of
Bangladesh and West Bengal. The toxicity has grown so severe that victims have to
post counter-statements or release explanation videos in response to the haters.
Most cases target female celebrities, whose relationships, dress, and lifestyle are
trolled until toxicity floods the comment boxes. Beyond celebrity bashing, hate also
flares between Hindus and Muslims, between India and Bangladesh, and between the two
opposing sides of 1971, and such virtual conflict is very common in comment threads.
Facebook comments have even led to lawsuits and other legal matters in Bangladesh,
which makes further study necessary. In this study, a Bangla toxic language dataset
has been analyzed, consisting of user input in Bengali script and language. For
this, about 1,968 unique bigrams or phrases have been analyzed as wordlists, derived
from 2,207,590 comments. This analysis is expected to strengthen the detection of
toxic Bangla language used in social media and thus help cure this virtual disease.
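Since the paper positions the ToxLex_bn wordlist as classifier material, the most direct use is a lexicon-based flag over incoming comments. The following is a minimal sketch, assuming a one-phrase-per-line wordlist file and plain substring matching; the file handling, placeholder phrases, and matching strategy are illustrative assumptions, not details taken from the paper.

# Minimal sketch: flagging comments against a toxic-phrase wordlist such as
# ToxLex_bn. The wordlist format (one bigram/phrase per line) and the
# substring-matching strategy are assumptions for illustration.

def load_wordlist(path: str) -> set[str]:
    """Read one toxic bigram or phrase per line into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def is_toxic(comment: str, wordlist: set[str]) -> bool:
    """Flag a comment if any listed phrase occurs in it."""
    return any(phrase in comment for phrase in wordlist)

# Placeholder entries standing in for the ~1,968 Bengali bigrams.
toxic_phrases = {"offensive phrase", "toxic bigram"}
print(is_toxic("a comment containing an offensive phrase", toxic_phrases))  # True

In practice such a matcher would first normalize Unicode and tokenize the Bengali text, since raw substring matching over-flags words that merely contain a listed phrase.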
Related papers
- Analyzing Toxicity in Deep Conversations: A Reddit Case Study [0.0]
This work employs a tree-based approach to understand how users behave with respect to toxicity in public conversation settings.
We collect both the posts and the comment sections of the top 100 posts from 8 Reddit communities that allow profanity, totaling over 1 million responses.
We find that toxic comments increase the likelihood of subsequent toxic comments being produced in online conversations.
arXiv Detail & Related papers (2024-04-11T16:10:44Z)
- An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software [64.367830425115]
Social media platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography.
Despite tremendous efforts in developing and deploying content moderation methods, malicious users can evade moderation by embedding texts into images.
We propose a metamorphic testing framework for content moderation software; a sketch of the underlying metamorphic relation appears after this list.
arXiv Detail & Related papers (2023-08-18T20:33:06Z)
- Analyzing Norm Violations in Live-Stream Chat [49.120561596550395]
We present the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms.
We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch.
Our results show that appropriate contextual information can boost moderation performance by 35%.
arXiv Detail & Related papers (2023-05-18T05:58:27Z)
- Classification of social media Toxic comments using Machine learning models [0.0]
This paper addresses the problem of toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language.
This behavior is referred to as anti-social behavior, which occurs during online debates, comments, and fights.
The comments containing explicit language can be classified into various categories, such as toxic, severe toxic, obscene, threat, insult, and identity hate.
To protect users from offensive language, companies have started flagging comments and blocking users.
arXiv Detail & Related papers (2023-04-14T05:40:11Z)
- Automated Sentiment and Hate Speech Analysis of Facebook Data by Employing Multilingual Transformer Models [15.823923425516078]
We analyse the statistical distribution of hateful and negative sentiment contents within a representative Facebook dataset.
We employ state-of-the-art, open-source XLM-T multilingual transformer-based language models to perform sentiment and hate speech analysis; a brief usage sketch appears after this list.
arXiv Detail & Related papers (2023-01-31T14:37:04Z)
- Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topics.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)
- Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection [75.54119209776894]
We investigate the effect of annotator identities (who) and beliefs (why) on toxic language annotations.
We consider posts with three characteristics: anti-Black language, African American English dialect, and vulgarity.
Our results show strong associations between annotator identity and beliefs and their ratings of toxicity.
arXiv Detail & Related papers (2021-11-15T18:58:20Z)
- Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go [44.774035806004214]
Harmful content on online platforms comes in many different forms, including hate speech, offensive language, bullying and harassment, misinformation, spam, violence, graphic content, sexual abuse, self-harm, and many others.
Online platforms seek to moderate such content to limit societal harm, to comply with legislation, and to create a more inclusive environment for their users.
There is currently a dichotomy between what types of harmful content online platforms seek to curb, and what research efforts there are to automatically detect such content.
arXiv Detail & Related papers (2021-02-27T08:01:10Z)
- Bangla Text Dataset and Exploratory Analysis for Online Harassment Detection [0.0]
The data made accessible in this article has been gathered and labeled from people's comments on public Facebook posts by celebrities, government officials, and athletes.
The dataset is compiled with the aim of developing machines' ability to distinguish bullying comments from non-bullying ones.
arXiv Detail & Related papers (2021-02-04T08:35:18Z)
- Hate Speech detection in the Bengali language: A dataset and its baseline evaluation [0.8793721044482612]
This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts.
All the comments are collected from YouTube and Facebook comment sections and classified into seven categories.
Each comment was annotated three times by a pool of 50 annotators, and the majority vote was taken as the final annotation.
arXiv Detail & Related papers (2020-12-17T15:53:54Z)
- Racism is a Virus: Anti-Asian Hate and Counterspeech in Social Media during the COVID-19 Crisis [51.39895377836919]
COVID-19 has sparked racism and hate on social media targeted towards Asian communities.
We study the evolution and spread of anti-Asian hate speech through the lens of Twitter.
We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months.
arXiv Detail & Related papers (2020-05-25T21:58:09Z)
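For the metamorphic-testing entry above, the core idea is an input transformation under which the moderation verdict must stay the same. Below is a minimal sketch of that relation for the text-in-image evasion tactic; the moderate callable is a hypothetical stand-in for any moderation service, and the rendering details are illustrative assumptions.

# Metamorphic relation sketch: a moderation verdict should not flip when the
# same toxic text is embedded into an image. `moderate` is hypothetical.
from PIL import Image, ImageDraw

def text_to_image(text: str) -> Image.Image:
    """Render the text onto a plain white image (the evasion tactic tested)."""
    img = Image.new("RGB", (800, 80), "white")
    ImageDraw.Draw(img).text((10, 30), text, fill="black")
    return img

def violates_relation(text: str, moderate) -> bool:
    """True if the verdict changes between the text and its rendered image."""
    return moderate(text) != moderate(text_to_image(text))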
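For the XLM-T entry above, a brief usage sketch with the Hugging Face transformers pipeline. The checkpoint named below is the publicly released XLM-T sentiment model; treating it as the exact model used in that paper is an assumption.

# Usage sketch: multilingual sentiment with an XLM-T checkpoint. The exact
# checkpoint used in the paper above is an assumption on this page's part.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)
# The model handles many languages, e.g. English and Bengali input alike.
print(classifier(["I love this community", "এটা খুব খারাপ মন্তব্য"]))
# e.g. [{'label': 'positive', ...}, {'label': 'negative', ...}]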