Understanding Lexical Biases when Identifying Gang-related Social Media Communications
- URL: http://arxiv.org/abs/2304.11485v1
- Date: Sat, 22 Apr 2023 21:51:49 GMT
- Title: Understanding Lexical Biases when Identifying Gang-related Social Media Communications
- Authors: Dhiraj Murthy, Constantine Caramanis, Koustav Rudra
- Abstract summary: We use a binary logistic classifier to identify gang-related tweets in Chicago.
We find that the language of a tweet is highly relevant and that uses of "big data" methods or machine learning models need to better understand how language impacts the model's performance.
- Score: 18.301221486244263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Individuals involved in gang-related activity use mainstream social media
including Facebook and Twitter to express taunts and threats as well as grief
and memorializing. However, identifying the impact of gang-related activity in
order to serve community member needs through social media sources has a unique
set of challenges. This includes the difficulty of ethically identifying
training data of individuals impacted by gang activity and the need to account
for a non-standard language style commonly used in the tweets from these
individuals. Our study provides evidence of methods where natural language
processing tools can be helpful in efficiently identifying individuals who may
be in need of community care resources such as counselors, conflict mediators,
or academic/professional training programs. We demonstrate that our binary
logistic classifier outperforms baseline standards in identifying individuals
impacted by gang-related violence using a sample of gang-related tweets
associated with Chicago. We ultimately found that the language of a tweet is
highly relevant and that uses of "big data" methods or machine learning
models need to better understand how language impacts the model's performance
and how it discriminates among populations.
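
The abstract's point that a binary logistic classifier depends heavily on the wording of a tweet can be illustrated with a minimal sketch. This is not the authors' model or data: the four sentences, their labels (1 = expresses grief, 0 = unrelated), the bag-of-words features, and the plain stochastic gradient descent loop are all assumptions chosen for illustration. The learned per-word weights show which tokens drive a prediction, which is where lexical bias would surface.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real tweets need more care).
    return text.lower().split()

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(text, weights, bias):
    # Linear score over bag-of-words counts; unseen words contribute 0.
    counts = Counter(tokenize(text))
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())

def predict_proba(text, weights, bias):
    return sigmoid(score(text, weights, bias))

def train_logistic(docs, labels, epochs=200, lr=0.5):
    # One weight per vocabulary word, fitted by stochastic gradient descent
    # on the logistic (cross-entropy) loss.
    weights = {tok: 0.0 for doc in docs for tok in tokenize(doc)}
    bias = 0.0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            err = predict_proba(doc, weights, bias) - y
            bias -= lr * err
            for tok, n in Counter(tokenize(doc)).items():
                weights[tok] -= lr * err * n
    return weights, bias

# Hypothetical toy data, not drawn from the paper's corpus.
docs = [
    "rest in peace miss you bro",
    "miss you every day rest easy",
    "great game tonight go bulls",
    "new pizza place opens downtown",
]
labels = [1, 1, 0, 0]

weights, bias = train_logistic(docs, labels)

# Words with the largest positive weights push a text toward label 1.
top_tokens = sorted(weights, key=weights.get, reverse=True)[:3]
print("most influential tokens:", top_tokens)
print("P(label=1 | 'miss you'):", round(predict_proba("miss you", weights, bias), 3))
```

Because every weight is tied to a surface word, a text that expresses the same sentiment in vocabulary absent from the training set receives only the bias term, which is one simple way to see how lexical choices affect who the model recognizes.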
Related papers
- Assessing the Level of Toxicity Against Distinct Groups in Bangla Social Media Comments: A Comprehensive Investigation [0.0]
This study focuses on identifying toxic comments in the Bengali language targeting three specific groups: transgender people, indigenous people, and migrant people.
The methodology involves creating a dataset, manual annotation, and employing pre-trained transformer models like Bangla-BERT, bangla-bert-base, distil-BERT, and Bert-base-multilingual-cased for classification.
The experimental findings reveal that Bangla-BERT surpasses alternative models, achieving an F1-score of 0.8903.
arXiv Detail & Related papers (2024-09-25T17:48:59Z)
- Unveiling Social Media Comments with a Novel Named Entity Recognition System for Identity Groups [2.5849042763002426]
We develop a Named Entity Recognition (NER) System for Identity Groups.
Our tool not only detects whether a sentence contains an attack but also tags the sentence tokens corresponding to the mentioned group.
We tested the utility of our tool in a case study on social media, annotating and comparing comments from Facebook related to news mentioning identity groups.
arXiv Detail & Related papers (2024-05-13T19:33:18Z)
- MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection [2.433983268807517]
Hate speech poses significant social, psychological, and occasionally physical threats to targeted individuals and communities.
Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training.
We scrutinized over 60 datasets, selectively integrating those pertinent into MetaHate.
Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models.
arXiv Detail & Related papers (2024-01-12T11:54:53Z)
- Developing Linguistic Patterns to Mitigate Inherent Human Bias in Offensive Language Detection [1.6574413179773761]
We propose a linguistic data augmentation approach to reduce bias in labeling processes.
This approach has the potential to improve offensive language classification tasks across multiple languages.
arXiv Detail & Related papers (2023-12-04T10:20:36Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better.
We term this framework Self-supervised Rumor Detection (SRD).
Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
- Fragments of the Past: Curating Peer Support with Perpetrators of Domestic Violence [88.37416552778178]
We report on a ten-month study where we worked with six support workers and eighteen perpetrators in the design and deployment of Fragments of the Past.
We share how crafting digitally-augmented artefacts - 'fragments' - of experiences of desisting from violence can translate messages for motivation and rapport between peers.
These insights provide the basis for practical considerations for future network design with challenging populations.
arXiv Detail & Related papers (2021-07-09T22:57:43Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
- Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents [60.27066549589362]
Social language used by human agents is associated with greater users' responsiveness and task completion.
The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element.
Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.
arXiv Detail & Related papers (2020-12-29T08:22:48Z)
- Analysing Social Media Network Data with R: Semi-Automated Screening of Users, Comments and Communication Patterns [0.0]
Communication on social media platforms is increasingly widespread across societies.
Fake news, hate speech and radicalizing elements are part of this modern form of communication.
A basic understanding of these mechanisms and communication patterns could help to counteract negative forms of communication.
arXiv Detail & Related papers (2020-11-26T14:52:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.