Detecting Harmful Online Conversational Content towards LGBTQIA+
Individuals
- URL: http://arxiv.org/abs/2207.10032v1
- Date: Wed, 15 Jun 2022 20:14:02 GMT
- Title: Detecting Harmful Online Conversational Content towards LGBTQIA+
Individuals
- Authors: Jamell Dacon, Harry Shomer, Shaylynn Crum-Dacon, Jiliang Tang
- Abstract summary: This work introduces a real-world dataset that will enable us to study and understand harmful online conversational content.
We implement two baseline machine learning models and fine-tune three pre-trained large language models.
Our findings verify that large language models can achieve very promising performance on the task of detecting online Anti-LGBTQIA+ conversational content.
- Score: 30.03410762695714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online discussions, panels, talk page edits, etc., often contain harmful
conversational content, i.e., hate speech, death threats, and offensive language,
especially towards certain demographic groups. For example, individuals who
identify as members of the LGBTQIA+ community and/or BIPOC (Black, Indigenous,
People of Color) are at higher risk for abuse and harassment online. In this
work, we first introduce a real-world dataset that will enable us to study and
understand harmful online conversational content. Then, we conduct several
exploratory data analysis experiments to gain deeper insights from the dataset.
We later describe our approach for detecting harmful online Anti-LGBTQIA+
conversational content, and finally, we implement two baseline machine learning
models (i.e., Support Vector Machine and Logistic Regression), and fine-tune three
pre-trained large language models (BERT, RoBERTa, and HateBERT). Our findings
verify that large language models can achieve very promising performance on the
task of detecting online Anti-LGBTQIA+ conversational content.
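
To make the described setup concrete, below is a minimal sketch of the two classical baselines named in the abstract (Support Vector Machine and Logistic Regression). The TF-IDF features, CSV file name, column names, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical baseline sketch: TF-IDF features with the two classical models
# named in the abstract. Feature choice, file/column names, and hyperparameters
# are assumptions, not the authors' reported setup.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Assumed schema: one raw comment per row and a binary harmful/not-harmful label.
df = pd.read_csv("anti_lgbtqia_comments.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42
)

for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("LinearSVC", LinearSVC())]:
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("clf", clf),
    ])
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```

A similarly hedged sketch of fine-tuning one of the three pre-trained language models follows (here HateBERT; swapping the checkpoint name for "bert-base-uncased" or "roberta-base" covers the other two). The checkpoint identifier, data split, and training arguments are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical fine-tuning sketch for binary harmful-content classification with
# Hugging Face Transformers. Checkpoint, split, and hyperparameters are assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "GroNLP/hateBERT"  # public HateBERT checkpoint (assumed here)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Assumed: the dataset exported to CSV with "text" and binary 0/1 "label" columns.
raw = load_dataset("csv", data_files="anti_lgbtqia_comments.csv")["train"]
raw = raw.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = raw.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(labels, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hatebert-anti-lgbtqia",
                           per_device_train_batch_size=16,
                           num_train_epochs=3,
                           evaluation_strategy="epoch"),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```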
Related papers
- Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias [8.168722337906148]
This study investigates the presence of bias in harmful speech classification of gender-queer dialect online.
We introduce a novel dataset, QueerLex, based on 109 curated templates exemplifying non-derogatory uses of LGBTQ+ slurs.
We systematically evaluate the performance of five off-the-shelf language models in assessing the harm of these texts.
arXiv Detail & Related papers (2024-05-23T18:07:28Z)
- CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models [59.22460740026037]
"CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset is designed to evaluate the social and cultural variation of Large Language Models (LLMs)
We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy.
arXiv Detail & Related papers (2024-05-22T20:19:10Z)
- Bridging the gap in online hate speech detection: a comparative analysis of BERT and traditional models for homophobic content identification on X/Twitter [0.7366405857677227]
We develop a nuanced approach to identify homophobic content on X/Twitter.
This research is pivotal due to the persistent underrepresentation of homophobia in detection models.
By releasing the largest open-source labelled English dataset for homophobia detection known to us, we aim to enhance online safety and inclusivity.
arXiv Detail & Related papers (2024-05-15T10:02:47Z)
- What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z)
- Explain Thyself Bully: Sentiment Aided Cyberbullying Detection with Explanation [52.3781496277104]
Cyberbullying has become a big issue with the popularity of different social media networks and online communication apps.
Recent laws like the "right to explanations" of the General Data Protection Regulation have spurred research in developing interpretable models.
We develop the first interpretable multi-task model, called mExCB, for automatic cyberbullying detection from code-mixed languages.
arXiv Detail & Related papers (2024-01-17T07:36:22Z)
- The Uli Dataset: An Exercise in Experience Led Annotation of oGBV [3.1060730586569427]
We present a dataset on gendered abuse in three languages: Hindi, Tamil, and Indian English.
The dataset comprises tweets annotated along three questions pertaining to the experience of gendered abuse, by experts who identify as women or as members of the LGBTQIA community in South Asia.
arXiv Detail & Related papers (2023-11-15T16:30:44Z)
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu, and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evading content moderation.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- COLD: A Benchmark for Chinese Offensive Language Detection [54.60909500459201]
We use COLDataset, a Chinese offensive language dataset with 37k annotated sentences.
We also propose COLDetector to study output offensiveness of popular Chinese language models.
Our resources and analyses are intended to help detoxify the Chinese online communities and evaluate the safety performance of generative language models.
arXiv Detail & Related papers (2022-01-16T11:47:23Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Transfer Learning for Hate Speech Detection in Social Media [14.759208309842178]
This paper uses a transfer learning technique to leverage two independent datasets jointly.
We build an interpretable two-dimensional visualization tool of the constructed hate speech representation -- dubbed the Map of Hate.
We show that the joint representation boosts prediction performances when only a limited amount of supervision is available.
arXiv Detail & Related papers (2019-06-10T08:00:58Z)