Unveiling Social Media Comments with a Novel Named Entity Recognition System for Identity Groups
- URL: http://arxiv.org/abs/2405.13011v1
- Date: Mon, 13 May 2024 19:33:18 GMT
- Title: Unveiling Social Media Comments with a Novel Named Entity Recognition System for Identity Groups
- Authors: Andrés Carvallo, Tamara Quiroga, Carlos Aspillaga, Marcelo Mendoza
- Abstract summary: We develop a Named Entity Recognition (NER) System for Identity Groups.
Our tool not only detects whether a sentence contains an attack but also tags the sentence tokens corresponding to the mentioned group.
We tested the utility of our tool in a case study on social media, annotating and comparing comments from Facebook related to news mentioning identity groups.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While civilized users employ social media to stay informed and discuss daily events, haters perceive these platforms as fertile ground for attacking groups and individuals. The prevailing countermeasure is to detect such attacks by identifying toxic language, so that platforms can report haters and block their network access. In this context, hate speech detection methods help surface these attacks amid volumes of text far too large for manual human analysis. In our study, we extend the usual hate speech detection methods, typically based on text classifiers, into a Named Entity Recognition (NER) System for Identity Groups. To achieve this, we created a dataset that extends conventional NER annotation to recognize identity groups. Consequently, our tool not only detects whether a sentence contains an attack but also tags the sentence tokens corresponding to the mentioned group. Results indicate that the model performs competitively in identifying groups, with an average F1-score of 0.75, and performs best on ethnicity attack spans, with an F1-score of 0.80. Moreover, the tool shows outstanding generalization to the minority classes of sexual orientation and gender, achieving F1-scores of 0.77 and 0.72, respectively. We tested the tool's utility in a case study on social media, annotating and comparing Facebook comments on news articles mentioning identity groups. The case study reveals differences in the types of attacks recorded, effectively detecting named entities related to the categories of the analyzed news articles. Entities are accurately tagged within their categories, with a negligible inter-category error rate.
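To make the tool's output concrete, here is a minimal sketch (not the authors' implementation) of what token-level identity-group tagging looks like: tokens carry BIO labels per category, spans are extracted from those labels, and a per-category span-level F1 is computed. The category names (e.g. `SEXUAL_ORIENTATION`) are illustrative assumptions, not the paper's actual label set.

```python
# Sketch of BIO-tagged identity-group spans and per-category span F1.
# Label names are hypothetical; the paper's exact tag set may differ.
from collections import defaultdict


def extract_spans(tags):
    """Turn a list of BIO tags into (category, start, end) spans."""
    spans, start, cat = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if cat is not None:          # close the span in progress
                spans.append((cat, start, i))
                cat = None
            if tag.startswith("B-"):     # open a new span
                start, cat = i, tag[2:]
    return spans


def span_f1(gold_spans, pred_spans):
    """Per-category F1 over exact (category, start, end) matches."""
    counts = defaultdict(lambda: [0, 0, 0])  # tp, n_predicted, n_gold
    gold_set = set(gold_spans)
    for span in pred_spans:
        cat = span[0]
        counts[cat][1] += 1
        if span in gold_set:
            counts[cat][0] += 1
    for cat, _, _ in gold_spans:
        counts[cat][2] += 1
    scores = {}
    for cat, (tp, n_pred, n_gold) in counts.items():
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        scores[cat] = 2 * p * r / (p + r) if p + r else 0.0
    return scores


tokens = ["those", "gay", "people", "ruin", "everything"]
gold = ["O", "B-SEXUAL_ORIENTATION", "I-SEXUAL_ORIENTATION", "O", "O"]
pred = ["O", "B-SEXUAL_ORIENTATION", "I-SEXUAL_ORIENTATION", "O", "O"]

print(extract_spans(pred))                          # [('SEXUAL_ORIENTATION', 1, 3)]
print(span_f1(extract_spans(gold), extract_spans(pred)))  # {'SEXUAL_ORIENTATION': 1.0}
```

Exact-match span F1 of this kind is the standard metric for NER, which is presumably how the per-group scores reported above (0.75 average, 0.80 for ethnicity) were obtained.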
Related papers
- A Target-Aware Analysis of Data Augmentation for Hate Speech Detection [3.858155067958448]
Hate speech is one of the main threats posed by the widespread use of social networks.
We investigate the possibility of augmenting existing data with generative language models, reducing target imbalance.
For some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline.
arXiv Detail & Related papers (2024-10-10T15:46:27Z)
- PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce the novel problem of PRofiling Adversarial aTtacks (PRAT).
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use the Adversarial Identification Dataset (AID) to devise a novel framework for the PRAT objective.
arXiv Detail & Related papers (2023-09-20T07:42:51Z)
- How Hate Speech Varies by Target Identity: A Computational Analysis [5.746505534720595]
We investigate how hate speech varies in systematic ways according to the identities it targets.
We find that the targeted demographic category appears to have a greater effect on the language of hate speech than does the relative social power of the targeted identity group.
arXiv Detail & Related papers (2022-10-19T19:06:23Z)
- Identifying Adversarial Attacks on Text Classifiers [32.958568467774704]
In this paper, we analyze adversarial text to determine which methods were used to create it.
Our first contribution is an extensive dataset for attack detection and labeling.
As our second contribution, we use this dataset to develop and benchmark a number of classifiers for attack identification.
arXiv Detail & Related papers (2022-01-21T06:16:04Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
Based on this observation, we propose a novel hard-label attack, called Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Reducing Target Group Bias in Hate Speech Detectors [56.94616390740415]
We show that text classification models trained on large publicly available datasets may significantly underperform on several protected groups.
We propose to perform token-level hate sense disambiguation, and utilize tokens' hate sense representations for detection.
arXiv Detail & Related papers (2021-12-07T17:49:34Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot Classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibit good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Detecting Abusive Albanian [5.092028049119383]
SHAJ is an annotated dataset for hate speech and offensive speech, constructed from user-generated content on various social media platforms.
The dataset is tested using three different classification models, the best of which achieves an F1 score of 0.77 for the identification of offensive language.
arXiv Detail & Related papers (2021-07-28T18:47:32Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Mitigating Face Recognition Bias via Group Adaptive Classifier [53.15616844833305]
This work aims to learn a fair face representation, where faces of every group could be more equally represented.
Our work is able to mitigate face recognition bias across demographic groups while maintaining competitive accuracy.
arXiv Detail & Related papers (2020-06-13T06:43:37Z)
- Contextualizing Hate Speech Classifiers with Post-hoc Explanation [26.044033793878683]
Hate speech classifiers struggle to determine if group identifiers like "gay" or "black" are used in offensive or prejudiced ways.
We propose a novel regularization technique based on these explanations that encourages models to learn from the context.
Our approach improved over baselines in limiting false positives on out-of-domain data.
arXiv Detail & Related papers (2020-05-05T18:56:40Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.