Incorporating Human Explanations for Robust Hate Speech Detection
- URL: http://arxiv.org/abs/2411.06213v1
- Date: Sat, 09 Nov 2024 15:29:04 GMT
- Title: Incorporating Human Explanations for Robust Hate Speech Detection
- Authors: Jennifer L. Chen, Faisal Ladhak, Daniel Li, Noémie Elhadad,
- Abstract summary: We develop a three-stage analysis to evaluate whether LMs faithfully assess hate speech.
First, we observe the need for modeling contextually grounded stereotype intents to capture implicit semantic meaning.
Next, we design a new task, Stereotype Intent Entailment (SIE), which encourages a model to contextually understand stereotype presence.
- Score: 17.354241456219945
- License:
- Abstract: Given the black-box nature and complexity of large transformer language models (LM), concerns about generalizability and robustness present ethical implications for domains such as hate speech (HS) detection. Using the content-rich Social Bias Frames dataset, containing human-annotated stereotypes, intent, and targeted groups, we develop a three-stage analysis to evaluate whether LMs faithfully assess hate speech. First, we observe the need for modeling contextually grounded stereotype intents to capture implicit semantic meaning. Next, we design a new task, Stereotype Intent Entailment (SIE), which encourages a model to contextually understand stereotype presence. Finally, through ablation tests and user studies, we find a SIE objective improves content understanding, but challenges remain in modeling implicit intent.
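The SIE task described in the abstract pairs a post with an annotated stereotype intent in an entailment-style setup. A minimal sketch of how such premise-hypothesis pairs might be constructed follows; the field names, hypothesis template, and placeholder text are illustrative assumptions, not the paper's actual Social Bias Frames schema.

```python
# Hypothetical sketch: framing Stereotype Intent Entailment (SIE) as an
# NLI-style pair-classification task. All names are assumptions, not the
# dataset's real schema.

def build_sie_pair(post: str, stereotype: str, label: str) -> dict:
    """Pair a post (premise) with an annotated stereotype intent
    (hypothesis) and an entailment label."""
    assert label in {"entailment", "not_entailment"}
    return {
        "premise": post,
        "hypothesis": f"This post implies the stereotype: {stereotype}",
        "label": label,
    }

example = build_sie_pair(
    "example post text",        # placeholder content
    "group X is lazy",          # placeholder stereotype annotation
    "entailment",
)
print(example["hypothesis"])
```

An entailment model would then be fine-tuned on such pairs, pushing it to ground the stereotype in the post's context rather than keyword-match.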
Related papers
- HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection [0.0]
We introduce HEARTS (Holistic Framework for Explainable, Sustainable, and Robust Text Stereotype Detection), a framework that enhances model performance, minimises carbon footprint, and provides transparent, interpretable explanations.
We establish the Expanded Multi-Grain Stereotype dataset (EMGSD), comprising 57,201 labelled texts across six groups, including under-represented demographics like LGBTQ+ and regional stereotypes.
We then analyse a fine-tuned, carbon-efficient ALBERT-V2 model using SHAP to generate token-level importance values, ensuring alignment with human understanding, and calculate explainability confidence scores by comparing SHAP and
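The abstract is cut off mid-sentence, so the exact comparison HEARTS performs is not stated here. Purely as an assumption, one plausible form of an explainability confidence score is an agreement measure between two token-level importance vectors (e.g., attributions from two different methods over the same tokens):

```python
import math

def attribution_agreement(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two token-level importance vectors.
    Returns a score in [-1, 1]; higher means closer agreement.
    Hypothetical measure, not necessarily the one used in HEARTS."""
    if len(a) != len(b):
        raise ValueError("attribution vectors must cover the same tokens")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

# A vector compared with a scaled copy of itself agrees maximally.
print(attribution_agreement([0.2, 0.5, 0.1], [0.4, 1.0, 0.2]))
```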
arXiv Detail & Related papers (2024-09-17T22:06:46Z) - Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
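The concept-bottleneck idea can be sketched as a two-stage prediction path: input features are first mapped to human-interpretable concept scores, and the final label is predicted from those scores alone. Everything below (concept names, weights) is invented for illustration, not taken from the paper.

```python
# Illustrative concept-bottleneck sketch: the label depends only on
# interpretable concept scores, so each prediction can be explained by
# the concepts that drove it. All concepts and weights are hypothetical.

CONCEPTS = ["sentiment", "toxicity", "formality"]

def concept_scores(features, concept_weights):
    """Linearly project hidden features onto named concept scores."""
    return [sum(w * f for w, f in zip(row, features))
            for row in concept_weights]

def predict_from_concepts(scores, label_weights):
    """Predict the label from concept scores alone (the bottleneck)."""
    logit = sum(w * s for w, s in zip(label_weights, scores))
    return int(logit > 0)

# Toy example: two hidden features, three concepts.
W_concepts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = concept_scores([0.2, -0.8], W_concepts)
print(dict(zip(CONCEPTS, scores)))
```

The design choice is that interpretability comes from the bottleneck itself: any prediction can be traced back to a small set of named concept activations.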
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL).
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG).
arXiv Detail & Related papers (2023-10-27T06:48:48Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - On The Role of Reasoning in the Identification of Subtle Stereotypes in Natural Language [0.03749861135832073]
Large language models (LLMs) are trained on vast, uncurated datasets that contain various forms of biases and language reinforcing harmful stereotypes.
It is essential to examine and address biases in language models, integrating fairness into their development to ensure that these models do not perpetuate social biases.
This work firmly establishes reasoning as a critical component in automatic stereotype detection and is a first step towards stronger stereotype mitigation pipelines for LLMs.
arXiv Detail & Related papers (2023-07-24T15:12:13Z) - Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
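One lightweight way such inferred propositions could serve as a text representation is simple set overlap, sketched below with hand-written propositions standing in for LLM output; this comparison measure is an assumption, not the paper's actual method.

```python
# Sketch: comparing texts via sets of LLM-generated propositions.
# The propositions here are hand-written stand-ins for model output.

def proposition_overlap(props_a: set[str], props_b: set[str]) -> float:
    """Jaccard similarity between two texts' inferred propositions:
    one simple way to compare meaning rather than surface form."""
    if not props_a and not props_b:
        return 1.0
    return len(props_a & props_b) / len(props_a | props_b)

a = {"the speaker is frustrated", "service was slow"}
b = {"service was slow", "the speaker wants a refund"}
print(round(proposition_overlap(a, b), 2))  # 0.33
```

Two texts with no words in common can still score highly here if they imply the same propositions, which is the point of modeling meaning behind the literal text.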
arXiv Detail & Related papers (2023-05-23T23:45:20Z) - Counteracts: Testing Stereotypical Representation in Pre-trained Language Models [4.211128681972148]
We use counterexamples to examine the internal stereotypical knowledge in pre-trained language models (PLMs).
We evaluate 7 PLMs on 9 types of cloze-style prompts with different information and base knowledge.
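A cloze-style probe of the kind described can be sketched as template filling; the template wording and BERT-style "[MASK]" token below are assumptions, not the paper's exact prompts.

```python
# Sketch of cloze-style probing prompts for a masked LM. The template
# and mask token are assumptions (BERT-style "[MASK]").

def cloze_prompt(group: str,
                 template: str = "{group} people are [MASK].") -> str:
    """Fill a probe template; the model's fill for [MASK] is then
    compared against stereotypical vs. counter-stereotypical words."""
    return template.format(group=group)

print(cloze_prompt("tall"))  # tall people are [MASK].
```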
arXiv Detail & Related papers (2023-01-11T07:52:59Z) - Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z) - CO-STAR: Conceptualisation of Stereotypes for Analysis and Reasoning [0.0]
We build on existing literature and present CO-STAR, a novel framework which encodes the underlying concepts of implied stereotypes.
We also introduce the CO-STAR training data set, which contains just over 12K structured annotations of implied stereotypes and stereotype conceptualisations.
The CO-STAR models are, however, limited in their ability to understand more complex and subtly worded stereotypes.
arXiv Detail & Related papers (2021-12-01T20:39:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.