NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
- URL: http://arxiv.org/abs/2404.01651v1
- Date: Tue, 2 Apr 2024 05:36:41 GMT
- Title: NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
- Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky,
- Abstract summary: Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself.
We show that even recent language models fail at distinguishing use from mention.
This failure propagates to two key downstream tasks: misinformation and hate speech detection.
- Score: 43.40965978436158
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of words to convey speaker's intent is traditionally distinguished from the `mention' of words for quoting what someone said, or pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks: misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction, and show they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.
Related papers
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z) - DisCGen: A Framework for Discourse-Informed Counterspeech Generation [34.75404551612012]
We propose a framework based on theories of discourse to study the inferential links that connect counter speeches to hateful comments.
We present a process for collecting an in-the-wild dataset of counterspeech from Reddit.
We show that by using our dataset and framework, large language models can generate contextually-grounded counterspeech informed by theories of discourse.
arXiv Detail & Related papers (2023-11-29T23:20:17Z) - HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning [29.519687405350304]
We introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill gaps in explanations of hate speech.
Experiments on SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines.
Our method enhances the explanation quality of trained models and improves generalization to unseen datasets.
arXiv Detail & Related papers (2023-11-01T06:09:54Z) - Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC)
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z) - DisfluencyFixer: A tool to enhance Language Learning through Speech To
Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z) - Leveraging World Knowledge in Implicit Hate Speech Detection [5.5536024561229205]
We show that real world knowledge about entity mentions in a text does help models better detect hate speech.
We also discuss cases where real world knowledge does not add value to hate speech detection.
arXiv Detail & Related papers (2022-12-28T21:23:55Z) - Hate Speech and Counter Speech Detection: Conversational Context Does
Matter [7.333666276087548]
This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech.
We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral.
arXiv Detail & Related papers (2022-06-13T19:05:44Z) - Improving Self-Supervised Speech Representations by Disentangling
Speakers [56.486084431528695]
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus.
Disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well.
We propose a new SSL method that can achieve speaker disentanglement without severe loss of content.
arXiv Detail & Related papers (2022-04-20T04:56:14Z) - Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable
Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to binary notion of inapropriateness and a multinomial notion of sensitive topic.
To objectivise the notion of inappropriateness, we define it in a data-driven way though crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z) - Countering Online Hate Speech: An NLP Perspective [34.19875714256597]
Online toxicity - an umbrella term for online hateful behavior - manifests itself in forms such as online hate speech.
The rising mass communication through social media further exacerbates the harmful consequences of online hate speech.
This paper presents a holistic conceptual framework on hate-speech NLP countering methods along with a thorough survey on the current progress of NLP for countering online hate speech.
arXiv Detail & Related papers (2021-09-07T08:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.