Related papers: NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

URL: http://arxiv.org/abs/2404.01651v1
Date: Tue, 2 Apr 2024 05:36:41 GMT
Title: NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky,
Abstract summary: Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself. We show that even recent language models fail at distinguishing use from mention. This failure propagates to two key downstream tasks: misinformation and hate speech detection.
Score: 43.40965978436158
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of words to convey speaker's intent is traditionally distinguished from the `mention' of words for quoting what someone said, or pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks: misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction, and show they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.

Related papers

Dealing with Annotator Disagreement in Hate Speech Classification [0.0]
This paper examines strategies for addressing annotator disagreement, an issue that has been largely overlooked. We evaluate different approaches to deal with annotator disagreement regarding hate speech classification in Turkish tweets, based on a fine-tuned BERT model. Our work highlights the importance of the problem and provides state-of-art benchmark results for detection and understanding of hate speech in online discourse.
arXiv Detail & Related papers (2025-02-12T10:19:50Z)
Generative AI may backfire for counterspeech [20.57872238271025]
We analyze whether contextualized counterspeech generated by state-of-the-art AI is effective in curbing online hate speech. We find that non-contextualized counterspeech employing a warning-of-consequence strategy significantly reduces online hate speech. However, contextualized counterspeech generated by LLMs proves ineffective and may even backfire.
arXiv Detail & Related papers (2024-11-22T14:47:00Z)
Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems. We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems. We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
DisCGen: A Framework for Discourse-Informed Counterspeech Generation [34.75404551612012]
We propose a framework based on theories of discourse to study the inferential links that connect counter speeches to hateful comments. We present a process for collecting an in-the-wild dataset of counterspeech from Reddit. We show that by using our dataset and framework, large language models can generate contextually-grounded counterspeech informed by theories of discourse.
arXiv Detail & Related papers (2023-11-29T23:20:17Z)
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning [29.519687405350304]
We introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill gaps in explanations of hate speech. Experiments on SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines. Our method enhances the explanation quality of trained models and improves generalization to unseen datasets.
arXiv Detail & Related papers (2023-11-01T06:09:54Z)
Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages. Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi. Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
Leveraging World Knowledge in Implicit Hate Speech Detection [5.5536024561229205]
We show that real world knowledge about entity mentions in a text does help models better detect hate speech. We also discuss cases where real world knowledge does not add value to hate speech detection.
arXiv Detail & Related papers (2022-12-28T21:23:55Z)
Hate Speech and Counter Speech Detection: Conversational Context Does Matter [7.333666276087548]
This paper investigates the role of conversational context in the annotation and detection of online hate and counter speech. We created a context-aware dataset for a 3-way classification task on Reddit comments: hate speech, counter speech, or neutral.
arXiv Detail & Related papers (2022-06-13T19:05:44Z)
Improving Self-Supervised Speech Representations by Disentangling Speakers [56.486084431528695]
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus. Disentangling speakers is very challenging, because removing the speaker information could easily result in a loss of content as well. We propose a new SSL method that can achieve speaker disentanglement without severe loss of content.
arXiv Detail & Related papers (2022-04-20T04:56:14Z)
Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to binary notion of inapropriateness and a multinomial notion of sensitive topic. To objectivise the notion of inappropriateness, we define it in a data-driven way though crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.