Assessing the Human Likeness of AI-Generated Counterspeech
- URL: http://arxiv.org/abs/2410.11007v2
- Date: Sun, 15 Dec 2024 21:07:52 GMT
- Title: Assessing the Human Likeness of AI-Generated Counterspeech
- Authors: Xiaoying Song, Sujana Mamidisetty, Eduardo Blanco, Lingzi Hong
- Abstract summary: This paper investigates the human likeness of AI-generated counterspeech.
We implement and evaluate several LLM-based generation strategies.
We reveal differences in linguistic characteristics, politeness, and specificity.
- Score: 10.434435022492723
- Abstract: Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.
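The abstract's claim that "simple classifiers" can distinguish AI-generated from human-written counterspeech can be illustrated with a minimal sketch. The examples, features, and model below are hypothetical stand-ins (a bag-of-words perceptron on toy sentences), not the paper's actual classifier or data; real experiments would use the released dataset.

```python
# Illustrative sketch: a bag-of-words perceptron standing in for the
# "simple classifiers" that separate AI-generated from human-written text.
from collections import Counter

# Hypothetical toy examples (not from the paper's dataset).
human = ["that claim is just wrong and you know it",
         "come on, nobody actually believes this stuff"]
ai = ["it is important to foster respectful dialogue in this community",
      "we should strive to promote constructive and inclusive communication"]

def featurize(text):
    # Bag-of-words feature vector as a sparse token-count map.
    return Counter(text.split())

def train_perceptron(pos, neg, epochs=20):
    # Standard perceptron: on each misclassified example, nudge the
    # weights toward (label = +1) or away from (label = -1) its tokens.
    w = Counter()
    for _ in range(epochs):
        for label, texts in ((1, pos), (-1, neg)):
            for t in texts:
                f = featurize(t)
                score = sum(w[tok] * c for tok, c in f.items())
                if label * score <= 0:  # misclassified: update weights
                    for tok, c in f.items():
                        w[tok] += label * c
    return w

def predict(w, text):
    score = sum(w[tok] * c for tok, c in featurize(text).items())
    return "ai" if score > 0 else "human"

weights = train_perceptron(ai, human)
print(predict(weights, "we should promote respectful and constructive dialogue"))
print(predict(weights, "that is just wrong"))
```

On these toy sentences the model keys on register cues (formal, communal vocabulary versus blunt, colloquial phrasing), which mirrors the linguistic differences in politeness and specificity the paper reports.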
Related papers
- Who Writes What: Unveiling the Impact of Author Roles on AI-generated Text Detection [44.05134959039957]
We investigate how sociolinguistic attributes (gender, CEFR proficiency, academic field, and language environment) impact state-of-the-art AI text detectors.
Our results reveal significant biases: CEFR proficiency and language environment consistently affected detector accuracy, while gender and academic field showed detector-dependent effects.
These findings highlight the crucial need for socially aware AI text detection to avoid unfairly penalizing specific demographic groups.
arXiv Detail & Related papers (2025-02-18T07:49:31Z)
- Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation [2.1944577276732726]
We propose and evaluate strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user.
Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness.
The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
arXiv Detail & Related papers (2024-12-10T09:29:52Z)
- Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool [0.0]
This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing.
Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features.
arXiv Detail & Related papers (2024-07-04T05:37:09Z)
- Outcome-Constrained Large Language Models for Countering Hate Speech [10.434435022492723]
This study aims to develop methods for generating counterspeech constrained by conversation outcomes.
We experiment with large language models (LLMs) to incorporate two desired conversation outcomes into the text generation process.
Evaluation results show that our methods effectively steer the generation of counterspeech toward the desired outcomes.
arXiv Detail & Related papers (2024-03-25T19:44:06Z)
- Consolidating Strategies for Countering Hate Speech Using Persuasive Dialogues [3.8979646385036175]
We explore controllable strategies for generating counter-arguments to hateful comments in online conversations.
Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments.
We share computational models for automatically annotating text with such features, along with a silver-standard annotated version of an existing hate speech dialog corpus.
arXiv Detail & Related papers (2024-01-15T16:31:18Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models toward answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
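The technique described above amounts to prepending a persona description to the task prompt. A minimal sketch follows; the template wording and profile fields are assumptions for illustration, not the paper's exact prompts.

```python
# Illustrative sketch of sociodemographic prompting: prepend a persona
# description so the model answers as a person with that profile would.
# Template and field names are hypothetical, not taken from the paper.
def sociodemographic_prompt(profile: dict, task: str) -> str:
    persona = ", ".join(f"{k}: {v}" for k, v in profile.items())
    return f"Imagine you are a person with the following profile ({persona}). {task}"

prompt = sociodemographic_prompt(
    {"age": "45", "gender": "female", "occupation": "teacher"},
    "Is the following comment toxic? Answer yes or no: 'You people never learn.'",
)
print(prompt)
```

The resulting string would then be sent to a prompt-based model; varying the profile while holding the task fixed is what lets the study measure how sociodemographic information shifts predictions.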
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so, we aim to provide guidance on using hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of this paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.