Related papers: On Zero-Shot Counterspeech Generation by LLMs

On Zero-Shot Counterspeech Generation by LLMs

URL: http://arxiv.org/abs/2403.14938v1
Date: Fri, 22 Mar 2024 04:13:10 GMT
Title: On Zero-Shot Counterspeech Generation by LLMs
Authors: Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann, Animesh Mukherjee,
Abstract summary: We present a comprehensive analysis of the performances of four Large Language Models (LLM) in zero-shot settings for counterspeech generation. Considering type of model, GPT-2 and FlanT5 models are significantly better in terms of counterspeech quality. ChatGPT are much better at generating counter speech than other models across all metrics.
Score: 23.39818166945086
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the emergence of numerous Large Language Models (LLM), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech - counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performances of four LLMs namely GPT-2, DialoGPT, ChatGPT and FlanT5 in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. On the other hand, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that there is an improvement in generation quality for two datasets (17%), however the toxicity increase (25%) with increase in model size. Considering type of model, GPT-2 and FlanT5 models are significantly better in terms of counterspeech quality but also have high toxicity as compared to DialoGPT. ChatGPT are much better at generating counter speech than other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counter speech generation across all the models.

Related papers

Evaluating Prompt-Based and Fine-Tuned Approaches to Czech Anaphora Resolution [0.0]
Anaphora resolution plays a critical role in natural language understanding in morphologically rich languages like Czech.<n>This paper presents a comparative evaluation of two modern approaches to anaphora resolution on Czech text.<n>We compare prompt engineering with large language models (LLMs) and fine-tuning compact generative models.
arXiv Detail & Related papers (2025-06-22T16:32:57Z)
CODEOFCONDUCT at Multilingual Counterspeech Generation: A Context-Aware Model for Robust Counterspeech Generation in Low-Resource Languages [1.9263811967110864]
This paper introduces a context-aware model for robust counterspeech generation, which achieved significant success in the MCG-COLING-2025 shared task. By leveraging a simulated annealing algorithm fine-tuned on multilingual datasets, the model generates factually accurate responses to hate speech. We demonstrate state-of-the-art performance across four languages, with our system ranking first for Basque, second for Italian, and third for both English and Spanish.
arXiv Detail & Related papers (2025-01-01T03:36:31Z)
Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model [2.33956825429387]
We present a method for distilling the generation of reflections from a Foundational Language Model into smaller models. We first show that GPT-4, using zero-shot prompting, can generate reflections at near 100% success rate. We also show that GPT-4 can help in the labor-intensive task of evaluating the quality of the distilled models.
arXiv Detail & Related papers (2024-02-01T22:54:31Z)
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation [56.913182262166316]
Chain-of-Information Generation (CoIG) is a method for decoupling semantic and perceptual information in large-scale speech generation. SpeechGPT-Gen is efficient in semantic and perceptual information modeling. It markedly excels in zero-shot text-to-speech, zero-shot voice conversion, and speech-to-speech dialogue.
arXiv Detail & Related papers (2024-01-24T15:25:01Z)
On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models. We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning [7.543506531838883]
This paper proposes LM- CPPF, Contrastive Paraphrasing-guided Prompt-based Fine-tuning of Language Models. Our experiments on multiple text classification benchmarks show that this augmentation method outperforms other methods.
arXiv Detail & Related papers (2023-05-29T15:59:51Z)
Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to empower multilingual understanding models the generation abilities to get a unified model. We propose a textbfSemantic-textbfGuided textbfAlignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation [54.66399120084227]
Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. For the persona-based dialogue generation task, consistency and coherence are great challenges for language models. A two-stage SimOAP strategy is proposed, i.e., over-sampling and post-evaluation.
arXiv Detail & Related papers (2023-05-18T17:23:00Z)
Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge. We finetune smaller language models to generate useful intermediate context, referred to here as elaborations. Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages. We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism. The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO) The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z)
PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM. We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.