Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
- URL: http://arxiv.org/abs/2402.08498v4
- Date: Sat, 20 Apr 2024 03:47:18 GMT
- Title: Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
- Authors: Preetika Verma, Kokil Jaidka, Svetlana Churina,
- Abstract summary: GPT-3.5 Turbo ranked highest in argument quality with strong paraphrasing and style adherence, particularly in 'reciprocity' style arguments.
The stylistic counter-arguments still fall short of human persuasive standards, where people also preferred reciprocal to evidence-based rebuttals.
- Score: 11.243184875465788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We audited large language models (LLMs) for their ability to create evidence-based and stylistic counter-arguments to posts from the Reddit ChangeMyView dataset. We benchmarked their rhetorical quality across a host of qualitative and quantitative metrics and then ultimately evaluated them on their persuasive abilities as compared to human counter-arguments. Our evaluation is based on Counterfire: a new dataset of 32,000 counter-arguments generated from large language models (LLMs): GPT-3.5 Turbo and Koala and their fine-tuned variants, and PaLM 2, with varying prompts for evidence use and argumentative style. GPT-3.5 Turbo ranked highest in argument quality with strong paraphrasing and style adherence, particularly in 'reciprocity' style arguments. However, the stylistic counter-arguments still fall short of human persuasive standards, where people also preferred reciprocal to evidence-based rebuttals. The findings suggest that a balance between evidentiality and stylistic elements is vital to a compelling counter-argument. We close with a discussion of future research directions and implications for evaluating LLM outputs.
Related papers
- What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z)
- Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation [62.069374456021016]
We present the ArgTersely benchmark for sentence-level counter-argument generation.
We also propose Arg-LlaMA for generating high-quality counter-arguments.
arXiv Detail & Related papers (2023-12-21T06:51:34Z)
- Exploring Jiu-Jitsu Argumentation for Writing Peer Review Rebuttals [70.22179850619519]
In many domains of argumentation, people's arguments are driven by so-called attitude roots.
Recent work in psychology suggests that instead of directly countering surface-level reasoning, one should follow an argumentation style inspired by the Jiu-Jitsu 'soft' combat system.
We are the first to explore Jiu-Jitsu argumentation for peer review by proposing the novel task of attitude and theme-guided rebuttal generation.
arXiv Detail & Related papers (2023-11-07T13:54:01Z)
- Contextualizing Argument Quality Assessment with Relevant Knowledge [11.367297319588411]
SPARK is a novel method for scoring argument quality based on contextualization via relevant knowledge.
We devise four augmentations that leverage large language models to provide feedback, infer hidden assumptions, supply a similar-quality argument, or give a counter-argument.
arXiv Detail & Related papers (2023-05-20T21:04:58Z)
- Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion [52.49981085431061]
Enhancing people's ability to write persuasive arguments could contribute to the effectiveness and civility of online communication.
We derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions.
Persua is an interactive visual system that provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments.
arXiv Detail & Related papers (2022-04-16T08:07:53Z)
- Argument Undermining: Counter-Argument Generation by Attacking Weak Premises [31.463885580010192]
We explore argument undermining, that is, countering an argument by attacking one of its premises.
We propose a pipeline approach that first assesses the premises' strength and then generates a counter-argument targeting the weak ones.
arXiv Detail & Related papers (2021-05-25T08:39:14Z)
- Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)
- AMPERSAND: Argument Mining for PERSuAsive oNline Discussions [41.06165177604387]
We propose a computational model for argument mining in online persuasive discussion forums.
Our approach relies on identifying relations between components of arguments in a discussion thread.
Our models obtain significant improvements compared to recent state-of-the-art approaches.
arXiv Detail & Related papers (2020-04-30T10:33:40Z)
- What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [78.4766663287415]
This paper presents a study that automatically analyzes the key factors in argument persuasiveness.
We propose a novel neural model that is able to track the changes of latent topics and discourse in argumentative conversations.
arXiv Detail & Related papers (2020-02-10T04:27:48Z)