Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation
- URL: http://arxiv.org/abs/2312.13608v1
- Date: Thu, 21 Dec 2023 06:51:34 GMT
- Title: Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation
- Authors: Jiayu Lin, Rong Ye, Meng Han, Qi Zhang, Ruofei Lai, Xinyu Zhang, Zhao Cao, Xuanjing Huang, Zhongyu Wei
- Abstract summary: We present the ArgTersely benchmark for sentence-level counter-argument generation. We also propose Arg-LlaMA for generating high-quality counter-arguments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counter-argument generation -- a captivating area in computational
linguistics -- seeks to craft statements that offer opposing views. While most
research has ventured into paragraph-level generation, sentence-level
counter-argument generation beckons with its unique constraints and
brevity-focused challenges. Furthermore, the diverse nature of
counter-arguments poses challenges for evaluating model performance solely
based on n-gram-based metrics. In this paper, we present the ArgTersely
benchmark for sentence-level counter-argument generation, drawing from a
manually annotated dataset from the ChangeMyView debate forum. We also propose
Arg-LlaMA for generating high-quality counter-arguments. For better evaluation,
we trained a BERT-based evaluator Arg-Judge with human preference data. We
conducted comparative experiments involving various baselines such as LlaMA,
Alpaca, GPT-3, and others. The results show the competitiveness of our proposed
framework and evaluator in counter-argument generation tasks. Code and data are
available at https://github.com/amazingljy1206/ArgTersely.
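The abstract's point that n-gram metrics struggle with diverse counter-arguments can be illustrated with a BLEU-style clipped n-gram precision. This is a minimal sketch, assuming lowercase whitespace tokenization and a single reference; the claim and all sentences are invented for illustration and are not drawn from ArgTersely.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision against a single reference.

    Each candidate n-gram is credited at most as many times as it occurs in
    the reference (clipping). Lowercase whitespace splitting is a simplifying
    assumption, not the tokenizer of any particular metric implementation.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical claim: "Cities should ban cars from downtown areas."
reference = "banning cars downtown would hurt small businesses that rely on drivers"
echo = "banning cars downtown would hurt small businesses"
novel = "pedestrian zones often raise rents and push out local shops"

for name, cand in [("echo", echo), ("novel", novel)]:
    # prints: echo [1.0, 1.0] then novel [0.0, 0.0]
    print(name, [round(clipped_precision(cand, reference, n), 2) for n in (1, 2)])
```

Both candidates oppose the claim, yet the one that paraphrases the reference scores perfectly while the distinct rebuttal scores zero. This mismatch is the kind of evaluation gap the abstract cites as motivation for a learned evaluator.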
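The abstract says Arg-Judge is trained with human preference data but does not state the objective. One common way to learn a scorer from preference pairs is a Bradley-Terry style pairwise logistic loss, -log sigmoid(s_preferred - s_rejected). The sketch below assumes that loss and swaps the BERT encoder for a hypothetical linear scorer over invented two-dimensional features, so it shows the training signal only, not the authors' method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log sigmoid(s_preferred - s_rejected)."""
    margin = score_preferred - score_rejected
    # log1p(exp(-m)) equals -log sigmoid(m); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def score(weights, features):
    """Hypothetical linear scorer standing in for a BERT-based evaluator."""
    return sum(w * f for w, f in zip(weights, features))


def train(pairs, dim, lr=0.1, epochs=100):
    """Plain SGD on the pairwise loss over (preferred, rejected) feature pairs."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            grad_margin = -(1.0 - sigmoid(margin))  # d loss / d margin
            for i in range(dim):
                weights[i] -= lr * grad_margin * (preferred[i] - rejected[i])
    return weights


# Invented features [relevance, fluency] for (preferred, rejected) responses.
pairs = [
    ([0.9, 0.8], [0.2, 0.7]),
    ([0.7, 0.4], [0.3, 0.9]),
    ([0.8, 0.6], [0.1, 0.5]),
]
weights = train(pairs, dim=2)
for preferred, rejected in pairs:
    print(round(score(weights, preferred) - score(weights, rejected), 3))
```

After training, every preferred response in the toy data receives a higher score than its rejected counterpart, which is the behavior a preference-trained evaluator is meant to exhibit.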
Related papers
- Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Debatrix is an automated debate judge based on Large Language Models (LLMs).
To align with real-world debate scenarios, we introduced the PanelBench benchmark, comparing our system's performance to actual debate outcomes.
The findings indicate a notable enhancement over directly using LLMs for debate evaluation.
(arXiv, 2024-03-12)
- Auditing Counterfire: Evaluating Advanced Counterargument Generation with Evidence and Style
GPT-3.5 Turbo ranked highest in argument quality, with strong paraphrasing and style adherence, particularly in 'reciprocity'-style arguments.
The stylistic counter-arguments still fall short of human persuasive standards; people also preferred reciprocal rebuttals to evidence-based ones.
(arXiv, 2024-02-13)
- Exploring the Potential of Large Language Models in Computational Argumentation
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
(arXiv, 2023-11-15)
- Sentiment Analysis through LLM Negotiations
A standard paradigm for sentiment analysis is to rely on a single LLM that makes its decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
(arXiv, 2023-11-03)
- ArgU: A Controllable Factual Argument Generator
ArgU is a neural argument generator capable of producing factual arguments from input facts and real-world concepts.
We have compiled and released an annotated corpus of 69,428 arguments spanning six topics and six argument schemes.
(arXiv, 2023-05-09)
- QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
We propose QRelScore, a context-aware relevance evaluation metric for generated questions.
Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation.
Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.
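Claims like QRelScore's rest on how well metric scores correlate with human judgments. A minimal sketch of how such metric-human agreement is commonly computed, using Pearson and Spearman correlation in plain Python; which coefficient the QRelScore authors report is not stated here, and the scores below are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five generated questions.
human = [1, 2, 3, 4, 5]
metric = [0.1, 0.3, 0.2, 0.6, 0.9]
print(round(spearman(metric, human), 3))  # 0.9
```

Spearman only checks whether the metric orders outputs the way humans do, while Pearson also rewards a linear relationship between the raw scores; metric papers often report one or both.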
(arXiv, 2022-04-29)
- Aspect-Controlled Neural Argument Generation
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
(arXiv, 2020-04-30)
- AMPERSAND: Argument Mining for PERSuAsive oNline Discussions
We propose a computational model for argument mining in online persuasive discussion forums.
Our approach relies on identifying relations between components of arguments in a discussion thread.
Our models obtain significant improvements compared to recent state-of-the-art approaches.
(arXiv, 2020-04-30)
- Same Side Stance Classification Task: Facilitating Argument Stance Classification by Fine-tuning a BERT Model
The same side stance classification task provides a dataset of argument pairs classified by whether or not both arguments share the same stance.
We fine-tuned a pre-trained BERT model for three epochs and used the first 512 tokens of each argument to predict if two arguments share the same stance.
(arXiv, 2020-04-23)