Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks
- URL: http://arxiv.org/abs/2406.15130v1
- Date: Fri, 21 Jun 2024 13:27:10 GMT
- Title: Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks
- Authors: Victor Hugo Nascimento Rocha, Igor Cataneo Silveira, Paulo Pirozelli, Denis Deratani Mauá, Fabio Gagliardi Cozman
- Abstract summary: This paper introduces a methodology to obtain good, bad and ugly arguments from argumentative essays produced by ChatGPT.
We then describe a novel dataset containing a set of diverse arguments, ArGPT.
We show that the artificially generated data relates well to human argumentation and thus is useful as a tool to train and test systems for the defined tasks.
- Score: 1.6427658855248812
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent success of Large Language Models (LLMs) has sparked concerns about their potential to spread misinformation. As a result, there is a pressing need for tools to identify "fake arguments" generated by such models. To create these tools, examples of texts generated by LLMs are needed. This paper introduces a methodology to obtain good, bad and ugly arguments from argumentative essays produced by ChatGPT, OpenAI's LLM. We then describe a novel dataset containing a set of diverse arguments, ArGPT. We assess the effectiveness of our dataset and establish baselines for several argumentation-related tasks. Finally, we show that the artificially generated data relates well to human argumentation and thus is useful as a tool to train and test systems for the defined tasks.
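As a concrete illustration of the baselines mentioned in the abstract, the sketch below trains a simple three-way (good/bad/ugly) argument-quality classifier. The file name `argpt.csv` and the `text`/`label` columns are hypothetical stand-ins for the released ArGPT format, not its actual schema.

```python
# Minimal baseline sketch: classify arguments as good, bad, or ugly.
# Assumes a CSV export of ArGPT with "text" and "label" columns
# (hypothetical schema; the released dataset may be organized differently).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("argpt.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# TF-IDF features + logistic regression: a standard shallow baseline
# against which stronger (e.g., transformer-based) models can be compared.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```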
Related papers
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation [13.205613282888676]
We introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate.
Our dataset contains 14k examples of claims that are fully annotated with the various properties supporting the aforementioned tasks.
We find that, while models show promising results on individual tasks in our benchmark, their end-to-end performance on all four tasks in succession deteriorates significantly.
arXiv Detail & Related papers (2024-06-05T11:15:45Z)
- Interactive Analysis of LLMs using Meaningful Counterfactuals [22.755345889167934]
Counterfactual examples are useful for exploring the decision boundaries of machine learning models.
How can we apply counterfactual-based methods to analyze and explain LLMs?
We propose a novel algorithm for generating batches of complete and meaningful textual counterfactuals.
In our experiments, 97.2% of the counterfactuals are grammatically correct.
arXiv Detail & Related papers (2024-04-23T19:57:03Z)
- Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z)
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z)
- QTSumm: Query-Focused Summarization over Tabular Data [58.62152746690958]
People primarily consult tables to conduct data analysis or answer specific questions.
We define a new query-focused table summarization task, where text generation models have to perform human-like reasoning.
We introduce a new benchmark named QTSumm for this task, which contains 7,111 human-annotated query-summary pairs over 2,934 tables.
arXiv Detail & Related papers (2023-05-23T17:43:51Z)
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
Using AnnoLLM, we build the first conversation-based information retrieval dataset.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
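The explain-then-annotate recipe can be sketched in a few lines: the model first explains the gold label of each demonstration, and those explanations are then folded into the annotation prompt. This is a minimal sketch assuming a generic chat-completion client; the prompts are illustrative, not AnnoLLM's actual templates.

```python
# Sketch of a two-step, explain-then-annotate loop (not AnnoLLM's exact prompts).
# `chat` stands in for any LLM completion client you have available.
from typing import Callable

def explain_then_annotate(
    chat: Callable[[str], str],
    demos: list[tuple[str, str]],  # (example text, gold label) pairs
    item: str,
    labels: list[str],
) -> str:
    # Step 1: have the model explain why each demonstration got its gold label.
    explained = []
    for text, gold in demos:
        explanation = chat(
            f"Text: {text}\nLabel: {gold}\n"
            "Explain briefly why this label is correct."
        )
        explained.append(f"Text: {text}\nLabel: {gold}\nReason: {explanation}")

    # Step 2: annotate the new item with the explanations as few-shot context.
    prompt = (
        "\n\n".join(explained)
        + f"\n\nText: {item}\nChoose one label from {labels} and answer with the label only."
    )
    return chat(prompt).strip()
```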
- Aspect-Controlled Neural Argument Generation [65.91772010586605]
We train a language model for argument generation that can be controlled on a fine-grained level to generate sentence-level arguments for a given topic, stance, and aspect.
Our evaluation shows that our generation model is able to generate high-quality, aspect-specific arguments.
These arguments can be used to improve the performance of stance detection models via data augmentation and to generate counter-arguments.
arXiv Detail & Related papers (2020-04-30T20:17:22Z)
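Fine-grained control of this kind is commonly implemented by prepending control codes (topic, stance, aspect) to each training example so the model learns to condition on them. The sketch below shows that general recipe with an off-the-shelf GPT-2; it is an illustration under that assumption, not the paper's actual model or control-code format, and a base model would need fine-tuning on prefixed data before the codes have any effect.

```python
# Sketch of control-code conditioning for argument generation
# (the general recipe; not this paper's exact training setup or codes).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# During training, each argument would be prefixed with its control codes,
# e.g. "<topic=nuclear energy> <stance=pro> <aspect=safety> <arg> ...text...".
# At inference, supplying only the prefix steers generation. A base GPT-2
# treats the prefix as ordinary text; the point here is the input format.
prefix = "<topic=nuclear energy> <stance=pro> <aspect=safety> <arg>"
inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```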