Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- URL: http://arxiv.org/abs/2403.16084v1
- Date: Sun, 24 Mar 2024 10:43:21 GMT
- Title: Argument Quality Assessment in the Age of Instruction-Following Large Language Models
- Authors: Henning Wachsmuth, Gabriella Lapesa, Elena Cabrio, Anne Lauscher, Joonsuk Park, Eva Maria Vecchi, Serena Villata, Timon Ziegenbein
- Abstract summary: A critical task in any application of the computational treatment of arguments is the assessment of an argument's quality.
We identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment.
We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment.
- Score: 45.832808321166844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The computational treatment of arguments on controversial issues has been subject to extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument's quality - but it is also particularly challenging. In this position paper, we start from a brief survey of argument quality research, where we identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment. Rather than just fine-tuning LLMs towards leaderboard chasing on assessment tasks, they need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues emerging thereby.
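To make the abstract's proposal slightly more concrete, the following is a minimal sketch of what "instructing an LLM systematically with argumentation theories" could look like in practice: a rubric loosely based on the well-known cogency/effectiveness/reasonableness taxonomy is turned into an explicit instruction, and an instruction-following LLM is asked to rate an argument on each dimension. The OpenAI client, the model name, and the 1-5 scale are assumptions made for illustration only; the paper itself prescribes no specific implementation.
```python
# Minimal sketch (not the paper's method): instructing an LLM with an
# argumentation-theoretic rubric rather than fine-tuning it on one benchmark.
# Assumptions: OpenAI's chat completions client, the model name, and the
# 1-5 scale are illustrative placeholders chosen for this example.
import json
from openai import OpenAI

# Quality dimensions loosely following the cogency/effectiveness/reasonableness taxonomy.
QUALITY_DIMENSIONS = {
    "cogency": "Are the premises acceptable, relevant, and sufficient for the conclusion?",
    "effectiveness": "Is the argument rhetorically persuasive for its intended audience?",
    "reasonableness": "Does the argument help resolve the issue in a fair, dialectical way?",
}


def assess_argument(argument: str, issue: str, model: str = "gpt-4o-mini") -> dict:
    """Ask an instruction-following LLM for a 1-5 rating per quality dimension."""
    rubric = "\n".join(f"- {name}: {question}" for name, question in QUALITY_DIMENSIONS.items())
    prompt = (
        f"You assess the quality of arguments on the issue: {issue}\n"
        f"Rate the following argument on each dimension from 1 (very low) to 5 (very high):\n"
        f"{rubric}\n\nArgument: {argument}\n\n"
        'Reply with JSON only, e.g. {"cogency": 3, "effectiveness": 2, "reasonableness": 4}.'
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Optimistic parsing; a robust pipeline would validate the model's output.
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    print(assess_argument(
        argument="School uniforms reduce peer pressure, so they should be mandatory.",
        issue="Should school uniforms be mandatory?",
    ))
```
Such a prompt can be varied with different theoretical framings or audience scenarios, which is exactly where the paper locates the open questions about the diversity of quality notions and the subjectiveness of their perception.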
Related papers
- The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models [0.0]
Large language models (LLMs) and generative AI have revolutionized natural language processing (NLP).
This chapter explores the transformative potential of LLMs in automated question generation and answer assessment.
arXiv Detail & Related papers (2024-10-12T15:54:53Z)
- Persuasiveness of Generated Free-Text Rationales in Subjective Decisions: A Case Study on Pairwise Argument Ranking [4.1017420444369215]
We analyze generated free-text rationales in tasks with subjective answers.
We focus on pairwise argument ranking, a highly subjective task with significant potential for real-world applications.
Our findings suggest that open-source LLMs, particularly Llama2-70B-chat, are capable of providing highly persuasive rationalizations.
arXiv Detail & Related papers (2024-06-20T00:28:33Z)
- ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can lead evaluators to conflate fluency with truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert scales.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
- Are Large Language Models Reliable Argument Quality Annotators? [7.966402845339264]
We study the potential of using state-of-the-art large language models (LLMs) as proxies for argument quality annotators.
Our findings highlight that LLMs can produce consistent annotations, with a moderately high agreement with human experts.
arXiv Detail & Related papers (2024-04-15T11:54:27Z)
- A Unifying Framework for Learning Argumentation Semantics [50.69905074548764]
We present a novel framework, which uses an Inductive Logic Programming approach to learn the acceptability semantics for several abstract and structured argumentation frameworks in an interpretable way.
Our framework outperforms existing argumentation solvers, thus opening up new future research directions in the area of formal argumentation and human-machine dialogues.
arXiv Detail & Related papers (2023-10-18T20:18:05Z)
- How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study [59.13867562744973]
This work systematically assesses LMs' capabilities for out-of-distribution (OOD) scenarios.
We find that the efficacy of such learning paradigms varies with the type of OOD.
Specifically, while ICL excels for domain shifts, prompt-based fine-tuning is superior for topic shifts.
arXiv Detail & Related papers (2023-09-15T11:15:47Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LLMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Towards a Holistic View on Argument Quality Prediction [3.182597245365433]
A decisive property of arguments is their strength or quality.
While there are works on the automated estimation of argument strength, their scope is narrow.
We assess the generalization capabilities of argument quality estimation across diverse domains, the interplay with related argument mining tasks, and the impact of emotions on perceived argument strength.
arXiv Detail & Related papers (2022-05-19T18:44:23Z)
- Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale [12.883536911500062]
We study claim quality assessment irrespective of discussed aspects by comparing different revisions of the same claim.
We propose two tasks: assessing which claim of a revision pair is better, and ranking all versions of a claim by quality.
arXiv Detail & Related papers (2021-01-25T17:32:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.