Intrinsic Quality Assessment of Arguments
- URL: http://arxiv.org/abs/2010.12473v1
- Date: Fri, 23 Oct 2020 15:16:10 GMT
- Title: Intrinsic Quality Assessment of Arguments
- Authors: Henning Wachsmuth and Till Werner
- Abstract summary: We study the intrinsic computational assessment of 15 dimensions, i.e., only learning from an argument's text.
We observe moderate but significant learning success for most dimensions.
- Score: 21.261009977405898
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several quality dimensions of natural language arguments have been
investigated. Some are likely to be reflected in linguistic features (e.g., an
argument's arrangement), whereas others depend on context (e.g., relevance) or
topic knowledge (e.g., acceptability). In this paper, we study the intrinsic
computational assessment of 15 dimensions, i.e., only learning from an
argument's text. In systematic experiments with eight feature types on an
existing corpus, we observe moderate but significant learning success for most
dimensions. Rhetorical quality seems hardest to assess, and subjectivity
features turn out strong, although length bias in the corpus impedes full
validity. We also find that human assessors differ more clearly from each other
than from our approach.
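As a rough illustration of what intrinsic, text-only assessment of a single quality dimension could look like, here is a minimal sketch using a bag-of-words regressor. The toy arguments, scores, and feature choices are assumptions for illustration only, not the paper's corpus or its eight feature types.

```python
# Hypothetical sketch of intrinsic (text-only) quality assessment: predict one
# quality dimension score from surface features of the argument text alone.
# The toy data and feature choices below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy (argument text, quality score) pairs, e.g. for a "cogency"-like dimension.
arguments = [
    "School uniforms reduce peer pressure, so they should be mandatory.",
    "Uniforms are bad because I said so.",
    "Standardized dress codes lower clothing costs, which eases inequality.",
    "No uniforms, period.",
]
scores = [2.5, 1.0, 3.0, 1.0]  # hypothetical 1-3 quality ratings

# Word n-grams as a stand-in for lexical and stylistic feature types.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    Ridge(alpha=1.0),
)
model.fit(arguments, scores)

print(model.predict(["Uniforms save time in the morning, so schools should adopt them."]))
```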
Related papers
- A scale of conceptual orality and literacy: Automatic text categorization in the tradition of "Nähe und Distanz" [0.0]
It is stipulated that written texts can be rated on a scale of conceptual orality and literacy by linguistic features.
This article establishes such a scale based on PCA and combines it with automatic analysis.
The scale is also discussed with a view to its use in corpus compilation and as a guide for analyses in larger corpora.
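A minimal sketch of the underlying idea, assuming a handful of hand-picked linguistic features (pronoun rate, word length, text length) rather than the article's actual feature set: standardize the feature vectors and read the first principal component as a scale position.

```python
# Hypothetical sketch of a one-dimensional orality/literacy scale via PCA.
# The texts and the three features are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

texts = [
    "yeah i mean we just talked and it was fine you know",
    "The committee concluded that the proposed regulation requires further review.",
    "so um he said he'd come but then he didn't",
    "This study examines the correlation between policy adoption and fiscal outcomes.",
]

def features(text: str) -> list[float]:
    tokens = text.lower().split()
    pronouns = {"i", "you", "we", "he", "she"}
    return [
        sum(t in pronouns for t in tokens) / len(tokens),  # personal pronoun rate
        float(np.mean([len(t) for t in tokens])),          # mean word length
        float(len(tokens)),                                 # text length in tokens
    ]

X = StandardScaler().fit_transform([features(t) for t in texts])
scale = PCA(n_components=1).fit_transform(X).ravel()
for text, position in zip(texts, scale):
    print(f"{position:+.2f}  {text[:50]}")
```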
arXiv Detail & Related papers (2025-02-05T15:08:37Z) - ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z) - Which Argumentative Aspects of Hate Speech in Social Media can be reliably identified? [2.7647400328727256]
It is unclear which aspects of argumentation can be reliably identified and integrated in language models.
We show that some components can be identified with reasonable reliability.
We propose adaptations of those categories that can be more reliably reproduced.
arXiv Detail & Related papers (2023-06-05T15:50:57Z) - Modeling Appropriate Language in Argumentation [34.90028129715041]
We operationalize appropriate language in argumentation for the first time.
We derive a new taxonomy of 14 dimensions that determine inappropriate language in online discussions.
arXiv Detail & Related papers (2023-05-24T09:17:05Z) - Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
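A minimal sketch of the decomposition idea, assuming a chat-completion-style LLM API (the openai client and prompt wording here are assumptions, not the paper's setup): prompt the model to list propositions the text implies but does not state.

```python
# Hypothetical sketch: ask an LLM for propositions implicitly communicated by a text.
# The client, model name, and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

text = "We can't keep pretending the bus system works for people outside downtown."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": (
            "List, one per line, propositions that this statement implicitly "
            f"communicates but does not state outright:\n\n{text}"
        ),
    }],
)
print(response.choices[0].message.content)
```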
arXiv Detail & Related papers (2023-05-23T23:45:20Z) - How Do In-Context Examples Affect Compositional Generalization? [86.57079616209474]
In this paper, we present CoFe, a test suite to investigate in-context compositional generalization.
We find that the compositional generalization performance can be easily affected by the selection of in-context examples.
Our systematic experiments indicate that in-context examples should be structurally similar to the test case, diverse from each other, and individually simple.
arXiv Detail & Related papers (2023-05-08T16:32:18Z) - Towards a Holistic View on Argument Quality Prediction [3.182597245365433]
A decisive property of arguments is their strength or quality.
While there are works on the automated estimation of argument strength, their scope is narrow.
We assess the generalization capabilities of argument quality estimation across diverse domains, the interplay with related argument mining tasks, and the impact of emotions on perceived argument strength.
arXiv Detail & Related papers (2022-05-19T18:44:23Z) - Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale [12.883536911500062]
We study claim quality assessment irrespective of discussed aspects by comparing different revisions of the same claim.
We propose two tasks: assessing which claim of a revision pair is better, and ranking all versions of a claim by quality.
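A minimal sketch of the pairwise task, under two assumptions that are made purely for illustration and do not reflect the paper's models: the later revision of a claim is treated as the better one, and a revision pair is represented by a difference of TF-IDF vectors.

```python
# Hypothetical sketch of pairwise claim quality: given two revisions of a claim,
# predict which one is better. Data and representation are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

pairs = [
    ("Guns bad.", "Easy access to firearms correlates with higher homicide rates."),
    ("Vaccines are good because everyone says so.",
     "Large trials show vaccines sharply reduce severe illness."),
]
texts = [t for pair in pairs for t in pair]
vec = TfidfVectorizer().fit(texts)

def pair_features(a: str, b: str) -> np.ndarray:
    # Difference of TF-IDF vectors as a crude representation of the revision pair.
    return (vec.transform([b]) - vec.transform([a])).toarray().ravel()

# Label 1: second claim is better; label 0: first claim is better (swapped pair).
X = [pair_features(a, b) for a, b in pairs] + [pair_features(b, a) for a, b in pairs]
y = [1] * len(pairs) + [0] * len(pairs)

clf = LogisticRegression().fit(X, y)
print(clf.predict([pair_features(
    "Taxes are theft.",
    "Progressive taxation funds public goods that markets undersupply.",
)]))
```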
arXiv Detail & Related papers (2021-01-25T17:32:04Z) - A computational model implementing subjectivity with the 'Room Theory'. The case of detecting Emotion from Text [68.8204255655161]
This work introduces a new method to consider subjectivity and general context dependency in text analysis.
By using a similarity measure between words, we are able to extract the relative relevance of the elements in the benchmark.
This method could be applied to all the cases where evaluating subjectivity is relevant to understand the relative value or meaning of a text.
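A minimal sketch of the similarity-based relevance idea, with tiny hand-made word vectors standing in for real embeddings and a two-word benchmark standing in for the observer's frame of reference; all values are illustrative assumptions.

```python
# Hypothetical sketch: score how relevant each word of a text is to a subjective
# "benchmark" via cosine similarity between word vectors. The vectors below are
# toy stand-ins for pretrained embeddings.
import numpy as np

vectors = {
    "happy":    np.array([0.9, 0.1, 0.0]),
    "joyful":   np.array([0.8, 0.2, 0.1]),
    "deadline": np.array([0.1, 0.9, 0.2]),
    "party":    np.array([0.7, 0.3, 0.2]),
    "report":   np.array([0.0, 0.8, 0.4]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The benchmark word set representing, e.g., the emotion "joy".
benchmark = ["happy", "joyful"]

for word in ["party", "report", "deadline"]:
    relevance = max(cosine(vectors[word], vectors[b]) for b in benchmark)
    print(f"{word:10s} relevance to benchmark: {relevance:.2f}")
```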
arXiv Detail & Related papers (2020-05-12T21:26:04Z) - SubjQA: A Dataset for Subjectivity and Review Comprehension [52.13338191442912]
We investigate the relationship between subjectivity and question answering (QA).
We find that subjectivity is also an important feature in the case of QA, albeit with more intricate interactions between subjectivity and QA performance.
We release an English QA dataset (SubjQA) based on customer reviews, containing subjectivity annotations for questions and answer spans across 6 distinct domains.
arXiv Detail & Related papers (2020-04-29T15:59:30Z) - A Deep Neural Framework for Contextual Affect Detection [51.378225388679425]
A short and simple text carrying no emotion can convey strong emotions when read along with its context.
We propose a Contextual Affect Detection framework which learns the inter-dependence of words in a sentence.
arXiv Detail & Related papers (2020-01-28T05:03:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.