Towards a Perspectivist Turn in Argument Quality Assessment
- URL: http://arxiv.org/abs/2502.14501v1
- Date: Thu, 20 Feb 2025 12:30:26 GMT
- Title: Towards a Perspectivist Turn in Argument Quality Assessment
- Authors: Julia Romberg, Maximilian Maurer, Henning Wachsmuth, Gabriella Lapesa
- Abstract summary: We conduct a systematic review of argument quality datasets.
We consolidate the quality dimensions covered and survey who annotated them.
We discuss datasets suitable for developing perspectivist models.
- Score: 21.915319388303914
- License:
- Abstract: The assessment of argument quality depends on well-established logical, rhetorical, and dialectical properties that are unavoidably subjective: multiple valid assessments may exist, and there is no unequivocal ground truth. This aligns with recent paths in machine learning, which embrace the co-existence of different perspectives. However, this potential remains largely unexplored in NLP research on argument quality. One crucial reason seems to be that the availability of suitable datasets has not yet been explored. We fill this gap by conducting a systematic review of argument quality datasets. We assign them to a multi-layered categorization targeting two aspects: (a) What has been annotated: we collect the quality dimensions covered in datasets and consolidate them in an overarching taxonomy, increasing dataset comparability and interoperability. (b) Who annotated: we survey what information is given about annotators, enabling perspectivist research and grounding our recommendations for future actions. To this end, we discuss datasets suitable for developing perspectivist models (i.e., those containing individual, non-aggregated annotations), and we showcase the importance of a controlled selection of annotators in a pilot study.
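To make the contrast concrete: a conventional pipeline aggregates annotator judgments into a single label per item, whereas a perspectivist dataset keeps every individual annotation. The following minimal sketch only illustrates the two data views; the argument IDs, annotators, quality dimension, and ratings are invented for illustration and do not come from the reviewed datasets.

```python
from collections import Counter, defaultdict

# Hypothetical per-annotator ratings of one quality dimension ("cogency")
# for two arguments; all IDs and values are invented for illustration.
raw_annotations = [
    {"arg_id": "a1", "annotator": "r1", "cogency": 4},
    {"arg_id": "a1", "annotator": "r2", "cogency": 2},
    {"arg_id": "a1", "annotator": "r3", "cogency": 4},
    {"arg_id": "a2", "annotator": "r1", "cogency": 5},
    {"arg_id": "a2", "annotator": "r3", "cogency": 3},
]

# Aggregating view: one majority label per argument, discarding who said what
# and how much the raters disagreed.
by_item = defaultdict(list)
for ann in raw_annotations:
    by_item[ann["arg_id"]].append(ann["cogency"])
majority = {arg: Counter(vals).most_common(1)[0][0] for arg, vals in by_item.items()}
print("aggregated:", majority)

# Perspectivist view: keep every (argument, annotator, label) record so that a
# model can learn from, or predict for, individual annotators.
non_aggregated = [(a["arg_id"], a["annotator"], a["cogency"]) for a in raw_annotations]
print("non-aggregated:", non_aggregated)
```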
Related papers
- Feature Importance Depends on Properties of the Data: Towards Choosing the Correct Explanations for Your Data and Decision Trees based Models [3.8246193345000226]
We assess the quality of feature importance estimates provided by local explanation methods.
We find notable disparities in the magnitude and sign of the feature importance estimates generated by these methods.
Our assessment highlights these limitations and provides valuable insight into the suitability and reliability of different explanatory methods.
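A hedged, minimal illustration of such disparities: the sketch below compares two global importance estimates (impurity-based and permutation-based) for one decision tree rather than the local explanation methods the paper evaluates; the dataset and hyperparameters are arbitrary choices, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# Two importance estimates for the same fitted model: impurity-based
# (from training) vs. permutation-based (on held-out data).
impurity = tree.feature_importances_
permutation = permutation_importance(tree, X_test, y_test, n_repeats=20, random_state=0)

# Differences in the resulting rankings are the kind of disparity at issue.
print("top-5 by impurity:   ", np.argsort(impurity)[::-1][:5])
print("top-5 by permutation:", np.argsort(permutation.importances_mean)[::-1][:5])
```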
arXiv Detail & Related papers (2025-02-11T00:29:55Z)
- Class-constrained t-SNE: Combining Data Features and Class Probabilities [1.3285222309805058]
We propose a class-constrained t-SNE that combines data features and class probabilities in the same DR result.
We illustrate its application potential in model evaluation and visual-interactive labeling.
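This combination is only loosely approximated by the naive sketch below, which is not the class-constrained method proposed in the paper: it merely concatenates standardized features with a classifier's predicted class probabilities and runs ordinary t-SNE; the dataset, classifier, and weighting are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Class probabilities from any probabilistic classifier; logistic regression here.
proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# Naive combination: standardized features plus weighted class probabilities,
# embedded with ordinary t-SNE. A rough stand-in, not the paper's formulation.
alpha = 2.0  # arbitrary weight on the class-probability block
combined = np.hstack([StandardScaler().fit_transform(X), alpha * proba])

embedding = TSNE(n_components=2, random_state=0).fit_transform(combined)
print(embedding.shape)  # (150, 2)
```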
arXiv Detail & Related papers (2023-08-26T10:05:07Z)
- Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers [11.973749734226852]
We consider multi-label image classification and, specifically, object categorization tasks.
Design choices and trade-offs for measurement involve more nuance than discussed in prior computer vision literature.
We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments.
arXiv Detail & Related papers (2023-02-16T20:34:54Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Exploring the Trade-off between Plausibility, Change Intensity and Adversarial Power in Counterfactual Explanations using Multi-objective Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z)
- Agreeing to Disagree: Annotating Offensive Language Datasets with Annotators' Disagreement [7.288480094345606]
We focus on the level of agreement among annotators while selecting data to create offensive language datasets.
Our study comprises the creation of three novel datasets of English tweets covering different topics.
We show that hard cases, i.e., instances with low annotator agreement, are not necessarily due to poor-quality annotation.
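Selecting data by agreement presupposes a per-item agreement score over individual labels. A minimal sketch of one common choice, simple percentage agreement, with invented item IDs and labels (the authors' actual selection procedure may differ):

```python
from collections import Counter

# Hypothetical individual labels (0 = not offensive, 1 = offensive) per tweet;
# item IDs and values are invented for illustration.
labels_per_item = {
    "t1": [1, 1, 1, 1, 0],
    "t2": [1, 0, 1, 0, 0],
    "t3": [0, 0, 0, 0, 0],
}

def percent_agreement(labels):
    """Share of annotators who chose the most frequent label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Items below a threshold are the low-agreement "hard cases"; as the entry
# notes, low agreement need not indicate poor-quality annotation.
for item, labels in labels_per_item.items():
    agreement = percent_agreement(labels)
    print(f"{item}: agreement={agreement:.2f}", "(hard case)" if agreement < 0.8 else "")
```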
arXiv Detail & Related papers (2021-09-28T08:55:04Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets covering diverse forms of online conversation: news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale [12.883536911500062]
We study claim quality assessment irrespective of discussed aspects by comparing different revisions of the same claim.
We propose two tasks: assessing which claim of a revision pair is better, and ranking all versions of a claim by quality.
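The first task (deciding which revision of a pair is better) can be framed as binary classification over revision pairs. The sketch below is a deliberately tiny, hypothetical illustration with invented example pairs and hand-crafted surface features; it is not the authors' model or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy revision pairs (v1, v2) labeled 1 if v2 is the better revision;
# all examples are invented for illustration.
pairs = [
    ("Cats are better.", "Cats are better companions because they need less space.", 1),
    ("Tax cuts always work and everyone knows it.", "Tax cuts can work.", 0),
    ("School should start later.", "School should start later, since teens sleep too little.", 1),
    ("We must act, studies show benefits for students.", "We must act.", 0),
]

def pair_features(v1, v2):
    """Simplistic surface features: length change and whether v2 adds a reason."""
    gives_reason = float(any(w in v2.lower() for w in ("because", "since", "studies")))
    return [len(v2.split()) - len(v1.split()), gives_reason]

X = np.array([pair_features(v1, v2) for v1, v2, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))  # toy judgment of which revision in each pair is better
```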
arXiv Detail & Related papers (2021-01-25T17:32:04Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations [67.4375210552593]
We design experiments to understand an important but often ignored problem in visually grounded language generation.
Given that humans have different utilities and visual attention, how will the sample variance in multi-reference datasets affect the models' performance?
We show that it is of paramount importance to report variance in experiments and that human-generated references can vary drastically across datasets and tasks, revealing the nature of each task.
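Reporting such variance can be done by scoring a model output against each reference separately. A minimal sketch with an invented toy metric and made-up references (a real evaluation would use standard generation metrics):

```python
import numpy as np

# Hypothetical model caption and multiple human references for one image;
# the texts are invented for illustration.
hypothesis = "a dog runs across the grass"
references = [
    "a dog is running on the grass",
    "a brown dog sprints through a park",
    "the dog runs outside",
]

def token_f1(hyp, ref):
    """Crude unigram-overlap F1, standing in for a real generation metric."""
    hyp_tokens, ref_tokens = set(hyp.split()), set(ref.split())
    overlap = len(hyp_tokens & ref_tokens)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp_tokens), overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

scores = np.array([token_f1(hypothesis, ref) for ref in references])
# Report variability across references, not just a single aggregate score.
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}  per-ref={np.round(scores, 3)}")
```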
arXiv Detail & Related papers (2020-10-07T20:45:14Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.