AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
- URL: http://arxiv.org/abs/2404.02761v3
- Date: Wed, 17 Apr 2024 10:56:48 GMT
- Title: AQuA -- Combining Experts' and Non-Experts' Views To Assess Deliberation Quality in Online Discussions Using LLMs
- Authors: Maike Behrendt, Stefan Sylvius Wagner, Marc Ziegele, Lena Wilms, Anke Stoll, Dominique Heinbach, Stefan Harmeling
- Abstract summary: AQuA is an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. We develop adapter models for 20 deliberative indices and calculate correlation coefficients between experts' annotations and non-experts' perceived deliberativeness to weight the individual indices into a single deliberative score.
- Score: 0.9737366359397255
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts' annotations and the perceived deliberativeness by non-experts to weight the individual indices into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts' vs. non-experts' annotations confirms theoretical findings in the social science literature.
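The additive construction translates directly into code: score a comment with one adapter per index, then combine the per-index predictions as a weighted sum with correlation-derived weights. The following is a minimal sketch under that reading; the index names and weight values are illustrative placeholders, not the coefficients reported in the paper.

```python
# Minimal sketch of an additive AQuA-style score. Index names and weights
# are illustrative placeholders, NOT the coefficients from the paper.

INDEX_WEIGHTS = {
    # weight ~ correlation between experts' annotation of the index and
    # non-experts' perceived deliberativeness (sign carries direction)
    "justification": 0.25,
    "common_good": 0.12,
    "storytelling": 0.08,
    "sarcasm": -0.15,
    "insult": -0.30,
}

def aqua_score(index_predictions: dict[str, float]) -> float:
    """Combine per-index adapter predictions (each in [0, 1]) into a
    single additive deliberative quality score."""
    return sum(INDEX_WEIGHTS[name] * pred
               for name, pred in index_predictions.items()
               if name in INDEX_WEIGHTS)

# A well-justified, civil comment scores high; an insulting one scores low.
print(aqua_score({"justification": 0.9, "common_good": 0.4,
                  "storytelling": 0.1, "sarcasm": 0.0, "insult": 0.0}))
```

Because the score is a transparent sum, a low overall value can always be traced back to the individual indices that caused it.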
Related papers
- An Automatic Question Usability Evaluation Toolkit [1.2499537119440245]
Evaluating multiple-choice questions (MCQs) involves either labor-intensive human assessments or automated methods that prioritize readability.
We introduce SAQUET, an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs.
With an accuracy rate of over 94%, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.
arXiv Detail & Related papers (2024-05-30T23:04:53Z) - Accurate and Nuanced Open-QA Evaluation Through Textual Entailment [4.762213968673381]
We propose to study the entailment relations of answers to identify more informative and more general system answers.
The entailment-based evaluation we propose allows the assignment of bonus or partial marks by quantifying the inference gap between answers.
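As an illustration of grading by inference gap, one can score a system answer by how strongly it entails, and is entailed by, the reference answer using an off-the-shelf NLI model. The model choice and the symmetric averaging below are assumptions for illustration, not the paper's exact protocol.

```python
# Hedged sketch of entailment-based answer grading with a public NLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that `premise` entails `hypothesis`."""
    scores = nli({"text": premise, "text_pair": hypothesis}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

def grade(system_answer: str, gold_answer: str) -> float:
    """Full marks for mutual entailment; partial marks when entailment
    holds in only one direction (the answer is too general or specific)."""
    forward = entailment_prob(system_answer, gold_answer)
    backward = entailment_prob(gold_answer, system_answer)
    return 0.5 * (forward + backward)

print(grade("Paris, the capital of France", "Paris"))
```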
arXiv Detail & Related papers (2024-05-26T21:33:27Z) - HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation [20.178644251662316]
We introduce the hierarchical graph of thoughts (HGOT) to enhance the retrieval of pertinent passages during in-context learning.
The framework employs the divide-and-conquer strategy to break down complex queries into manageable sub-queries.
It refines self-consistency majority voting for answer selection, which incorporates the recently proposed citation recall and precision metrics.
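A rough sketch of citation-aware self-consistency: let each sampled answer vote with a weight that mixes plain frequency with a citation-quality bonus. The mixing rule below is an assumed illustration, not HGOT's actual formula.

```python
# Illustrative citation-aware majority voting over sampled LLM answers.
from collections import defaultdict

def citation_f1(recall: float, precision: float) -> float:
    """Harmonic mean of citation recall and precision."""
    total = recall + precision
    return 0.0 if total == 0 else 2 * recall * precision / total

def vote(samples: list[tuple[str, float, float]]) -> str:
    """samples: (answer, citation_recall, citation_precision) per sample.
    Each sample votes once, plus a bonus for well-cited reasoning."""
    scores: dict[str, float] = defaultdict(float)
    for answer, rec, prec in samples:
        scores[answer] += 1.0 + citation_f1(rec, prec)
    return max(scores, key=scores.get)

print(vote([("42", 0.9, 0.8), ("41", 0.2, 0.3), ("42", 0.7, 0.9)]))  # -> 42
```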
arXiv Detail & Related papers (2024-02-14T18:41:19Z) - Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs [35.089203283068635]
We introduce a set of fine-grained categories for measuring the attribution, and develop a Complex Attributed Question Answering (CAQA) benchmark.
Our analysis reveals that existing evaluators perform poorly under fine-grained attribution settings and exhibit weaknesses in complex citation-statement reasoning.
arXiv Detail & Related papers (2024-01-26T04:11:07Z) - Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP).
BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets.
We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
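The text-prompted idea can be sketched with a public CLIP checkpoint: measure a frame's affinity to antonymic quality prompts and softmax-normalize over them. The prompts, checkpoint, and pooling here are illustrative simplifications of the actual SAQI design.

```python
# Rough sketch of a CLIP-based, text-prompted frame quality index.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
PROMPTS = ["a high quality photo", "a low quality photo"]  # antonym pair

def frame_quality(frame: Image.Image) -> float:
    """Probability mass CLIP assigns to the 'high quality' prompt."""
    inputs = processor(text=PROMPTS, images=frame,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (1, num_prompts)
    return logits.softmax(dim=-1)[0, 0].item()

# A video-level score could then average frame_quality over sampled frames.
```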
arXiv Detail & Related papers (2023-04-28T08:06:05Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
It has been claimed that large language models (LLMs) can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We estimate the uncertainty of each prediction and use it to re-weight the AQA regression loss.
Our proposed method achieves competitive results on three benchmarks: the Olympic-event datasets MTL-AQA and FineDiving, and the surgical-skill dataset JIGSAWS.
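Re-weighting a regression loss by predicted uncertainty is commonly realized as a Gaussian negative log-likelihood, in which high-variance predictions contribute less squared error at the cost of a log-variance penalty. The sketch below shows that generic form, which may differ from UD-AQA's exact formulation.

```python
# Generic uncertainty-weighted regression loss (Gaussian NLL), PyTorch.
import torch

def uncertainty_weighted_loss(pred: torch.Tensor,
                              log_var: torch.Tensor,
                              target: torch.Tensor) -> torch.Tensor:
    """Down-weights the squared error of uncertain predictions; the
    0.5*log_var term keeps the model from predicting infinite variance."""
    precision = torch.exp(-log_var)
    return (0.5 * precision * (target - pred) ** 2 + 0.5 * log_var).mean()

pred = torch.tensor([85.0, 92.0])    # predicted judge scores
log_var = torch.tensor([0.1, 2.0])   # per-sample predicted log-variance
target = torch.tensor([88.0, 70.0])  # ground-truth scores
print(uncertainty_weighted_loss(pred, log_var, target))
```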
arXiv Detail & Related papers (2022-07-29T07:21:15Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn ⟨sentiment, aspect⟩ joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
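One simple way to realize joint topic embeddings in a word embedding space is to seed each ⟨aspect, sentiment⟩ topic with a few keywords, embed the topic as the mean of its seed vectors, and label words by nearest topic. The toy sketch below uses made-up seeds and two-dimensional vectors purely for illustration.

```python
# Toy sketch of <sentiment, aspect> joint topic embeddings.
import numpy as np

EMBED = {  # stand-in 2-D word vectors; a real setup uses trained embeddings
    "delicious": np.array([0.9, 0.1]), "bland": np.array([-0.8, 0.2]),
    "friendly": np.array([0.7, -0.6]), "rude": np.array([-0.7, -0.7]),
}
TOPIC_SEEDS = {
    ("food", "positive"): ["delicious"], ("food", "negative"): ["bland"],
    ("service", "positive"): ["friendly"], ("service", "negative"): ["rude"],
}
TOPIC_VECS = {topic: np.mean([EMBED[w] for w in seeds], axis=0)
              for topic, seeds in TOPIC_SEEDS.items()}

def nearest_topic(word: str) -> tuple[str, str]:
    """Assign a word to the joint topic with highest cosine similarity."""
    v = EMBED[word]
    sims = {t: v @ tv / (np.linalg.norm(v) * np.linalg.norm(tv) + 1e-9)
            for t, tv in TOPIC_VECS.items()}
    return max(sims, key=sims.get)

print(nearest_topic("rude"))  # -> ('service', 'negative')
```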
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - An Interpretable and Uncertainty Aware Multi-Task Framework for Multi-Aspect Sentiment Analysis [15.755185152760083]
Document-level Multi-aspect Sentiment Classification (DMSC) is a challenging and pressing problem.
We propose a deliberate self-attention-based deep neural network model, namely FEDAR, for the DMSC problem.
FEDAR can achieve competitive performance while also being able to interpret the predictions made.
arXiv Detail & Related papers (2020-09-18T22:32:39Z) - Uncertainty-aware Score Distribution Learning for Action Quality
Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
Under the circumstance where fine-grained score labels are available, we devise a multi-path uncertainty-aware score distributions learning (MUSDL) method to explore the disentangled components of a score.
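Treating a judge score as a distribution can be sketched by discretizing the score range, placing a Gaussian around the ground-truth score, and training against it with a KL objective instead of MSE. The bin range and variance below are illustrative choices, not the paper's settings.

```python
# Sketch of score distribution learning with a KL objective, PyTorch.
import torch
import torch.nn.functional as F

SCORES = torch.arange(0.0, 101.0)  # discrete evaluated scores 0..100

def score_to_distribution(score: float, sigma: float = 5.0) -> torch.Tensor:
    """Soft Gaussian target centered on the ground-truth score."""
    logits = -((SCORES - score) ** 2) / (2 * sigma ** 2)
    return torch.softmax(logits, dim=0)

def usdl_style_loss(pred_logits: torch.Tensor, gt_score: float) -> torch.Tensor:
    """KL divergence between predicted and target score distributions."""
    target = score_to_distribution(gt_score)
    return F.kl_div(F.log_softmax(pred_logits, dim=0), target, reduction="sum")

pred_logits = torch.zeros(101)  # an untrained model's uniform prediction
print(usdl_style_loss(pred_logits, gt_score=83.0))
```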
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.