User Experience Design for Automatic Credibility Assessment of News
Content About COVID-19
- URL: http://arxiv.org/abs/2204.13943v1
- Date: Fri, 29 Apr 2022 08:38:45 GMT
- Title: User Experience Design for Automatic Credibility Assessment of News
Content About COVID-19
- Authors: Konstantin Schulz, Jens Rauenbusch, Jan Fillies, Lisa Rutenburg,
Dimitrios Karvelas, Georg Rehm
- Abstract summary: We present two empirical studies to evaluate the usability of graphical interfaces that offer credibility assessment.
Rating scale, sub-criteria and algorithm authorship are important predictors of the usability.
The authorship of a news text is more important than the authorship of the credibility algorithm used to assess the content quality.
- Score: 0.33262200259340124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasingly rapid spread of information about COVID-19 on the web calls
for automatic measures of quality assurance. In that context, we check the
credibility of news content using selected linguistic features. We present two
empirical studies to evaluate the usability of graphical interfaces that offer
such credibility assessment. In a moderated qualitative interview with six
participants, we identify rating scale, sub-criteria and algorithm authorship
as important predictors of the usability. A subsequent quantitative online
survey with 50 participants reveals a conflict between transparency and
conciseness in the interface design, as well as a perceived hierarchy of
metadata: the authorship of a news text is more important than the authorship
of the credibility algorithm used to assess the content quality. Finally, we
make suggestions for future research, such as proactively documenting
credibility-related metadata for Natural Language Processing and Language
Technology services and establishing an explicit hierarchical taxonomy of
usability predictors for automatic credibility assessment.
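Since the abstract only refers to selected linguistic features without naming them here, the following Python sketch is a purely hypothetical illustration of what a feature-based credibility assessor might look like; the features, weights and thresholds are assumptions for demonstration purposes, not the authors' implementation.

```python
# Hypothetical illustration only: the paper's actual linguistic features and
# weights are not given in this listing, so the ones below are placeholders.
import re
from dataclasses import dataclass


@dataclass
class CredibilityResult:
    score: float    # 0.0 (not credible) .. 1.0 (credible)
    features: dict  # raw feature values, usable as sub-criteria in a UI


def assess_credibility(text: str) -> CredibilityResult:
    """Toy credibility score computed from surface-level linguistic features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    if not sentences or not tokens:
        return CredibilityResult(0.0, {})

    features = {
        # Heavy exclamation or all-caps usage is a common clickbait signal.
        "exclamation_density": text.count("!") / len(sentences),
        "all_caps_ratio": sum(w.isupper() and len(w) > 2 for w in text.split()) / len(tokens),
        # Lexical diversity: very repetitive texts tend to read as low quality.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "avg_sentence_length": len(tokens) / len(sentences),
    }

    # Placeholder weights; a real system would learn them from labelled data.
    score = (
        0.4 * features["type_token_ratio"]
        + 0.3 * min(features["avg_sentence_length"] / 25.0, 1.0)
        - 0.2 * min(features["exclamation_density"], 1.0)
        - 0.1 * min(features["all_caps_ratio"] * 10.0, 1.0)
    )
    return CredibilityResult(score=max(0.0, min(1.0, score)), features=features)


# Example: a short, shouty text should score lower than a sober news paragraph.
print(assess_credibility("MIRACLE CURE FOUND!!! Doctors HATE this!").score)
print(assess_credibility(
    "Researchers at the institute reported preliminary results from a "
    "controlled trial and cautioned that further replication is needed."
).score)
```

Exposing the individual feature values as sub-criteria next to the overall score is the kind of interface decision the two usability studies evaluate.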
Related papers
- A Survey on Automatic Credibility Assessment of Textual Credibility Signals in the Era of Large Language Models [6.538395325419292]
Credibility assessment is fundamentally based on aggregating credibility signals.
Credibility signals provide more granular, more easily explainable and more widely usable information.
The growing body of research on automatic credibility assessment and credibility signal detection is highly fragmented and lacks mutual interconnections.
arXiv Detail & Related papers (2024-10-28T17:51:08Z)
- Multi-Facet Counterfactual Learning for Content Quality Evaluation [48.73583736357489]
We propose a framework for efficiently constructing evaluators that perceive multiple facets of content quality.
We leverage a joint training strategy based on contrastive learning and supervised learning to enable the evaluator to distinguish between different quality facets.
arXiv Detail & Related papers (2024-10-10T08:04:10Z)
- ARTICLE: Annotator Reliability Through In-Context Learning [18.818071256242327]
We propose ARTICLE, an in-context learning framework to estimate annotation quality through self-consistency.
Our findings indicate that ARTICLE can be used as a robust method for identifying reliable annotators, hence improving data quality.
arXiv Detail & Related papers (2024-09-18T17:59:32Z)
- DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts [11.883785681042593]
DePrompt is a desensitization protection and effectiveness evaluation framework for prompts.
We integrate contextual attributes to define privacy types, achieving high-precision PII entity identification.
Our framework is adaptable to prompts and can be extended to scenarios that depend on text usability.
arXiv Detail & Related papers (2024-08-16T02:38:25Z)
- Holistic Evaluation for Interleaved Text-and-Image Generation [19.041251355695973]
We introduce InterleavedBench, the first benchmark carefully curated for the evaluation of interleaved text-and-image generation.
In addition, we present InterleavedEval, a strong reference-free metric powered by GPT-4o to deliver accurate and explainable evaluation.
arXiv Detail & Related papers (2024-06-20T18:07:19Z)
- Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study [63.27346930921658]
ChatGPT is capable of evaluating text quality effectively from various perspectives without reference texts.
The Explicit Score, which uses ChatGPT to generate a numeric score measuring text quality, is the most effective and reliable of the three approaches explored.
arXiv Detail & Related papers (2023-04-03T05:29:58Z)
- Evaluating and Improving Factuality in Multimodal Abstractive Summarization [91.46015013816083]
We propose CLIPBERTScore, a simple weighted combination of CLIPScore and BERTScore that leverages their robustness and strong factuality detection performance on image-summary and document-summary pairs, respectively.
We show that this simple combination of two metrics in the zero-shot setting achieves higher correlations than existing factuality metrics for document summarization.
Our analysis demonstrates the robustness and high correlation of CLIPBERTScore and its components on four factuality metric-evaluation benchmarks.
arXiv Detail & Related papers (2022-11-04T16:50:40Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design; a minimal scoring sketch for both protocols follows this list.
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two) and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)
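As a supplement to the crowdsourcing entry above (Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries), the sketch below shows how the two protocols are commonly scored: averaged Likert ratings versus counting-based Best-Worst Scaling. The annotation format is invented for the example and is not taken from that paper's materials.

```python
# Illustrative scoring for the two crowdsourcing protocols; the data layout is
# an assumption made for this sketch, not the cited paper's format.
from collections import defaultdict


def likert_scores(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Rating-based protocol: average the 1-5 Likert ratings given to each summary."""
    return {summary_id: sum(vals) / len(vals) for summary_id, vals in ratings.items()}


def best_worst_scores(annotations: list[dict]) -> dict[str, float]:
    """Ranking-based protocol (Best-Worst Scaling).

    Each annotation lists the summaries shown together plus the ones picked as
    best and worst; the standard counting score is (#best - #worst) / #shown,
    which falls in [-1, 1].
    """
    best, worst, shown = defaultdict(int), defaultdict(int), defaultdict(int)
    for ann in annotations:
        for item in ann["items"]:
            shown[item] += 1
        best[ann["best"]] += 1
        worst[ann["worst"]] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}


# Made-up example annotations:
print(likert_scores({"sum_a": [4, 5, 3], "sum_b": [2, 2, 3]}))
print(best_worst_scores([
    {"items": ["sum_a", "sum_b", "sum_c"], "best": "sum_a", "worst": "sum_c"},
    {"items": ["sum_a", "sum_b", "sum_c"], "best": "sum_b", "worst": "sum_c"},
]))
```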
This list is automatically generated from the titles and abstracts of the papers on this site.