Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data
- URL: http://arxiv.org/abs/2311.04035v1
- Date: Tue, 7 Nov 2023 14:42:06 GMT
- Title: Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data
- Authors: Young Woong Park, Jinhak Kim, Dan Zhu
- Abstract summary: When multiple rating lists are combined or considered together, subjects often have missing ratings.
We analyze missing-value patterns using six real-world data sets from various applications.
We propose optimization models and algorithms that minimize the total rating discordance across rating providers.
- Score: 4.100928307172084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ratings are frequently used to evaluate and compare subjects in various
applications, from education to healthcare, because ratings provide succinct
yet credible measures for comparing subjects. However, when multiple rating
lists are combined or considered together, subjects often have missing ratings,
because most rating lists do not rate every subject in the combined list. In
this study, we analyze missing-value patterns using six real-world data sets
from various applications and examine the conditions under which imputation
algorithms are applicable. Based on the special structures and properties
derived from the analyses, we propose optimization models and algorithms that
minimize the total rating discordance across rating providers to impute missing
ratings in the combined rating lists, using only the known rating information.
The total rating discordance is defined as the sum of pairwise discordance
metrics and can be written as a quadratic function. Computational experiments
based on real-world and synthetic rating data sets show that the proposed
methods outperform the state-of-the-art general imputation methods in the
literature in terms of imputation accuracy.
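To make the objective concrete, below is a minimal, illustrative sketch of discordance-minimization-based imputation. It is not the paper's formulation: the squared-difference discordance, the toy rating matrix, and the generic solver are assumptions for illustration. The point is only that the total discordance is a quadratic function of the unknown ratings and can be minimized using the known ratings alone.

```python
# Illustrative sketch only: impute missing ratings by minimizing a quadratic
# "total discordance" objective. The squared-difference discordance below is
# an assumption for illustration, not the paper's exact metric.
import numpy as np
from scipy.optimize import minimize

# ratings[i, j] = rating of subject j by provider i; np.nan marks a missing rating.
ratings = np.array([
    [4.0,    3.0, np.nan, 5.0],
    [np.nan, 3.5, 2.0,    4.5],
    [4.5, np.nan, 2.5, np.nan],
])
missing = np.isnan(ratings)

def total_discordance(x):
    """Sum of pairwise squared rating differences over all provider pairs."""
    R = ratings.copy()
    R[missing] = x                      # plug candidate values into the missing cells
    m = R.shape[0]
    return sum(np.sum((R[i] - R[j]) ** 2)
               for i in range(m) for j in range(i + 1, m))

x0 = np.full(missing.sum(), np.nanmean(ratings))   # start from the global mean
res = minimize(total_discordance, x0)              # smooth quadratic objective
imputed = ratings.copy()
imputed[missing] = res.x
print(np.round(imputed, 2))
```

The paper's actual models add structure derived from the missing-value pattern analysis, and a dedicated quadratic-programming solver would typically replace the generic minimizer used here.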
Related papers
- Ranking evaluation metrics from a group-theoretic perspective [5.333192842860574]
We show instances that result in inconsistent evaluations, a source of potential mistrust in commonly used metrics.
Our analysis sheds light on ranking evaluation metrics, highlighting that inconsistent evaluations should not be seen as a source of mistrust.
arXiv Detail & Related papers (2024-08-14T09:06:58Z) - Considerations on the Evaluation of Biometric Quality Assessment Algorithms [7.092869001331781]
Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition.
"Error versus Characteristic" (EDC) plots, and "partial Area Under Curve" (pAUC) values of curves therein, are generally used by researchers to evaluate such quality assessment algorithms.
arXiv Detail & Related papers (2023-03-23T14:26:21Z) - Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics [64.81682222169113]
System-level correlations quantify how reliably an automatic summarization evaluation metric replicates human judgments of summary quality.
We identify two ways in which the definition of the system-level correlation is inconsistent with how metrics are used to evaluate systems in practice.
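For reference, a minimal sketch of the usual system-level correlation computation (not necessarily the exact variant the paper examines): average the metric scores and the human scores per system, then correlate the per-system averages; the numbers below are made up.

```python
# Sketch: system-level correlation between a metric and human judgments (toy data).
import numpy as np
from scipy.stats import kendalltau

# Rows are systems, columns are documents; values are per-summary scores (made up).
metric = np.array([[0.31, 0.40, 0.35], [0.28, 0.36, 0.30], [0.42, 0.45, 0.39]])
human  = np.array([[3.1, 3.8, 3.4],    [2.9, 3.3, 3.0],    [4.0, 4.2, 3.7]])

# Average over documents to get one score per system, then correlate across systems.
tau, _ = kendalltau(metric.mean(axis=1), human.mean(axis=1))
print(f"system-level Kendall tau: {tau:.3f}")
```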
arXiv Detail & Related papers (2022-04-21T15:52:14Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate issues with such scores, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
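As a side note on the Best-Worst Scaling protocol mentioned above, a minimal sketch of the standard counting estimator (not the paper's exact analysis); the annotations are made up.

```python
# Sketch: counting estimator for Best-Worst Scaling scores (illustrative).
from collections import Counter

# Each annotation: (items shown, item picked as best, item picked as worst) (made up).
annotations = [
    (["A", "B", "C", "D"], "A", "D"),
    (["A", "B", "C", "D"], "B", "D"),
    (["A", "C", "D", "E"], "A", "E"),
]

best, worst, shown = Counter(), Counter(), Counter()
for items, b, w in annotations:
    best[b] += 1
    worst[w] += 1
    shown.update(items)

# BWS score: (times chosen best - times chosen worst) / times shown.
scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```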
arXiv Detail & Related papers (2021-09-19T19:05:00Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
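For context, a minimal sketch of the exact leverage-score definition via a thin QR factorization (a baseline for the full-column-rank case, not the randomized, rank-revealing estimators studied in the paper); the matrix is random for illustration.

```python
# Sketch: exact statistical leverage scores via a thin QR factorization.
# Assumes full column rank; the paper targets arbitrary-rank matrices with
# randomized, rank-revealing estimators instead of this direct computation.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))         # tall rectangular matrix

Q, _ = np.linalg.qr(A, mode="reduced")     # columns of Q span the column space of A
leverage = np.sum(Q ** 2, axis=1)          # i-th score = squared norm of the i-th row of Q

print(leverage[:5], leverage.sum())        # the scores sum to rank(A) = 10 here
```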
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
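As a generic illustration of resampling-based confidence intervals for a metric-human correlation (not the paper's exact procedure), a bootstrap sketch over made-up paired scores:

```python
# Sketch: bootstrap confidence interval for a metric-human correlation (toy data).
import numpy as np

rng = np.random.default_rng(0)
metric = rng.normal(size=100)                              # made-up metric scores
human = 0.6 * metric + rng.normal(scale=0.8, size=100)     # made-up human judgments

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(metric), len(metric))        # resample pairs with replacement
    boot.append(np.corrcoef(metric[idx], human[idx])[0, 1])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for Pearson r: [{lo:.3f}, {hi:.3f}]")
```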
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results [9.602361044877426]
We propose a new metric for Ordinal Classification, the Closeness Evaluation Measure, rooted in Measurement Theory and Information Theory.
Our theoretical analysis and experimental results over both synthetic data and data from NLP shared tasks indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously.
arXiv Detail & Related papers (2020-06-01T20:35:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.