A Unified Statistical Learning Model for Rankings and Scores with
Application to Grant Panel Review
- URL: http://arxiv.org/abs/2201.02539v1
- Date: Fri, 7 Jan 2022 16:56:52 GMT
- Title: A Unified Statistical Learning Model for Rankings and Scores with
Application to Grant Panel Review
- Authors: Michael Pearce and Elena A. Erosheva
- Abstract summary: Rankings and scores are two common data types used by judges to express preferences and/or perceptions of quality in a collection of objects.
Numerous models exist to study data of each type separately, but no unified statistical model captures both data types simultaneously.
We propose the Mallows-Binomial model to close this gap, which combines a Mallows' $\phi$ ranking model with Binomial score models.
- Score: 1.240096657086732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rankings and scores are two common data types used by judges to express
preferences and/or perceptions of quality in a collection of objects. Numerous
models exist to study data of each type separately, but no unified statistical
model captures both data types simultaneously without first performing data
conversion. We propose the Mallows-Binomial model to close this gap, which
combines a Mallows' $\phi$ ranking model with Binomial score models through
shared parameters that quantify object quality, a consensus ranking, and the
level of consensus between judges. We propose an efficient tree-search
algorithm to calculate the exact MLE of model parameters, study statistical
properties of the model both analytically and through simulation, and apply our
model to real data from an instance of grant panel review that collected both
scores and partial rankings. Furthermore, we demonstrate how model outputs can
be used to rank objects with confidence. The proposed model is shown to
sensibly combine information from both scores and rankings to quantify object
quality and measure consensus with appropriate levels of statistical
uncertainty.
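As a rough illustration of how rankings and scores can share parameters, below is a minimal Python sketch of a Mallows-Binomial joint log-likelihood. It assumes complete rankings under a Mallows' $\phi$ model with Kendall distance, scores modeled as Binomial(M, p_j), and a consensus ranking given by ordering the quality parameters p (smaller p treated as better, matching grant-review scales where lower scores are better). These conventions, and the function and variable names, are assumptions for illustration only; the paper additionally handles partial rankings and computes the exact MLE with a tree-search algorithm, neither of which is shown here.
```python
import numpy as np
from math import comb

def kendall_distance(pi, pi0):
    """Count pairwise disagreements between two rankings, given as orderings
    of object indices from best to worst."""
    pos0 = {obj: r for r, obj in enumerate(pi0)}
    d = 0
    for a in range(len(pi)):
        for b in range(a + 1, len(pi)):
            if pos0[pi[a]] > pos0[pi[b]]:
                d += 1
    return d

def mallows_log_normalizer(theta, J):
    """log Z(theta) for the Mallows' phi model with Kendall distance on J objects:
    Z(theta) = prod_{j=1}^{J} (1 - exp(-j*theta)) / (1 - exp(-theta))."""
    return sum(np.log1p(-np.exp(-j * theta)) - np.log1p(-np.exp(-theta))
               for j in range(2, J + 1))

def mallows_binomial_loglik(p, theta, scores, rankings, M):
    """Joint log-likelihood of scores (n_judges x J counts out of M) and
    complete rankings, under the assumed Mallows-Binomial parameterization."""
    J = len(p)
    # Consensus ordering induced by the shared quality parameters (lower p = better).
    pi0 = tuple(np.argsort(p))
    ll = 0.0
    # Binomial contribution from the scores.
    for row in scores:
        for j, x in enumerate(row):
            ll += (np.log(comb(M, int(x))) + x * np.log(p[j])
                   + (M - x) * np.log(1.0 - p[j]))
    # Mallows' phi contribution from the rankings.
    logZ = mallows_log_normalizer(theta, J)
    for pi in rankings:
        ll += -theta * kendall_distance(tuple(pi), pi0) - logZ
    return ll
```
For instance, with three objects, a 0-9 score scale (M = 9), quality parameters p = (0.2, 0.5, 0.7), and consensus parameter theta = 2, the function returns the joint log-likelihood of the observed scores and rankings; maximizing it over (p, theta) would be the role of the paper's exact tree-search algorithm, or of a generic numerical optimizer in this simplified setting.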
Related papers
- On Evaluation of Vision Datasets and Models using Human Competency Frameworks [20.802372291783488]
Item Response Theory (IRT) is a framework that infers interpretable latent parameters for an ensemble of models and each dataset item.
We assess model calibration, select informative data subsets, and demonstrate the usefulness of IRT's latent parameters for analyzing and comparing models and datasets in computer vision.
arXiv Detail & Related papers (2024-09-06T06:20:11Z) - Area under the ROC Curve has the Most Consistent Evaluation for Binary Classification [3.1850615666574806]
This study investigates how consistent different metrics are at evaluating models across data of different prevalence.
I find that evaluation metrics that are less influenced by prevalence offer more consistent evaluation of individual models and more consistent ranking of a set of models.
arXiv Detail & Related papers (2024-08-19T17:52:38Z) - Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z) - A Unified Interactive Model Evaluation for Classification, Object
Detection, and Instance Segmentation in Computer Vision [31.441561710096877]
We develop an open-source visual analysis tool, Uni-Evaluator, to support a unified model evaluation for classification, object detection, and instance segmentation in computer vision.
The key idea behind our method is to formulate both discrete and continuous predictions in different tasks as unified probability distributions.
Based on these distributions, we develop 1) a matrix-based visualization to provide an overview of model performance; 2) a table visualization to identify the problematic data subsets where the model performs poorly; and 3) a grid visualization to display the samples of interest.
arXiv Detail & Related papers (2023-08-09T18:11:28Z) - Universal Semi-supervised Model Adaptation via Collaborative Consistency
Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
arXiv Detail & Related papers (2023-07-07T08:19:40Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - Statistical Model Criticism of Variational Auto-Encoders [15.005894753472894]
We propose a framework for the statistical evaluation of variational auto-encoders (VAEs).
We test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text.
arXiv Detail & Related papers (2022-04-06T18:19:29Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)