A Bayesian Active Learning Approach to Comparative Judgement
- URL: http://arxiv.org/abs/2308.13292v1
- Date: Fri, 25 Aug 2023 10:33:44 GMT
- Title: A Bayesian Active Learning Approach to Comparative Judgement
- Authors: Andy Gray, Alma Rahat, Tom Crick, Stephen Lindsay, Darren Wallace
- Abstract summary: Traditional marking is a source of inconsistencies and unconscious bias, placing a high cognitive load on the assessor.
In CJ, the assessor is presented with a pair of items and is asked to select the better one.
While CJ is considered a reliable method for marking, there are concerns around transparency.
We propose a novel Bayesian approach to CJ (BCJ) for determining the ranks of compared items.
- Score: 3.0098452499209705
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Assessment is a crucial part of education. Traditional marking is a source of
inconsistencies and unconscious bias, placing a high cognitive load on the
assessors. An approach to address these issues is comparative judgement (CJ).
In CJ, the assessor is presented with a pair of items and is asked to select
the better one. Following a series of comparisons, a rank order is derived from the results using a ranking model, for example the Bradley-Terry model (BTM). While CJ is
considered a reliable method for marking, there are concerns around
transparency, and the ideal number of pairwise comparisons to generate a
reliable estimate of the rank order is not known. Additionally, there have been attempts to devise methods for selecting which pairs should be compared next in an informative manner, but some existing methods are known to introduce their own bias into the results, inflating the reliability metric used. As a result, a random selection approach is usually deployed.
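To make the ranking step concrete, here is a minimal sketch of fitting a Bradley-Terry model to a list of pairwise outcomes with the standard MM update. The function name, the toy data, and the convergence settings are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def bradley_terry(comparisons, n_items, n_iters=200, tol=1e-8):
    # Count wins and pairings from (winner, loser) tuples.
    wins = np.zeros(n_items)
    n_ij = np.zeros((n_items, n_items))
    for w, l in comparisons:
        wins[w] += 1
        n_ij[w, l] += 1
        n_ij[l, w] += 1

    skill = np.ones(n_items)  # Bradley-Terry strength parameters
    for _ in range(n_iters):
        # MM update: w_i <- W_i / sum_j n_ij / (w_i + w_j), then renormalise.
        denom = (n_ij / (skill[:, None] + skill[None, :])).sum(axis=1)
        new_skill = wins / np.maximum(denom, 1e-12)
        new_skill = np.maximum(new_skill, 1e-12)
        new_skill /= new_skill.sum()
        if np.abs(new_skill - skill).max() < tol:
            skill = new_skill
            break
        skill = new_skill
    return skill  # rank order: np.argsort(-skill)

# Toy example: every item wins and loses at least once, so the fit is well defined.
games = [(0, 1), (1, 0), (0, 1), (0, 2), (2, 1)]
print(bradley_terry(games, n_items=3))
```

Sorting items by the fitted strengths yields the rank order that a CJ session reports.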
We propose a novel Bayesian approach to CJ (BCJ) for determining the ranks of
compared items alongside a new way to select the pairs to present to the
marker(s) using active learning (AL), addressing the key shortcomings of
traditional CJ. Furthermore, we demonstrate how the entire approach may provide transparency by giving the user insight into how it makes its decisions while, at the same time, being more efficient. Results from our experiments confirm that the proposed BCJ combined with the entropy-driven AL pair-selection method is superior to the alternatives. We also find that the more comparisons are performed, the more accurate BCJ becomes, which resolves the issue in current methods of the model deteriorating when too many comparisons are performed. As our approach can generate the complete predicted rank
distribution for an item, we also show how this can be utilised in devising a
predicted grade, guided by the assessor.
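The abstract does not spell out the model, so the sketch below shows one plausible way a Bayesian pairwise model with entropy-driven active-learning pair selection could be wired together: a Beta(1, 1) prior on every "i beats j" probability, posterior updates from observed judgements, and the next pair chosen by maximum Shannon entropy of the predicted outcome. All class and method names are hypothetical, and this is an illustration of the general idea, not the authors' BCJ.

```python
import itertools
import math

class PairwiseBayes:
    def __init__(self, n_items):
        self.n = n_items
        # alpha[i][j]: pseudo-count of "i beat j"; the pair (i, j) has a
        # Beta(alpha[i][j], alpha[j][i]) posterior over P(i beats j).
        self.alpha = [[1.0] * n_items for _ in range(n_items)]

    def update(self, winner, loser):
        self.alpha[winner][loser] += 1.0

    def p_beats(self, i, j):
        # Posterior-mean (and predictive) probability that i beats j.
        return self.alpha[i][j] / (self.alpha[i][j] + self.alpha[j][i])

    def next_pair(self):
        # Entropy-driven selection: the most uncertain (closest to 50/50) pair.
        def entropy(p):
            return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return max(itertools.combinations(range(self.n), 2),
                   key=lambda ij: entropy(self.p_beats(*ij)))

    def expected_rank_scores(self):
        # Expected number of items each item beats; sorting gives a rank estimate.
        return [sum(self.p_beats(i, j) for j in range(self.n) if j != i)
                for i in range(self.n)]

# Usage: the model suggests a pair, the assessor judges it, the model is updated.
model = PairwiseBayes(n_items=4)
for _ in range(6):
    i, j = model.next_pair()
    winner, loser = (i, j)          # stand-in for the assessor's judgement
    model.update(winner, loser)
scores = model.expected_rank_scores()
print(sorted(range(4), key=lambda k: -scores[k]))   # predicted rank order
```

Because a posterior is kept for every pairwise outcome, expected scores (and, with sampling, a full distribution over ranks) can be read off the model, which is in the spirit of the predicted-grade use described in the abstract.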
Related papers
- Bayesian Active Learning for Multi-Criteria Comparative Judgement in Educational Assessment [3.0098452499209705]
Comparative Judgement (CJ) provides an alternative assessment approach by evaluating work holistically rather than breaking it into discrete criteria.
This method leverages human ability to make nuanced comparisons, yielding more reliable and valid assessments.
However, rubrics remain widely used in education, offering structured criteria for grading and detailed feedback.
This creates a gap between CJ's holistic ranking and the need for criterion-based performance breakdowns.
arXiv Detail & Related papers (2025-03-01T13:12:41Z) - Federated Learning with Discriminative Naive Bayes Classifier [0.6574756524825567]
Federated learning has emerged as a promising approach to train machine learning models on decentralized data sources.
This paper proposes a new federated approach for Naive Bayes (NB) classification, assuming discrete variables.
Our approach federates a discriminative variant of NB, sharing meaningless parameters instead of conditional probability tables.
arXiv Detail & Related papers (2025-02-03T17:12:02Z) - Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML [9.579645248339004]
We show significant variance in fairness achieved by several algorithms and the influence of the learning pipeline on fairness scores.
We highlight that most bias mitigation techniques can achieve comparable performance.
We hope our work encourages future research on how various choices in the lifecycle of developing an algorithm impact fairness.
arXiv Detail & Related papers (2024-11-17T15:17:08Z) - Efficient Pointwise-Pairwise Learning-to-Rank for News Recommendation [6.979979613916754]
News recommendation is a challenging task that involves personalization based on the interaction history and preferences of each user.
Recent works have leveraged the power of pretrained language models (PLMs) to directly rank news items by using inference approaches that predominantly fall into three categories: pointwise, pairwise, and listwise learning-to-rank.
We propose a novel framework for PLM-based news recommendation that integrates both pointwise relevance prediction and pairwise comparisons in a scalable manner.
arXiv Detail & Related papers (2024-09-26T10:27:19Z) - Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z) - Crowdsourcing subjective annotations using pairwise comparisons reduces
bias and error compared to the majority-vote method [0.0]
We introduce a theoretical framework for understanding how random error and measurement bias enter into crowdsourced annotations of subjective constructs.
We then propose a pipeline that combines pairwise comparison labelling with Elo scoring, and demonstrate that it outperforms the ubiquitous majority-voting method in reducing both types of measurement error.
arXiv Detail & Related papers (2023-05-31T17:14:12Z) - Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z) - Unbiased Pairwise Learning to Rank in Recommender Systems [4.058828240864671]
Unbiased learning to rank algorithms are appealing candidates and have already been applied in many applications with single categorical labels.
We propose a novel unbiased LTR algorithm to tackle the challenges, which innovatively models position bias in the pairwise fashion.
Experiment results on public benchmark datasets and internal live traffic show the superior results of the proposed method for both categorical and continuous labels.
arXiv Detail & Related papers (2021-11-25T06:04:59Z) - Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z) - Taking the Counterfactual Online: Efficient and Unbiased Online
Evaluation for Ranking [74.46448041224247]
We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data.
LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display.
We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods.
arXiv Detail & Related papers (2020-07-24T18:05:58Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - SetRank: A Setwise Bayesian Approach for Collaborative Ranking from
Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender system.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.