Cross-replication Reliability -- An Empirical Approach to Interpreting
Inter-rater Reliability
- URL: http://arxiv.org/abs/2106.07393v1
- Date: Fri, 11 Jun 2021 16:15:46 GMT
- Title: Cross-replication Reliability -- An Empirical Approach to Interpreting
Inter-rater Reliability
- Authors: Ka Wong, Praveen Paritosh, Lora Aroyo
- Abstract summary: We present a new approach to interpreting IRR that is empirical and contextualized.
It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa.
- Score: 2.2091544233596596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new approach to interpreting IRR that is empirical and
contextualized. It is based upon benchmarking IRR against baseline measures in
a replication, one of which is a novel cross-replication reliability (xRR)
measure based on Cohen's kappa. We call this approach the xRR framework. We
opensource a replication dataset of 4 million human judgements of facial
expressions and analyze it with the proposed framework. We argue this framework
can be used to measure the quality of crowdsourced datasets.
Related papers
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR)
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z) - Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: textttRS-DisRL-M, a model-based strategy for model-based function approximation, and textttRS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z) - Bounding data reconstruction attacks with the hypothesis testing
interpretation of differential privacy [78.32404878825845]
Reconstruction Robustness (ReRo) was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models.
Previous research has demonstrated that differential privacy (DP) mechanisms also provide ReRo, but so far, only Monte Carlo estimates of a tight ReRo bound have been shown.
arXiv Detail & Related papers (2023-07-08T08:02:47Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function
Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality
Perspective [5.8010446129208155]
This study scrutinizes the dependability of the RemOve-And-Retrain (ROAR) procedure, which is prevalently employed for gauging the performance of feature importance estimates.
The insights gleaned from our theoretical foundation and empirical investigations reveal that attributions containing lesser information about the decision function may yield superior results in ROAR benchmarks.
arXiv Detail & Related papers (2023-04-26T21:43:42Z) - Factual Consistency Oriented Speech Recognition [23.754107608608106]
The proposed framework optimize the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions.
It is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries.
arXiv Detail & Related papers (2023-02-24T00:01:41Z) - Improved Policy Evaluation for Randomized Trials of Algorithmic Resource
Allocation [54.72195809248172]
We present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.
We prove theoretically that such an estimator is more accurate than common estimators based on sample means.
arXiv Detail & Related papers (2023-02-06T05:17:22Z) - Interpretable Research Replication Prediction via Variational Contextual
Consistency Sentence Masking [14.50690911709558]
Research Replication Prediction (RRP) is the task of predicting whether a published research result can be replicated or not.
In this work, we propose the Variational Contextual Consistency Sentence Masking (VCCSM) method to automatically extract key sentences.
Results of our experiments on RRP along with European Convention of Human Rights (ECHR) datasets demonstrate that VCCSM is able to improve the model interpretability for the long document classification tasks.
arXiv Detail & Related papers (2022-03-28T03:27:13Z) - k-Rater Reliability: The Correct Unit of Reliability for Aggregated
Human Annotations [2.538209532048867]
A proposed k-rater reliability (kRR) should be used as the correct data reliability for aggregated datasets.
We present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353.
arXiv Detail & Related papers (2022-03-24T08:05:06Z) - Distributionally Robust Multi-Output Regression Ranking [3.9318191265352196]
We introduce a new listwise listwise learning-to-rank model called Distributionally Robust Multi-output Regression Ranking (DRMRR)
DRMRR uses a Distributionally Robust Optimization framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution.
Our experiments were conducted on two real-world applications, medical document retrieval, and drug response prediction.
arXiv Detail & Related papers (2021-09-27T05:19:27Z) - Federated Distributionally Robust Optimization for Phase Configuration
of RISs [106.4688072667105]
We study the problem of robust reconfigurable intelligent surface (RIS)-aided downlink communication over heterogeneous RIS types in a supervised learning setting.
By modeling downlink communication over heterogeneous RIS designs as different workers that learn how to optimize phase configurations in a distributed manner, we solve this distributed learning problem.
Our proposed algorithm requires fewer communication rounds to achieve the same worst-case distribution test accuracy compared to competitive baselines.
arXiv Detail & Related papers (2021-08-20T07:07:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.