Incorporating Crowdsourced Annotator Distributions into Ensemble
Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri
- URL: http://arxiv.org/abs/2210.16380v4
- Date: Fri, 26 Jan 2024 16:25:43 GMT
- Title: Incorporating Crowdsourced Annotator Distributions into Ensemble
Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri
- Authors: Graham West, Matthew I. Swindall, Ben Keener, Timothy Player, Alex C.
Williams, James H. Brusuelas, John F. Wallin
- Abstract summary: Two issues which complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling.
The application of ensemble modeling to such datasets can help identify images where the ground-truth is questionable and quantify the trustworthiness of those samples.
- Score: 3.870354915766567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performing classification on noisy, crowdsourced image datasets can prove
challenging even for the best neural networks. Two issues which complicate the
problem on such datasets are class imbalance and ground-truth uncertainty in
labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped,
individual characters from images of ancient Greek papyri - are strongly
affected by both issues. The application of ensemble modeling to such datasets
can help identify images where the ground-truth is questionable and quantify
the trustworthiness of those samples. As such, we apply stacked generalization
consisting of nearly identical ResNets with different loss functions: one
utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler divergence
(KLD). Both networks use labels drawn from a crowdsourced consensus. This
consensus is derived from a Normalized Distribution of Annotations (NDA) based
on all annotations for a given character in the dataset. For the second
network, the KLD is calculated with respect to the NDA. For our ensemble model,
we apply a k-nearest neighbors model to the outputs of the CXE and KLD
networks. Individually, the ResNet models have approximately 93% accuracy,
while the ensemble model achieves an accuracy of > 95%, increasing the
classification trustworthiness. We also perform an analysis of the Shannon
entropy of the various models' output distributions to measure classification
uncertainty. Our results suggest that entropy is useful for predicting model
misclassifications.
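The abstract names the key ingredients but not their implementation. Below is a minimal sketch of the NDA and the two training targets, assuming annotations arrive as per-character count vectors over the 24 Greek letter classes; the shapes, helper names, and the use of PyTorch are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: consensus targets from a Normalized Distribution of
# Annotations (NDA), plus the two loss functions named in the abstract.
# The 24-class assumption and the tensor shapes are illustrative.
import torch
import torch.nn.functional as F

def nda(counts: torch.Tensor) -> torch.Tensor:
    """Normalize raw crowd-annotation tallies of shape (batch, num_classes)
    into per-character probability distributions."""
    return counts / counts.sum(dim=1, keepdim=True)

def kld_loss(logits: torch.Tensor, target_nda: torch.Tensor) -> torch.Tensor:
    """KL divergence of the model's output distribution from the NDA."""
    return F.kl_div(F.log_softmax(logits, dim=1), target_nda,
                    reduction="batchmean")

def cxe_loss(logits: torch.Tensor, target_nda: torch.Tensor) -> torch.Tensor:
    """Sparse cross-entropy against the consensus label (NDA argmax)."""
    return F.cross_entropy(logits, target_nda.argmax(dim=1))

# e.g. 5 annotators split 3/2 between two letters for one character:
counts = torch.tensor([[3., 2.] + [0.] * 22])   # 24 Greek letter classes
logits = torch.randn(1, 24)
print(kld_loss(logits, nda(counts)), cxe_loss(logits, nda(counts)))
```

Note the division of labor this implies: the CXE network sees only the hard consensus label, while the KLD network is trained to match the full annotation distribution, so annotator disagreement survives into its outputs.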
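The stacking step applies k-nearest neighbors to the outputs of the two base networks. A sketch with scikit-learn, where the concatenated-softmax feature layout, k = 5, and the synthetic stand-in data are all assumptions:

```python
# Sketch of the stacking step: a k-NN meta-classifier over the
# concatenated output distributions of the CXE and KLD networks.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def stack_features(p_cxe: np.ndarray, p_kld: np.ndarray) -> np.ndarray:
    """Concatenate the two base models' softmax vectors per sample."""
    return np.concatenate([p_cxe, p_kld], axis=1)

rng = np.random.default_rng(0)
n, num_classes = 200, 24
p_cxe = rng.dirichlet(np.ones(num_classes), size=n)  # stand-in outputs
p_kld = rng.dirichlet(np.ones(num_classes), size=n)
y = rng.integers(0, num_classes, size=n)             # stand-in labels

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(stack_features(p_cxe, p_kld), y)   # a held-out split in practice
print(knn.predict(stack_features(p_cxe[:3], p_kld[:3])))
```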
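The entropy analysis scores each prediction by the Shannon entropy of the model's output distribution, with high-entropy outputs the ones most likely to be misclassified. A minimal sketch; the 50%-of-max-entropy cutoff is purely illustrative:

```python
# Sketch: Shannon entropy of an output distribution as an uncertainty
# score; high-entropy predictions get flagged as likely misclassified.
import numpy as np

def shannon_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-sample entropy (nats) of (n_samples, num_classes) probabilities."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

probs = np.array([[0.97, 0.02, 0.01],   # confident: low entropy
                  [0.40, 0.35, 0.25]])  # ambiguous: high entropy
h = shannon_entropy(probs)
flagged = h > 0.5 * np.log(probs.shape[1])   # half of max entropy ln(K)
print(h.round(3), flagged)                   # ~[0.154 1.081] [False  True]
```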
Related papers
- Fair GANs through model rebalancing for extremely imbalanced class distributions [5.463417677777276]
We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN.
We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness.
We further validate our approach by applying it to an imbalanced CIFAR10 dataset which is also twice as large.
arXiv Detail & Related papers (2023-08-16T19:20:06Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under a further elaborated robustness metric, a model is judged robust if its performance is consistently accurate across whole cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We induce a chosen latent distribution in place of the implicit assumption made by a standard softmax layer.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Understanding Gender and Racial Disparities in Image Recognition Models [0.0]
On a multi-label classification problem, we investigate a softmax loss with cross-entropy in place of the usual binary cross-entropy.
We use the MR2 dataset to evaluate the fairness in the model outcomes and try to interpret the mistakes by looking at model activations and suggest possible fixes.
arXiv Detail & Related papers (2021-07-20T01:05:31Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts (a minimal sketch of this idea follows the related-papers list).
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- GraphAnoGAN: Detecting Anomalous Snapshots from Attributed Graphs [36.00861758441135]
We propose GraphAnoGAN, an anomalous snapshot ranking framework.
It consists of two core components -- generative and discriminative models.
Experiments on 4 real-world networks show that GraphAnoGAN outperforms 6 baselines by a significant margin.
arXiv Detail & Related papers (2021-06-29T15:35:37Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Understanding Anomaly Detection with Deep Invertible Networks through Hierarchies of Distributions and Features [4.25227087152716]
Convolutional networks learn similar low-level feature distributions when trained on any natural image dataset.
When the discriminative features between inliers and outliers are on a high-level, anomaly detection becomes particularly challenging.
We propose two methods to remove the negative impact of model bias and domain prior on detecting high-level differences.
arXiv Detail & Related papers (2020-06-18T20:56:14Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid-updating scheme, matching the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
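As noted above, a minimal sketch of the difference-of-confidences (DoC) idea from "Predicting with Confidence on Unseen Distributions": the drop in average max-softmax confidence between source and shifted data is used to estimate the accuracy drop. The direct subtraction below is the simplest reading of the blurb; any learned refinement in the actual paper is omitted.

```python
# Sketch of the difference-of-confidences (DoC) idea: the drop in
# average top-class confidence from source to shifted data serves as
# an estimate of the accuracy drop under distribution shift.
import numpy as np

def avg_confidence(probs: np.ndarray) -> float:
    """Mean top-class probability over a dataset of softmax outputs."""
    return float(probs.max(axis=1).mean())

def doc_accuracy_estimate(probs_src: np.ndarray, probs_shift: np.ndarray,
                          acc_src: float) -> float:
    """Predict shifted-set accuracy as source accuracy minus DoC."""
    doc = avg_confidence(probs_src) - avg_confidence(probs_shift)
    return acc_src - doc

rng = np.random.default_rng(1)
src = rng.dirichlet(np.ones(10) * 0.2, size=500)    # peaked: confident
shift = rng.dirichlet(np.ones(10) * 1.0, size=500)  # flatter: less confident
print(doc_accuracy_estimate(src, shift, acc_src=0.93))
```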
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.