Expert load matters: operating networks at high accuracy and low manual
effort
- URL: http://arxiv.org/abs/2308.05035v2
- Date: Wed, 11 Oct 2023 13:48:38 GMT
- Title: Expert load matters: operating networks at high accuracy and low manual
effort
- Authors: Sara Sangalli, Ertunc Erdil, Ender Konukoglu
- Abstract summary: We argue that deep neural networks should be trained by taking into account both accuracy and expert load.
We propose a new complementary loss function for classification that maximizes the area under this COC curve.
Our results demonstrate that the proposed loss improves classification accuracy and delegates fewer decisions to experts.
- Score: 14.978358577277028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In human-AI collaboration systems for critical applications, in order to
ensure minimal error, users should set an operating point based on model
confidence to determine when the decision should be delegated to human experts.
Samples for which model confidence is lower than the operating point would be
manually analysed by experts to avoid mistakes. Such systems can become truly
useful only if they consider two aspects: models should be confident only for
samples for which they are accurate, and the number of samples delegated to
experts should be minimized. The latter aspect is especially crucial for
applications where available expert time is limited and expensive, such as
healthcare. The trade-off between the model accuracy and the number of samples
delegated to experts can be represented by a curve that is similar to an ROC
curve, which we refer to as confidence operating characteristic (COC) curve. In
this paper, we argue that deep neural networks should be trained by taking into
account both accuracy and expert load and, to that end, propose a new
complementary loss function for classification that maximizes the area under
this COC curve. This simultaneously promotes an increase in network accuracy
and a reduction in the number of samples delegated to humans. We perform
experiments on multiple computer vision and medical image datasets for
classification. Our results demonstrate that the proposed loss improves
classification accuracy, delegates fewer decisions to experts, achieves better
out-of-distribution sample detection, and attains calibration performance on
par with existing loss functions.
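The COC curve described above can be sketched in a few lines: for each operating point, the least-confident samples are delegated to an expert, and the accuracy on the retained samples is recorded. The function below is a minimal illustration of this construction, not the paper's implementation; the max-softmax score is assumed as the confidence measure.

```python
import numpy as np

def coc_curve(confidences, correct):
    """Sketch of a confidence operating characteristic (COC) curve.

    For each operating point, the k least-confident samples are delegated
    to an expert; the model decides the rest. Returns a list of
    (fraction delegated, accuracy on retained samples) pairs.
    """
    order = np.argsort(confidences)                # ascending confidence
    corr = np.asarray(correct, dtype=float)[order]
    n = len(corr)
    points = []
    for k in range(n):                             # delegate k lowest-confidence samples
        retained = corr[k:]
        points.append((k / n, float(retained.mean())))
    return points
```

The area under this curve is the quantity the paper's proposed loss aims to maximize: a model that is accurate exactly where it is confident traces a curve that rises quickly as few samples are delegated.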
Related papers
- The Statistical Fairness-Accuracy Frontier [50.323024516295725]
Machine learning models must balance accuracy and fairness, but these goals often conflict. A useful tool for understanding this trade-off is the fairness-accuracy frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them.
arXiv Detail & Related papers (2025-08-25T03:01:35Z) - On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts [66.39976432286905]
We study the convergence rates of the maximum likelihood estimator of gating and prompt parameters. We find that the estimability of these parameters is compromised when the prompt acquires overlapping knowledge with the pre-trained model.
arXiv Detail & Related papers (2025-05-24T01:30:46Z) - Estimating Uncertainty with Implicit Quantile Network [0.0]
Uncertainty quantification is an important part of many performance critical applications.
This paper provides a simple alternative to existing approaches such as ensemble learning and Bayesian neural networks.
arXiv Detail & Related papers (2024-08-26T13:33:14Z) - Refining Tuberculosis Detection in CXR Imaging: Addressing Bias in Deep Neural Networks via Interpretability [1.9936075659851882]
We argue that the reliability of deep learning models is limited, even if they can be shown to obtain perfect classification accuracy on the test data.
We show that pre-training a deep neural network on a large-scale proxy task, as well as using mixed objective optimization network (MOON), can improve the alignment of decision foundations between models and experts.
arXiv Detail & Related papers (2024-07-19T06:41:31Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
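Based only on the abstract above, the Data-Driven Confidence Minimization objective can be sketched as a standard cross-entropy term plus a penalty that suppresses confidence on an auxiliary uncertainty dataset. The uniform-distribution penalty and the weight `lam` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dcm_loss(train_logits, train_labels, unc_logits, lam=1.0):
    """Sketch of a confidence-minimization objective: cross-entropy on
    labeled data plus a term pushing predictions on an auxiliary
    'uncertainty' dataset toward the uniform distribution.
    `lam` is a hypothetical trade-off weight.
    """
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    p_train = softmax(np.asarray(train_logits, dtype=float))
    ce = -np.mean(np.log(p_train[np.arange(len(train_labels)), train_labels]))

    p_unc = softmax(np.asarray(unc_logits, dtype=float))
    k = p_unc.shape[1]
    # KL(p || uniform) = sum_i p_i log(k * p_i); minimizing it minimizes confidence
    kl_to_uniform = np.mean(np.sum(p_unc * np.log(p_unc * k + 1e-12), axis=1))
    return float(ce + lam * kl_to_uniform)
```

Confident predictions on the uncertainty dataset inflate the loss, so the trained model remains conservative exactly where conservatism is wanted.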
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - Incorporating Experts' Judgment into Machine Learning Models [2.5363839239628843]
In some cases, domain experts might have a judgment about the expected outcome that might conflict with the prediction of machine learning models.
We present a novel framework that aims at leveraging experts' judgment to mitigate the conflict.
arXiv Detail & Related papers (2023-04-24T07:32:49Z) - Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z) - Beyond calibration: estimating the grouping loss of modern neural
networks [68.8204255655161]
Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution-shift settings.
arXiv Detail & Related papers (2022-10-28T07:04:20Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
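The ATC idea described above can be sketched as follows, assuming the max-softmax score as the confidence measure: pick the threshold so that the fraction of source examples at or above it matches source accuracy, then apply that threshold to the unlabeled target scores.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Sketch of Average Thresholded Confidence (ATC).

    A threshold t is learned on labeled source data so that the fraction
    of source examples with confidence >= t equals the source accuracy;
    target accuracy is then predicted as the fraction of unlabeled
    target examples with confidence >= t.
    """
    conf = np.sort(np.asarray(source_conf, dtype=float))
    src_acc = float(np.mean(source_correct))
    # number of source examples expected to fall below the threshold
    k = int(round((1.0 - src_acc) * len(conf)))
    t = conf[min(k, len(conf) - 1)]
    return float(np.mean(np.asarray(target_conf, dtype=float) >= t))
```

The estimate is exact only when the confidence distribution shifts in a way that preserves the ranking of correct over incorrect predictions; the paper also considers other score functions.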
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Trustworthy Long-Tailed Classification [41.45744960383575]
We propose a Trustworthy Long-tailed Classification (TLC) method to jointly conduct classification and uncertainty estimation.
Our TLC obtains the evidence-based uncertainty (EvU) and evidence for each expert, and then combines these uncertainties and evidence under Dempster-Shafer Evidence Theory (DST).
The experimental results show that the proposed TLC outperforms the state-of-the-art methods and is trustworthy with reliable uncertainty.
arXiv Detail & Related papers (2021-11-17T10:52:36Z) - $f$-Cal: Calibrated aleatoric uncertainty estimation from neural
networks for robot perception [9.425514903472545]
Existing approaches estimate uncertainty from neural network perception stacks by modifying network architectures, inference procedure, or loss functions.
Our key insight is that calibration is only achieved by imposing constraints across multiple examples, such as those in a mini-batch.
By enforcing the distribution of outputs of a neural network to resemble a target distribution by minimizing an $f$-divergence, we obtain significantly better-calibrated models compared to prior approaches.
arXiv Detail & Related papers (2021-09-28T17:57:58Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - An Uncertainty-based Human-in-the-loop System for Industrial Tool Wear
Analysis [68.8204255655161]
We show that uncertainty measures based on Monte-Carlo dropout in the context of a human-in-the-loop system increase the system's transparency and performance.
A simulation study demonstrates that the uncertainty-based human-in-the-loop system increases performance for different levels of human involvement.
arXiv Detail & Related papers (2020-07-14T15:47:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.