In Automation We Trust: Investigating the Role of Uncertainty in Active
Learning Systems
- URL: http://arxiv.org/abs/2004.00762v1
- Date: Thu, 2 Apr 2020 00:52:49 GMT
- Title: In Automation We Trust: Investigating the Role of Uncertainty in Active
Learning Systems
- Authors: Michael L. Iuzzolino, Tetsumichi Umada, Nisar R. Ahmed, and Danielle
A. Szafir
- Abstract summary: We investigate how different active learning (AL) query policies coupled with classification uncertainty visualizations affect analyst trust in automated classification systems.
We find that query policy significantly influences an analyst's trust in an image classification system.
We propose a set of oracle query policies and visualizations for use during AL training phases that can influence analyst trust in classification.
- Score: 5.459797813771497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how different active learning (AL) query policies coupled with
classification uncertainty visualizations affect analyst trust in automated
classification systems. A current standard policy for AL is to query the oracle
(e.g., the analyst) to refine labels for datapoints where the classifier has
the highest uncertainty. This is an optimal policy for the automation system as
it yields maximal information gain. However, model-centric policies neglect the
effects of this uncertainty on the human component of the system and the
consequent manner in which the human will interact with the system
post-training. In this paper, we present an empirical study evaluating how AL
query policies and visualizations lending transparency to classification
influence trust in automated classification of image data. We found that query
policy significantly influences an analyst's trust in an image classification
system, and we use these results to propose a set of oracle query policies and
visualizations for use during AL training phases that can influence analyst
trust in classification.
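As context for the abstract, the "highest uncertainty" query policy it describes reduces to ranking the unlabeled pool by predictive entropy and sending the top items to the oracle. A minimal illustrative sketch (not the authors' code; all names are ours):

```python
import numpy as np

def entropy_query(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k pool indices whose predicted class distributions
    have the highest Shannon entropy (i.e., maximal uncertainty).

    probs: (n_samples, n_classes) softmax outputs for the unlabeled pool.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:k]  # most uncertain first

# Example: the classifier is 50/50 on the second point, so it is queried first.
pool_probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]])
print(entropy_query(pool_probs, k=2))  # -> [1 2]
```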
Related papers
- Overcoming Common Flaws in the Evaluation of Selective Classification Systems [3.197540295466042]
We define 5 requirements for multi-threshold metrics in selective classification regarding task alignment, interpretability, and flexibility.
We propose the Area under the Generalized Risk Coverage curve ($\mathrm{AUGRC}$), which meets all requirements and can be directly interpreted as the average risk of undetected failures.
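The generalized definition is given in the paper itself; as a rough illustration of the underlying construction, here is a standard (non-generalized) area-under-risk-coverage computation, with names of our choosing:

```python
import numpy as np

def aurc(confidence: np.ndarray, errors: np.ndarray) -> float:
    """Area under the risk-coverage curve (standard AURC, not the
    paper's generalized AUGRC): rank samples by confidence, then
    average the selective risk over all coverage levels.

    confidence: (n,) scores, higher = more confident.
    errors: (n,) 1 if the prediction is wrong, else 0.
    """
    order = np.argsort(-confidence)           # most confident first
    cum_errors = np.cumsum(errors[order])     # failures among covered samples
    coverage_counts = np.arange(1, len(errors) + 1)
    selective_risk = cum_errors / coverage_counts
    return selective_risk.mean()

conf = np.array([0.95, 0.60, 0.80, 0.30])
err = np.array([0, 1, 0, 1])
print(round(aurc(conf, err), 3))  # -> 0.208
```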
arXiv Detail & Related papers (2024-07-01T07:32:58Z)
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
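POP-QL's projection-based reweighting is specific to the paper, but the generic building block such methods refine, importance weighting of off-policy samples, can be sketched briefly (the clipping value and names are our assumptions):

```python
import numpy as np

def importance_weights(pi_probs, beta_probs, clip=10.0):
    """Classic off-policy importance weights w = pi(a|s) / beta(a|s),
    clipped to limit variance. Generic background only, not the
    POP-QL projection itself.

    pi_probs:   probability of each logged action under the target policy.
    beta_probs: probability of the same action under the behavior policy.
    """
    w = pi_probs / np.maximum(beta_probs, 1e-8)
    return np.clip(w, 0.0, clip)

# Actions the behavior policy took often but the target policy avoids
# get weights < 1; rare-but-desired actions get weights > 1.
print(importance_weights(np.array([0.5, 0.1]), np.array([0.25, 0.4])))
# -> [2.   0.25]
```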
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
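Zero-shot classification, one of the evaluated applications, scores an image against a text prompt per candidate class. A minimal sketch using the Hugging Face transformers API (the model ID, file name, and prompts are our choices):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Standard zero-shot CLIP classification: embed the image and one text
# prompt per class, then softmax over the image-text similarities.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # hypothetical input file
labels = ["a photo of a doctor", "a photo of a nurse"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```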
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- Navigating the Pitfalls of Active Learning Evaluation: A Systematic Framework for Meaningful Performance Assessment [3.3064235071867856]
Active Learning (AL) aims to reduce the labeling burden by interactively selecting the most informative samples from a pool of unlabeled data.
Some studies have questioned the effectiveness of AL compared to emerging paradigms such as semi-supervised learning (Semi-SL) and self-supervised learning (Self-SL).
arXiv Detail & Related papers (2023-01-25T15:07:44Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess the learnability of learned representations.
CL is measured as the performance of a k-nearest neighbors (KNN) classifier trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
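The abstract nearly specifies the CL procedure; a minimal scikit-learn sketch under our reading (the cluster count, split, and KNN parameters are our assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def cluster_learnability(reps, n_clusters=10, n_neighbors=5, seed=0):
    """Cluster Learnability as we read the abstract: pseudo-label the
    representations with K-means, then score a KNN's ability to
    predict those cluster labels on held-out points.
    Parameter choices here are our assumptions, not the paper's.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(reps)
    X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
    return knn.score(X_te, y_te)  # higher = more "learnable" representation

reps = np.random.default_rng(0).normal(size=(500, 32))  # stand-in embeddings
print(cluster_learnability(reps, n_clusters=5))
```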
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Tradeoffs in Streaming Binary Classification under Limited Inspection Resources [14.178224954581069]
We consider an imbalanced binary classification problem, where events arrive sequentially and only a limited number of suspicious events can be inspected.
We analytically characterize the tradeoff between the minority-class detection rate and the inspection capacity.
We implement the selection methods on a real public fraud detection dataset and compare the empirical results with analytical bounds.
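The paper's selection methods are its own; purely to illustrate the setting, here is a naive streaming policy (ours) that inspects high-scoring events until a fixed capacity runs out:

```python
def threshold_inspector(scores, capacity, threshold=0.8):
    """Naive streaming inspection policy (an illustration of the
    problem setting, not the paper's method): flag an arriving event
    for inspection if its suspicion score clears a threshold and
    inspection budget remains.
    """
    inspected = []
    for i, s in enumerate(scores):
        if len(inspected) >= capacity:
            break  # inspection resources exhausted
        if s >= threshold:
            inspected.append(i)
    return inspected

# 5 events arrive but only 2 can be inspected, so the last suspicious
# event (index 4) is missed even though it clears the threshold.
print(threshold_inspector([0.9, 0.2, 0.85, 0.1, 0.95], capacity=2))  # -> [0, 2]
```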
arXiv Detail & Related papers (2021-10-05T23:23:11Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing increasing interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
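As one concrete example of the kind of fairness metric such benchmarks compute, here is demographic parity difference, a standard group-fairness measure (not specific to this paper):

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups;
    0 means parity. A standard group-fairness metric, shown only to
    illustrate what fair-classification benchmarks measure.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

# Group 0 receives positive predictions 2/3 of the time, group 1 only 1/3.
print(demographic_parity_difference([1, 1, 0, 0, 1, 0], [0, 0, 0, 1, 1, 1]))
```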
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning [98.2036247050674]
We show that evaluating the learned representations with a self-supervised image rotation task is highly correlated with a standard set of supervised evaluations.
We provide an algorithm (SelfAugment) to automatically and efficiently select augmentation policies without using supervised evaluations.
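The rotation-based evaluation can be sketched: rotate each image by a multiple of 90 degrees, embed it with the frozen encoder under evaluation, and score a probe that predicts the rotation. The encoder, probe, and data below are stand-ins of ours, not SelfAugment's pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rotation_eval(images, encoder, seed=0):
    """Score a linear probe that predicts which of the four rotations
    (0/90/180/270 degrees) was applied, from the frozen encoder's
    embedding. Higher accuracy suggests a stronger representation.
    `encoder` is any callable mapping an image array to a 1-D feature
    vector; it is our stand-in, not SelfAugment's exact setup.
    """
    feats, labels = [], []
    for img in images:
        for k in range(4):                       # k quarter-turns
            feats.append(encoder(np.rot90(img, k)))
            labels.append(k)
    probe = LogisticRegression(max_iter=1000, random_state=seed)
    return cross_val_score(probe, np.array(feats), labels, cv=3).mean()

# Toy run: random "images" and a flattening "encoder".
rng = np.random.default_rng(0)
images = rng.normal(size=(30, 8, 8))
print(rotation_eval(images, encoder=lambda im: im.reshape(-1)))
```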
arXiv Detail & Related papers (2020-09-16T14:49:03Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
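The paper's post-processing framework is its own; to make the general idea concrete, here is a deliberately simple post-hoc adjustment (ours, not the authors') that shifts each group's scores to a common mean, trading some ranking utility for reduced disparity:

```python
import numpy as np

def equalize_group_means(scores, group):
    """Naive post-hoc adjustment (our illustration, not the paper's
    framework): shift each group's scores so every group mean equals
    the overall mean, reducing systematic score disparity between
    protected groups at some cost to ranking utility.
    """
    scores, group = np.asarray(scores, float), np.asarray(group)
    adjusted = scores.copy()
    for g in np.unique(group):
        adjusted[group == g] += scores.mean() - scores[group == g].mean()
    return adjusted

scores = np.array([0.9, 0.8, 0.4, 0.3])   # group 0 scored higher overall
group = np.array([0, 0, 1, 1])
print(equalize_group_means(scores, group))  # both groups now average 0.6
```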
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Policy Entropy for Out-of-Distribution Classification [8.747840760772268]
We propose PEOC, a new policy-entropy-based out-of-distribution classifier.
It reliably detects unencountered states in deep reinforcement learning.
It is highly competitive against state-of-the-art one-class classification algorithms.
arXiv Detail & Related papers (2020-05-25T12:18:20Z)
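The PEOC idea reduces to a small computation: use the entropy of the policy's action distribution at a state as the OOD score, flagging high-entropy states as unencountered. A minimal sketch (the threshold and names are ours):

```python
import numpy as np

def policy_entropy_ood(action_probs: np.ndarray, threshold: float) -> bool:
    """Flag a state as out-of-distribution when the policy's action
    distribution is close to uniform (high entropy), following the
    PEOC idea; the threshold is a tuning choice of ours.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(action_probs * np.log(action_probs + eps))
    return bool(entropy > threshold)

# A confident policy (trained state) vs. a near-uniform one (novel state).
print(policy_entropy_ood(np.array([0.97, 0.01, 0.01, 0.01]), threshold=0.5))  # False
print(policy_entropy_ood(np.array([0.25, 0.25, 0.25, 0.25]), threshold=0.5))  # True
```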
This list is automatically generated from the titles and abstracts of the papers on this site.