Information-theoretic Classification Accuracy: A Criterion that Guides
Data-driven Combination of Ambiguous Outcome Labels in Multi-class
Classification
- URL: http://arxiv.org/abs/2109.00582v1
- Date: Wed, 1 Sep 2021 19:20:28 GMT
- Title: Information-theoretic Classification Accuracy: A Criterion that Guides
Data-driven Combination of Ambiguous Outcome Labels in Multi-class
Classification
- Authors: Chihao Zhang, Yiling Elaine Chen, Shihua Zhang, Jingyi Jessica Li
- Abstract summary: Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets.
We propose the information-theoretic classification accuracy (ITCA) to guide practitioners on how to combine ambiguous outcome labels.
We demonstrate the effectiveness of ITCA in diverse applications including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification.
- Score: 3.9533511130413137
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Outcome labeling ambiguity and subjectivity are ubiquitous in real-world
datasets. While practitioners commonly combine ambiguous outcome labels in an
ad hoc way to improve the accuracy of multi-class classification, there is no
principled approach to guide label combination by any optimality criterion. To
address this problem, we propose the information-theoretic classification
accuracy (ITCA), a criterion of outcome "information" conditional on outcome
prediction, to guide practitioners on how to combine ambiguous outcome labels.
ITCA indicates a balance in the trade-off between prediction accuracy (how well
predicted labels agree with actual labels) and prediction resolution (how
many labels are predictable). To find the optimal label combination indicated
by ITCA, we develop two search strategies: greedy search and breadth-first
search. Notably, ITCA and the two search strategies are adaptive to all
machine-learning classification algorithms. Coupled with a classification
algorithm and a search strategy, ITCA has two uses: to improve prediction
accuracy and to identify ambiguous labels. We first verify that ITCA achieves
high accuracy with both search strategies in finding the correct label
combinations on synthetic and real data. Then we demonstrate the effectiveness
of ITCA in diverse applications including medical prognosis, cancer survival
prediction, user demographics prediction, and cell type classification.
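As a rough illustration of the accuracy/resolution trade-off and the greedy search mentioned in the abstract, the sketch below scores a label combination and merges labels while the score improves. The scoring function (per-class accuracy weighted by the entropy term -p_k log p_k) is an assumption about the general form of such a criterion, since the abstract does not state the ITCA formula. The function names and toy data are hypothetical, and the sketch reuses one fixed set of predictions instead of retraining a classifier for each candidate combination.

```python
import math
from collections import Counter
from itertools import combinations


def entropy_weighted_accuracy(y_true, y_pred, mapping):
    """Assumed criterion: sum over combined classes k of
    -p_k * log(p_k) * (accuracy within class k)."""
    true_c = [mapping[y] for y in y_true]
    pred_c = [mapping[y] for y in y_pred]
    n = len(true_c)
    counts = Counter(true_c)
    hits = Counter(t for t, p in zip(true_c, pred_c) if t == p)
    score = 0.0
    for k, n_k in counts.items():
        p_k = n_k / n
        score += -p_k * math.log(p_k) * (hits[k] / n_k)
    return score


def greedy_combine(y_true, y_pred):
    """Greedily merge the pair of label groups that most improves the
    score; stop when no merge gives a strict improvement."""
    labels = sorted(set(y_true) | set(y_pred))
    mapping = {label: label for label in labels}  # start with no merging
    best = entropy_weighted_accuracy(y_true, y_pred, mapping)
    while True:
        groups = sorted(set(mapping.values()))
        candidates = []
        for a, b in combinations(groups, 2):
            # Merge group b into group a.
            trial = {l: (a if g == b else g) for l, g in mapping.items()}
            score = entropy_weighted_accuracy(y_true, y_pred, trial)
            candidates.append((score, trial))
        if not candidates:
            return mapping, best
        top_score, top_map = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            return mapping, best
        mapping, best = top_map, top_score


# Toy data: label 0 is predicted perfectly; labels 1 and 2 are confused.
y_true = [0] * 10 + [1] * 5 + [2] * 5
y_pred = [0] * 10 + [1, 2, 2, 1, 2] + [2, 1, 1, 2, 1]

mapping, score = greedy_combine(y_true, y_pred)
print(mapping, score)
```

On this toy data the search merges labels 1 and 2 and keeps label 0 separate. Collapsing every label into one class would give perfect accuracy but a score of zero under this assumed criterion, which is how the entropy weight penalizes lost resolution.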
Related papers
- A Debiased Nearest Neighbors Framework for Multi-Label Text Classification [13.30576550077694]
We introduce a DEbiased Nearest Neighbors (DENN) framework for Multi-Label Text Classification (MLTC).
To address embedding alignment bias, we propose a debiased contrastive learning strategy, enhancing neighbor consistency on label co-occurrence.
For confidence estimation bias, we present a debiased confidence estimation strategy, improving the adaptive combination of predictions from $k$NN and inductive binary classifications.
arXiv Detail & Related papers (2024-08-06T14:00:23Z)
- Generating Unbiased Pseudo-labels via a Theoretically Guaranteed
Chebyshev Constraint to Unify Semi-supervised Classification and Regression [57.17120203327993]
The threshold-to-pseudo label process (T2L) in classification uses confidence to determine label quality.
In nature, regression also requires unbiased methods to generate high-quality labels.
We propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality.
arXiv Detail & Related papers (2023-11-03T08:39:35Z)
- Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection [98.66771688028426]
We propose Ambiguity-Resistant Semi-supervised Learning (ARSL) for one-stage detectors.
Joint-Confidence Estimation (JCE) is proposed to quantify the classification and localization quality of pseudo labels.
ARSL effectively mitigates the ambiguities and achieves state-of-the-art SSOD performance on MS COCO and PASCAL VOC.
arXiv Detail & Related papers (2023-03-27T07:46:58Z)
- Enhancing Label Correlation Feedback in Multi-Label Text Classification
via Multi-Task Learning [6.1538971100140145]
We introduce a novel approach with multi-task learning to enhance label correlation feedback.
We propose two auxiliary label co-occurrence prediction tasks to enhance label correlation learning.
arXiv Detail & Related papers (2021-06-06T12:26:14Z)
- Predictive K-means with local models [0.028675177318965035]
Predictive clustering seeks to obtain the best of the two worlds.
We present two new algorithms using this technique and show on a variety of data sets that they are competitive for prediction performance.
arXiv Detail & Related papers (2020-12-16T10:49:36Z)
- Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z)
- SPL-MLL: Selecting Predictable Landmarks for Multi-Label Learning [87.27700889147144]
We propose to select a small subset of labels as landmarks that are easy to predict from the input (predictable) and can recover the other possible labels well (representative).
We employ the Alternating Direction Method (ADM) to solve our problem. Empirical studies on real-world datasets show that our method achieves superior classification performance over other state-of-the-art methods.
arXiv Detail & Related papers (2020-08-16T11:07:44Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive
Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
- Global Multiclass Classification and Dataset Construction via
Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)