Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems
- URL: http://arxiv.org/abs/2209.10058v1
- Date: Wed, 21 Sep 2022 01:06:30 GMT
- Title: Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems
- Authors: Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao
- Abstract summary: We show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input.
- Score: 9.660129425150926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning systems have been reported to achieve state-of-the-art
performances in many applications, and a key ingredient is the existence of
well-trained classifiers on benchmark datasets. As a mainstream loss function,
the cross entropy can easily lead to models that exhibit severe overfitting.
In this paper, we show that the existing cross entropy loss minimization
problem essentially learns the label conditional entropy (CE) of the
underlying data distribution. However, the CE learned in this way does not
well characterize the information shared by the label and the input. We
therefore propose a mutual information learning framework where
we train deep neural network classifiers via learning the mutual information
between the label and the input. Theoretically, we give the population
classification error lower bound in terms of the mutual information. In
addition, we derive the mutual information lower and upper bounds for a
concrete binary classification data model in $\mathbb{R}^n$, and also the error
probability lower bound in this scenario. Empirically, we conduct extensive
experiments on several benchmark datasets to support our theory. The mutual
information learned classifiers (MILCs) achieve far better generalization
performance than the conditional entropy learned classifiers (CELCs), with an
improvement that can exceed 10\% in test accuracy.
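To make the information-theoretic viewpoint above concrete, the following is a minimal worked restatement of the quantities the abstract refers to. It only uses the textbook identity relating the two learning targets, the standard decomposition of the cross-entropy risk, and the classical Fano inequality; the constants and conditions in the paper's own bounds may differ.

```latex
% Mutual information combines the label marginal entropy with the label
% conditional entropy that cross-entropy training targets:
I(X;Y) \;=\; H(Y) - H(Y\mid X)

% The population cross-entropy risk of a model $p_\theta$ upper-bounds $H(Y\mid X)$,
% with equality iff the model matches the true conditional distribution:
\mathbb{E}_{(X,Y)}\!\big[-\log p_\theta(Y\mid X)\big]
  \;=\; H(Y\mid X) \;+\; \mathbb{E}_{X}\!\big[D_{\mathrm{KL}}\big(p(\cdot\mid X)\,\big\|\,p_\theta(\cdot\mid X)\big)\big]
  \;\ge\; H(Y\mid X)

% Classical Fano inequality for any classifier $\hat{Y}=f(X)$ over $K$ classes,
% with $P_e = \Pr[\hat{Y}\neq Y]$, $H_b$ the binary entropy, and entropies in bits
% (the paper derives its own error lower bound in terms of $I(X;Y)$):
H_b(P_e) + P_e \log_2(K-1) \;\ge\; H(Y\mid X) \;=\; H(Y) - I(X;Y)
```

The following is likewise a minimal, hedged sketch of how a mutual-information-style objective could be trained on mini-batches. It is an illustration under simple plug-in assumptions, not the paper's exact MILC training procedure: the H(Y) term is estimated from the batch-averaged softmax output so that both terms depend on the network parameters.

```python
# Hedged sketch of a mutual-information-style training loss (illustrative only;
# the paper's MILC objective may be constructed differently).
import torch
import torch.nn.functional as F

def mi_style_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Plug-in estimate of H(Y|X): average negative log-likelihood of the true labels (nats).
    cond_entropy = F.cross_entropy(logits, labels)

    # Differentiable plug-in estimate of H(Y): entropy of the batch-averaged softmax output.
    mean_pred = F.softmax(logits, dim=1).mean(dim=0)
    marginal_entropy = -(mean_pred * torch.log(mean_pred + 1e-12)).sum()

    # Minimizing H(Y|X) - H(Y) maximizes the plug-in estimate of I(X;Y).
    return cond_entropy - marginal_entropy
```

Minimizing this loss pushes the per-example predictions toward the labels (small estimated conditional entropy) while discouraging a collapsed predicted label marginal (large estimated marginal entropy).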
Related papers
- Learning A Disentangling Representation For PU Learning [18.94726971543125]
We propose to learn a neural network-based data representation using a loss function that can be used to project the unlabeled data into two clusters.
We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to the current state-of-the-art approaches.
arXiv Detail & Related papers (2023-10-05T18:33:32Z) - Probabilistic Safety Regions Via Finite Families of Scalable Classifiers [2.431537995108158]
Supervised classification recognizes patterns in the data to separate classes of behaviours.
Canonical solutions contain misclassification errors that are intrinsic to the approximate, numerical nature of machine learning.
We introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled.
arXiv Detail & Related papers (2023-09-08T22:40:19Z) - Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
Cross entropy loss can easily lead to models that exhibit severe overfitting.
In this paper, we prove that the existing cross entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train DNN classifiers via learning the mutual information between the label and input.
arXiv Detail & Related papers (2022-10-03T15:09:19Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate goals of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in a real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model (a hedged sketch of this calibration step appears after this list).
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Few-Shot Incremental Learning with Continually Evolved Classifiers [46.278573301326276]
Few-shot class-incremental learning (FSCIL) aims to design machine learning algorithms that can continually learn new concepts from a few data points.
The difficulty lies in that limited data from new classes not only lead to significant overfitting issues but also exacerbate the notorious catastrophic forgetting problems.
We propose a Continually Evolved Classifier (CEC) that employs a graph model to propagate context information between classifiers for adaptation.
arXiv Detail & Related papers (2021-04-07T10:54:51Z) - ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for
Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z) - Identifying and Compensating for Feature Deviation in Imbalanced Deep
Learning [59.65752299209042]
We investigate learning a ConvNet under such a scenario.
We find that the ConvNet significantly overfits the minor classes.
We propose to incorporate class-dependent temperatures (CDT) when training the ConvNet.
arXiv Detail & Related papers (2020-01-06T03:52:11Z)
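As referenced in the classifier-calibration entry above, the following is a minimal, hedged sketch of the calibration idea that summary describes: fit one Gaussian per class to penultimate-layer features, sample virtual representations from the resulting mixture, and fine-tune only the classifier head on them. The function and parameter names are illustrative, and the details (per-client statistics aggregation, sampling ratios, optimizer settings) are simplified relative to the actual CCVR procedure.

```python
# Hedged sketch: classifier calibration with virtual representations sampled from
# per-class Gaussians fitted to real features (illustrative, simplified).
import torch
import torch.nn.functional as F

def calibrate_classifier(features, labels, classifier, num_classes,
                         n_virtual=1000, steps=100, lr=0.01):
    """Fine-tune a linear classifier head on virtual features; assumes every class
    appears in `features` with at least a handful of samples."""
    dists = []
    for c in range(num_classes):
        fc = features[labels == c]                              # features of class c, shape (n_c, d)
        mean = fc.mean(dim=0)
        cov = torch.cov(fc.T) + 1e-4 * torch.eye(fc.shape[1])   # regularized covariance
        dists.append(torch.distributions.MultivariateNormal(mean, covariance_matrix=cov))

    # Sample virtual representations from the approximated Gaussian mixture.
    virtual_x = torch.cat([d.sample((n_virtual,)) for d in dists])
    virtual_y = torch.arange(num_classes).repeat_interleave(n_virtual)

    # Calibrate: retrain only the classifier head on the virtual feature/label pairs.
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(classifier(virtual_x), virtual_y)
        loss.backward()
        optimizer.step()
    return classifier
```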