Conditional Mutual Information Constrained Deep Learning for
Classification
- URL: http://arxiv.org/abs/2309.09123v1
- Date: Sun, 17 Sep 2023 01:16:45 GMT
- Title: Conditional Mutual Information Constrained Deep Learning for
Classification
- Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan and Beverly
Yang
- Abstract summary: Conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN).
By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation data set are roughly inversely proportional to their NCMI values.
A novel alternating learning algorithm is proposed to solve the resulting constrained optimization problem.
- Score: 3.5237980787861964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concepts of conditional mutual information (CMI) and normalized
conditional mutual information (NCMI) are introduced to measure the
concentration and separation performance of a classification deep neural
network (DNN) in the output probability distribution space of the DNN, where
CMI and the ratio between CMI and NCMI represent the intra-class concentration
and inter-class separation of the DNN, respectively. By using NCMI to evaluate
popular DNNs pretrained over ImageNet in the literature, it is shown that their
validation accuracies over the ImageNet validation data set are more or less
inversely proportional to their NCMI values. Based on this observation, the
standard deep learning (DL) framework is further modified to minimize the
standard cross entropy function subject to an NCMI constraint, yielding CMI
constrained deep learning (CMIC-DL). A novel alternating learning algorithm is
proposed to solve such a constrained optimization problem. Extensive experiment
results show that DNNs trained within CMIC-DL outperform the state-of-the-art
models trained within the standard DL and other loss functions in the
literature in terms of both accuracy and robustness against adversarial
attacks. In addition, visualizing the evolution of learning process through the
lens of CMI and NCMI is also advocated.
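The abstract's two quantities can be illustrated with a small proxy computation over softmax outputs. The following is a hypothetical sketch, not the paper's exact estimator: it assumes intra-class concentration (CMI) is approximated by the average KL divergence between each sample's output distribution and its class-conditional mean output distribution, inter-class separation by the average cross-entropy against the mean distributions of the other classes, and the NCMI proxy by their ratio, so that lower values indicate tighter, better-separated clusters in the output probability space.

```python
import math
from collections import defaultdict

def cmi_ncmi_proxy(probs, labels, eps=1e-12):
    """Proxy CMI and NCMI estimates from per-sample softmax outputs.

    probs:  list of probability vectors (one per sample)
    labels: list of class labels, aligned with probs
    """
    # Class-conditional mean output distributions.
    by_class = defaultdict(list)
    for p, y in zip(probs, labels):
        by_class[y].append(p)
    means = {
        c: [sum(col) / len(ps) for col in zip(*ps)]
        for c, ps in by_class.items()
    }

    # Intra-class concentration: average KL(P(.|x) || mean of own class).
    cmi = sum(
        sum(pi * (math.log(pi + eps) - math.log(qi + eps))
            for pi, qi in zip(p, means[y]))
        for p, y in zip(probs, labels)
    ) / len(probs)

    # Inter-class separation: average cross-entropy of each sample's
    # output distribution against the mean distributions of other classes.
    sep, n = 0.0, 0
    for p, y in zip(probs, labels):
        for c, q in means.items():
            if c != y:
                sep += -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))
                n += 1

    ncmi = cmi / (sep / n) if n else float("nan")
    return cmi, ncmi
```

Perfectly concentrated outputs (every sample matching its class mean) give a CMI of zero, and spreading samples away from their class means increases both the CMI and the NCMI proxy, mirroring the inverse relationship with accuracy described in the abstract.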
Related papers
- The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others [1.654278807602897]
This study introduces Misclassification Likelihood Matrix (MLM) as a novel tool for quantifying the reliability of neural network predictions under distribution shifts.
The implications of this work extend beyond image classification, with ongoing applications in autonomous systems, such as self-driving cars.
arXiv Detail & Related papers (2024-07-10T16:43:14Z) - Marginal Debiased Network for Fair Visual Recognition [59.05212866862219]
We propose a novel marginal debiased network (MDN) to learn debiased representations.
Our MDN can achieve a remarkable performance on under-represented samples.
arXiv Detail & Related papers (2024-01-04T08:57:09Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Neural Network with Local Converging Input (NNLCI) for Supersonic Flow
Problems with Unstructured Grids [0.9152133607343995]
We develop a neural network with local converging input (NNLCI) for high-fidelity prediction using unstructured data.
As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps.
arXiv Detail & Related papers (2023-10-23T19:03:37Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - SEMI-CenterNet: A Machine Learning Facilitated Approach for
Semiconductor Defect Inspection [0.10555513406636088]
We have proposed SEMI-CenterNet (SEMI-CN), a customized CN architecture trained on SEM images of semiconductor wafer defects.
SEMI-CN gets trained to output the center, class, size, and offset of a defect instance.
We train SEMI-CN on two datasets and benchmark two ResNet backbones for the framework.
arXiv Detail & Related papers (2023-08-14T14:39:06Z) - Unmatched uncertainty mitigation through neural network supported model
predictive control [7.036452261968766]
We utilize a deep neural network (DNN) as an oracle in the underlying optimization problem of learning-based MPC (LBMPC).
We employ a dual-timescale adaptation mechanism, where the weights of the last layer of the neural network are updated in real time.
Results indicate that the proposed approach is implementable in real time and carries the theoretical guarantees of LBMPC.
arXiv Detail & Related papers (2023-04-22T04:49:48Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - State and Topology Estimation for Unobservable Distribution Systems
using Deep Neural Networks [8.673621107750652]
Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability.
This paper formulates a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE).
Two deep neural networks (DNNs) are trained to operate in a sequential manner for implementing TI and DSSE for systems that are incompletely observed by synchrophasor measurement devices (SMDs).
arXiv Detail & Related papers (2021-04-15T02:46:50Z) - DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for
Uncertainty Inference [52.899219617256655]
We propose a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition.
In the DS-UI, we combine the last fully-connected (FC) layer with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer.
Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection.
arXiv Detail & Related papers (2020-11-17T12:35:02Z) - Collaborative Boundary-aware Context Encoding Networks for Error Map
Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.