Conditional Mutual Information Constrained Deep Learning for
Classification
- URL: http://arxiv.org/abs/2309.09123v1
- Date: Sun, 17 Sep 2023 01:16:45 GMT
- Title: Conditional Mutual Information Constrained Deep Learning for
Classification
- Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan and Beverly
Yang
- Abstract summary: Conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN).
By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation data set are roughly inversely proportional to their NCMI values.
A novel alternating learning algorithm is proposed to solve the resulting constrained optimization problem.
- Score: 3.5237980787861964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The concepts of conditional mutual information (CMI) and normalized
conditional mutual information (NCMI) are introduced to measure the
concentration and separation performance of a classification deep neural
network (DNN) in the output probability distribution space of the DNN, where
CMI and the ratio between CMI and NCMI represent the intra-class concentration
and inter-class separation of the DNN, respectively. By using NCMI to evaluate
popular DNNs pretrained over ImageNet in the literature, it is shown that their
validation accuracies over the ImageNet validation data set are more or less
inversely proportional to their NCMI values. Based on this observation, the
standard deep learning (DL) framework is further modified to minimize the
standard cross entropy function subject to an NCMI constraint, yielding CMI
constrained deep learning (CMIC-DL). A novel alternating learning algorithm is
proposed to solve such a constrained optimization problem. Extensive experiment
results show that DNNs trained within CMIC-DL outperform the state-of-the-art
models trained within the standard DL and other loss functions in the
literature in terms of both accuracy and robustness against adversarial
attacks. In addition, visualizing the evolution of learning process through the
lens of CMI and NCMI is also advocated.
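The abstract's two quantities can be illustrated with a small proxy computation over softmax outputs. The following is a hypothetical sketch, not the paper's exact estimator: it assumes intra-class concentration (CMI) is approximated by the average KL divergence between each sample's output distribution and its class-conditional mean output distribution, inter-class separation by the average cross-entropy against the mean distributions of the other classes, and the NCMI proxy by their ratio, so that lower values indicate tighter, better-separated clusters in the output probability space.

```python
import math
from collections import defaultdict

def cmi_ncmi_proxy(probs, labels, eps=1e-12):
    """Proxy CMI and NCMI estimates from per-sample softmax outputs.

    probs:  list of probability vectors (one per sample)
    labels: list of class labels, aligned with probs
    """
    # Class-conditional mean output distributions.
    by_class = defaultdict(list)
    for p, y in zip(probs, labels):
        by_class[y].append(p)
    means = {
        c: [sum(col) / len(ps) for col in zip(*ps)]
        for c, ps in by_class.items()
    }

    # Intra-class concentration: average KL(P(.|x) || mean of own class).
    cmi = sum(
        sum(pi * (math.log(pi + eps) - math.log(qi + eps))
            for pi, qi in zip(p, means[y]))
        for p, y in zip(probs, labels)
    ) / len(probs)

    # Inter-class separation: average cross-entropy of each sample's
    # output distribution against the mean distributions of other classes.
    sep, n = 0.0, 0
    for p, y in zip(probs, labels):
        for c, q in means.items():
            if c != y:
                sep += -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))
                n += 1

    ncmi = cmi / (sep / n) if n else float("nan")
    return cmi, ncmi
```

Perfectly concentrated outputs (every sample matching its class mean) give a CMI of zero, and spreading samples away from their class means increases both the CMI and the NCMI proxy, mirroring the inverse relationship with accuracy described in the abstract.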
Related papers
- The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others [1.654278807602897]
This study introduces Misclassification Likelihood Matrix (MLM) as a novel tool for quantifying the reliability of neural network predictions under distribution shifts.
The implications of this work extend beyond image classification, with ongoing applications in autonomous systems, such as self-driving cars.
arXiv Detail & Related papers (2024-07-10T16:43:14Z) - Marginal Debiased Network for Fair Visual Recognition [59.05212866862219]
We propose a novel marginal debiased network (MDN) to learn debiased representations.
Our MDN can achieve a remarkable performance on under-represented samples.
arXiv Detail & Related papers (2024-01-04T08:57:09Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Neural Network with Local Converging Input (NNLCI) for Supersonic Flow
Problems with Unstructured Grids [0.9152133607343995]
We develop a neural network with local converging input (NNLCI) for high-fidelity prediction using unstructured data.
As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps.
arXiv Detail & Related papers (2023-10-23T19:03:37Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - SEMI-CenterNet: A Machine Learning Facilitated Approach for
Semiconductor Defect Inspection [0.10555513406636088]
We have proposed SEMI-CenterNet (SEMI-CN), a customized CN architecture trained on SEM images of semiconductor wafer defects.
SEMI-CN gets trained to output the center, class, size, and offset of a defect instance.
We train SEMI-CN on two datasets and benchmark two ResNet backbones for the framework.
arXiv Detail & Related papers (2023-08-14T14:39:06Z) - Unmatched uncertainty mitigation through neural network supported model
predictive control [7.036452261968766]
We utilize a deep neural network (DNN) as an oracle in the underlying optimization problem of learning-based MPC (LBMPC).
We employ a dual-timescale adaptation mechanism, where the weights of the last layer of the neural network are updated in real time.
Results indicate that the proposed approach is implementable in real time and carries the theoretical guarantees of LBMPC.
arXiv Detail & Related papers (2023-04-22T04:49:48Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - State and Topology Estimation for Unobservable Distribution Systems
using Deep Neural Networks [8.673621107750652]
Time-synchronized state estimation for reconfigurable distribution networks is challenging because of limited real-time observability.
This paper formulates a deep learning (DL)-based approach for topology identification (TI) and unbalanced three-phase distribution system state estimation (DSSE).
Two deep neural networks (DNNs) are trained to operate in a sequential manner for implementing TI and DSSE for systems that are incompletely observed by synchrophasor measurement devices (SMDs).
arXiv Detail & Related papers (2021-04-15T02:46:50Z) - DS-UI: Dual-Supervised Mixture of Gaussian Mixture Models for
Uncertainty Inference [52.899219617256655]
We propose a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian estimation-based uncertainty inference (UI) in deep neural network (DNN)-based image recognition.
In the DS-UI, we combine the last fully-connected (FC) layer with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer.
Experimental results show the DS-UI outperforms the state-of-the-art UI methods in misclassification detection.
arXiv Detail & Related papers (2020-11-17T12:35:02Z) - Collaborative Boundary-aware Context Encoding Networks for Error Map
Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.