Related papers: Normalized Conditional Mutual Information Surrogate Loss for Deep Neural Classifiers

URL: http://arxiv.org/abs/2601.02543v2
Date: Thu, 08 Jan 2026 19:36:47 GMT
Title: Normalized Conditional Mutual Information Surrogate Loss for Deep Neural Classifiers
Authors: Linfeng Ye, Zhixiang Chi, Konstantinos N. Plataniotis, En-hui Yang
Abstract summary: We propose a novel information-theoretic surrogate loss, normalized conditional mutual information (NCMI), as a drop-in alternative to the de facto cross-entropy (CE). Across image recognition and whole-slide imaging (WSI) subtyping benchmarks, NCMI-trained models surpass state-of-the-art losses by substantial margins at a computational cost comparable to that of CE.
Score: 38.969159614398045
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a novel information-theoretic surrogate loss, normalized conditional mutual information (NCMI), as a drop-in alternative to the de facto cross-entropy (CE) for training deep neural network (DNN) based classifiers. We first observe that the model's NCMI is inversely proportional to its accuracy. Building on this insight, we introduce an alternating algorithm to efficiently minimize the NCMI. Across image recognition and whole-slide imaging (WSI) subtyping benchmarks, NCMI-trained models surpass state-of-the-art losses by substantial margins at a computational cost comparable to that of CE. Notably, on ImageNet, NCMI yields a 2.77% top-1 accuracy improvement with ResNet-50 compared with CE; on CAMELYON-17, replacing CE with NCMI improves the macro-F1 by 8.6% over the strongest baseline. Gains are consistent across various architectures and batch sizes, suggesting that NCMI is a practical and competitive alternative to CE.

Related papers

TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification [0.47498241053872914]
Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models. We introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques.
arXiv Detail & Related papers (2025-07-03T10:17:39Z)
CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation [60.08541107831459]
This paper proposes a CNN-Transformer rectified collaborative learning framework to learn stronger CNN-based and Transformer-based models for medical image segmentation. Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels. We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space.
arXiv Detail & Related papers (2024-08-25T01:27:35Z)
Conditional Mutual Information Constrained Deep Learning for Classification [3.5237980787861964]
Conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and performance of a classification deep neural network (DNN). By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over the ImageNet validation data set are more or less inversely proportional to their NCMI values. A novel alternating learning algorithm is proposed to solve such a constrained optimization problem.
arXiv Detail & Related papers (2023-09-17T01:16:45Z)
SEMI-CenterNet: A Machine Learning Facilitated Approach for Semiconductor Defect Inspection [0.10555513406636088]
We have proposed SEMI-CenterNet (SEMI-CN), a customized CN architecture trained on SEM images of semiconductor wafer defects. SEMI-CN gets trained to output the center, class, size, and offset of a defect instance. We train SEMI-CN on two datasets and benchmark two ResNet backbones for the framework.
arXiv Detail & Related papers (2023-08-14T14:39:06Z)
Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining. A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery. Our proposed method (dubbed as ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
Priming Cross-Session Motor Imagery Classification with A Universal Deep Domain Adaptation Framework [3.6824205556465834]
Motor imagery (MI) is a common brain-computer interface (BCI) paradigm. We propose a Siamese deep domain adaptation (SDDA) framework for cross-session MI classification based on mathematical models in domain adaptation theory. The proposed framework can be easily applied to most existing artificial neural networks without altering the network structure.
arXiv Detail & Related papers (2022-02-19T09:30:08Z)
Greedy Network Enlarging [53.319011626986004]
We propose a greedy network enlarging method based on the reallocation of computations. By modifying the computations at different stages step by step, the enlarged network is equipped with optimal allocation and utilization of MACs. Applying our method to GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies.
arXiv Detail & Related papers (2021-07-31T08:36:30Z)
EEG-Inception: An Accurate and Robust End-to-End Neural Network for EEG-based Motor Imagery Classification [123.93460670568554]
This paper proposes a novel convolutional neural network (CNN) architecture for accurate and robust EEG-based motor imagery (MI) classification. The proposed CNN model, namely EEG-Inception, is built on the backbone of the Inception-Time network. The proposed network is an end-to-end classifier, as it takes the raw EEG signals as input and does not require complex EEG signal preprocessing.
arXiv Detail & Related papers (2021-01-24T19:03:10Z)
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures. Our best end-to-end model, based on RNN-Transducer together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
An Accurate EEGNet-based Motor-Imagery Brain-Computer Interface for Low-Power Edge Computing [13.266626571886354]
This paper presents an accurate and robust embedded motor-imagery brain-computer interface (MI-BCI). The proposed novel model, based on EEGNet, matches the requirements of memory footprint and computational resources of low-power microcontroller units (MCUs). The scaled models are deployed on a commercial Cortex-M4F MCU, taking 101 ms and consuming 4.28 mJ per inference for the smallest model, and on a Cortex-M7, taking 44 ms and consuming 18.1 mJ per inference for the medium-sized model.
arXiv Detail & Related papers (2020-03-31T19:52:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.