Confidence-gated training for efficient early-exit neural networks
- URL: http://arxiv.org/abs/2509.17885v1
- Date: Mon, 22 Sep 2025 15:18:21 GMT
- Title: Confidence-gated training for efficient early-exit neural networks
- Authors: Saad Mokssit, Ouassim Karrakchou, Alejandro Mousist, Mounir Ghogho
- Abstract summary: Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail.
- Score: 49.78598138251519
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. However, joint training often leads to gradient interference, with deeper classifiers dominating optimization. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail. This encourages shallow classifiers to act as primary decision points while reserving deeper layers for harder inputs. By aligning training with the inference-time policy, CGT mitigates overthinking, improves early-exit accuracy, and preserves efficiency. Experiments on the Indian Pines and Fashion-MNIST benchmarks show that CGT lowers average inference cost while improving overall accuracy, offering a practical solution for deploying deep models in resource-constrained environments.
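The gating rule described in the abstract can be sketched as a training objective. The PyTorch snippet below is an illustrative reconstruction, not the authors' released code: `exit_logits` is assumed to be a list of per-exit logits from a multi-exit network, `threshold` is a hypothetical confidence cut-off, and a deeper exit only receives gradients for samples the preceding exit fails on (low confidence or wrong prediction).

```python
import torch
import torch.nn.functional as F

def confidence_gated_loss(exit_logits, targets, threshold=0.8):
    """Illustrative sketch of a confidence-gated objective (assumed form, not the paper's code).

    exit_logits: list of [batch, num_classes] tensors, ordered shallow -> deep.
    A deeper exit only receives gradients for samples that the preceding
    exit failed to handle (low confidence or wrong prediction).
    """
    total = exit_logits[0].new_zeros(())
    # Every sample contributes to the first (shallowest) exit.
    active = torch.ones(targets.shape[0], dtype=torch.bool, device=targets.device)
    for logits in exit_logits:
        if active.any():
            total = total + F.cross_entropy(logits[active], targets[active])
        probs = F.softmax(logits.detach(), dim=-1)
        conf, pred = probs.max(dim=-1)
        handled = (conf >= threshold) & (pred == targets)
        # Samples not yet handled keep propagating gradients to deeper exits.
        active = active & ~handled
    return total
```

In this reading the gate is computed from detached probabilities, so the decision of which samples reach deeper exits does not itself backpropagate; the paper's exact failure criterion may differ.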
Related papers
- SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks [8.530214413698966]
We introduce SQUAD, the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning. We also introduce QUEST, a Neural Architecture Search method to select early-exit learners with optimized hierarchical diversity. This consensus-driven approach yields statistically robust early exits, improving test accuracy by up to 5.95% compared to state-of-the-art dynamic solutions.
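How a quorum over ensemble members might drive the exit decision can be sketched as below; the data layout, quorum size, and plain majority vote are illustrative assumptions, not SQUAD's actual consensus rule.

```python
import torch

def quorum_early_exit(member_exit_logits, quorum=3):
    """Toy consensus rule over an ensemble of early-exit networks (illustrative only).

    member_exit_logits[m][d]: logits of ensemble member m at exit depth d,
    shape [num_classes]. Returns (prediction, depth) for a single input:
    stop at the shallowest depth where at least `quorum` members agree.
    """
    num_depths = len(member_exit_logits[0])
    for depth in range(num_depths):
        votes = [logits[depth].argmax().item() for logits in member_exit_logits]
        best = max(set(votes), key=votes.count)
        if votes.count(best) >= quorum:
            return best, depth
    return best, num_depths - 1  # fall back to the deepest exit
```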
arXiv Detail & Related papers (2026-01-30T08:32:33Z)
- Boosted Training of Lightweight Early Exits for Optimizing CNN Image Classification Inference [47.027290803102666]
We introduce a sequential training approach that aligns branch training with inference-time data distributions. Experiments on the CINIC-10 dataset with a ResNet18 backbone demonstrate that BTS-EE consistently outperforms non-boosted training. These results offer practical efficiency gains for applications such as industrial inspection, embedded vision, and UAV-based monitoring.
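One plausible reading of "aligning branch training with inference-time data distributions" is to train branches in sequence, each on the samples earlier branches would not exit on. The sketch below follows that assumption and is not the paper's BTS-EE implementation; `loader_fn`, the Adam optimizer, and the confidence threshold are placeholders.

```python
import torch
import torch.nn.functional as F

def sequential_branch_training(branches, loader_fn, threshold=0.9, epochs=1):
    """Sketch of sequential early-exit branch training (assumed form, not BTS-EE's code).

    branches: list of nn.Module classifiers, shallow -> deep; each maps an input
    batch to logits. loader_fn() yields (x, y) batches. Branch k is trained only
    on samples that branches 0..k-1 would not exit on, so each branch's training
    distribution matches its inference-time inputs.
    """
    trained = []
    for branch in branches:
        opt = torch.optim.Adam(branch.parameters(), lr=1e-3)
        for _ in range(epochs):
            for x, y in loader_fn():
                with torch.no_grad():
                    # Mask out samples an earlier branch already handles confidently.
                    keep = torch.ones(x.shape[0], dtype=torch.bool, device=x.device)
                    for prev in trained:
                        conf = F.softmax(prev(x), dim=-1).max(dim=-1).values
                        keep &= conf < threshold
                if not keep.any():
                    continue
                loss = F.cross_entropy(branch(x[keep]), y[keep])
                opt.zero_grad(); loss.backward(); opt.step()
        trained.append(branch)
    return branches
```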
arXiv Detail & Related papers (2025-09-10T06:47:49Z)
- A Survey of Early Exit Deep Neural Networks in NLP [5.402030962296633]
Deep Neural Networks (DNNs) have grown increasingly large in size to achieve state-of-the-art performance across a wide range of tasks. High computational requirements make them less suitable for resource-constrained applications. Early exit strategies offer a promising solution by enabling adaptive inference.
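For context, the adaptive-inference pattern these works share can be summarized in a few lines; the module structure and threshold below are generic placeholders rather than any specific paper's design.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def adaptive_inference(blocks, exit_heads, x, threshold=0.9):
    """Generic confidence-thresholded early-exit inference (illustrative pattern).

    blocks: list of backbone stages; exit_heads[i] maps the output of blocks[i]
    to class logits. The input leaves at the first exit whose softmax confidence
    clears `threshold`; otherwise the deepest exit decides.
    """
    h = x
    for block, head in zip(blocks, exit_heads):
        h = block(h)
        probs = F.softmax(head(h), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:   # single-sample batch assumed for clarity
            return pred.item()
    return pred.item()                 # deepest exit as fallback
```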
arXiv Detail & Related papers (2025-01-13T20:08:52Z)
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base parameter selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
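The GSNR of a parameter is commonly defined as the squared mean of its gradient across samples divided by the gradient's variance. A rough per-parameter estimate is sketched below; batching, normalization, and how the scores drive selection are simplifications relative to the paper.

```python
import torch

def gradient_snr(model, loss_fn, batches, eps=1e-12):
    """Per-parameter GSNR = mean(g)^2 / var(g), estimated across mini-batches
    (illustrative; details differ from the paper)."""
    sums, sq_sums, n = {}, {}, 0
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            g = p.grad.detach()
            sums[name] = sums.get(name, 0) + g
            sq_sums[name] = sq_sums.get(name, 0) + g * g
        n += 1
    gsnr = {}
    for name in sums:
        mean = sums[name] / n
        var = sq_sums[name] / n - mean * mean
        gsnr[name] = (mean * mean) / (var + eps)
    return gsnr
```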
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
- Pre-Pruning and Gradient-Dropping Improve Differentially Private Image Classification [9.120531252536617]
We introduce a new training paradigm that uses pre-pruning and gradient-dropping to reduce the parameter space and improve scalability.
Our training paradigm introduces a tension between the rates of pre-pruning and gradient-dropping, privacy loss, and classification accuracy.
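As an illustration of the gradient-dropping half of this idea, the sketch below zeroes all but the largest-magnitude fraction of each gradient; the magnitude criterion, the keep ratio, and the omission of DP clipping and noise are all assumptions, not the paper's procedure.

```python
import torch

def drop_gradients(model, keep_ratio=0.1):
    """Zero all but the largest-magnitude fraction of each parameter's gradient
    (illustrative gradient-dropping step; DP clipping/noise omitted)."""
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.detach()
        k = max(1, int(keep_ratio * g.numel()))
        # Threshold at the k-th largest absolute gradient value.
        thresh = g.abs().flatten().kthvalue(g.numel() - k + 1).values
        p.grad.mul_((g.abs() >= thresh).to(g.dtype))
```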
arXiv Detail & Related papers (2023-06-19T14:35:28Z)
- Back to Basics: Efficient Network Compression via IMP [22.586474627159287]
Iterative Magnitude Pruning (IMP) is one of the most established approaches for network pruning.
It is often argued that IMP reaches suboptimal states because it does not incorporate sparsification into the training phase.
We find that IMP with SLR for retraining can outperform state-of-the-art pruning-during-training approaches.
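A bare-bones IMP loop looks roughly as follows; the pruning fraction, the per-parameter masking, and the abstraction of the SLR retraining schedule into a `train_fn` callback are simplifications for illustration.

```python
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, prune_frac=0.2):
    """Minimal IMP sketch: each round (re)trains, then permanently zeroes the
    smallest-magnitude fraction of the still-unpruned weights via masks.
    A real implementation would also re-apply the masks during retraining."""
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model)  # train / retrain the surviving weights (e.g. with SLR)
        for name, p in model.named_parameters():
            alive = masks[name].bool()
            k = int(prune_frac * alive.sum().item())
            if k == 0:
                continue
            # Prune the k smallest-magnitude surviving weights.
            thresh = p.detach().abs()[alive].kthvalue(k).values
            masks[name][alive & (p.detach().abs() <= thresh)] = 0.0
            with torch.no_grad():
                p.mul_(masks[name])
    return masks
```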
arXiv Detail & Related papers (2021-11-01T11:23:44Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
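One plausible shape for a "slow start, fast decay" schedule is a short linear warm-up followed by a steep polynomial decay, as sketched below; the exact functional form and hyperparameters used in the paper are not reproduced here.

```python
def slow_start_fast_decay_lr(step, warmup_steps, total_steps, peak_lr, floor_lr=1e-6):
    """One reading of a 'slow start, fast decay' schedule (illustrative shape only):
    linear warm-up to peak_lr, then a steep polynomial decay toward floor_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return floor_lr + (peak_lr - floor_lr) * (1.0 - progress) ** 3
```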
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Two-phase Pseudo Label Densification for Self-training based Domain Adaptation [93.03265290594278]
We propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD.
In the first phase, we use sliding window voting to propagate the confident predictions, utilizing intrinsic spatial-correlations in the images.
In the second phase, we perform a confidence-based easy-hard classification.
To ease the training process and avoid noisy predictions, we introduce the bootstrapping mechanism to the original self-training loss.
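A toy version of the first-phase voting step is sketched below: unlabeled pixels adopt the majority confident label in a local window. The window size, ignore index, and unweighted vote are illustrative assumptions; the paper's voting additionally uses prediction confidences.

```python
import numpy as np

def sliding_window_voting(pseudo_labels, window=5, ignore=-1):
    """Toy pseudo-label densification by local majority vote (illustrative only).

    pseudo_labels: 2-D int array where `ignore` marks pixels with no confident
    label. Each unlabeled pixel takes the majority confident label in its window.
    """
    h, w = pseudo_labels.shape
    out = pseudo_labels.copy()
    r = window // 2
    for i in range(h):
        for j in range(w):
            if out[i, j] != ignore:
                continue
            patch = pseudo_labels[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            votes = patch[patch != ignore]
            if votes.size:
                out[i, j] = np.bincount(votes).argmax()
    return out
```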
arXiv Detail & Related papers (2020-12-09T02:35:25Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
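The patience rule itself is simple to sketch: stop as soon as a fixed number of consecutive internal classifiers agree on the label. The snippet below illustrates that rule for a single input; the per-layer logits and patience value are placeholders.

```python
import torch

@torch.no_grad()
def patience_based_exit(layer_logits, patience=3):
    """Patience-style early exit (sketch): return once `patience` consecutive
    internal classifiers predict the same label.

    layer_logits: iterable of per-layer logits for one input, shallow -> deep.
    Returns (prediction, layers_used).
    """
    last_pred, streak = None, 0
    for depth, logits in enumerate(layer_logits, start=1):
        pred = int(logits.argmax(dim=-1))
        streak = streak + 1 if pred == last_pred else 1
        last_pred = pred
        if streak >= patience:
            return pred, depth
    return last_pred, depth  # no early agreement: use the deepest prediction
```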
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear-norm regularization optimized by sub-gradient descent is used to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
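Two building blocks consistent with this description are sketched below: an SVD truncation of a weight matrix to a target rank, and the standard sub-gradient U Vᵀ of the nuclear norm used as a low-rank regularizer. Rank selection and how these steps are interleaved with training are simplified away here.

```python
import torch

def truncate_rank(weight, rank):
    """Low-rank projection step of the kind TRP alternates with training
    (illustrative): SVD-truncate a 2-D weight matrix to the given rank."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

def nuclear_subgradient(weight):
    """Sub-gradient of the nuclear norm ||W||_* (sum of singular values),
    used as a regularizer that pushes weights toward low rank (sketch)."""
    U, _, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U @ Vh
```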
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.