Know What You Don't Know: Selective Prediction for Early Exit DNNs
- URL: http://arxiv.org/abs/2509.11520v1
- Date: Mon, 15 Sep 2025 02:19:09 GMT
- Title: Know What You Don't Know: Selective Prediction for Early Exit DNNs
- Authors: Divya Jyoti Bajpai, Manjesh Kumar Hanawal
- Abstract summary: Inference latency and trustworthiness of Deep Neural Networks (DNNs) are the bottlenecks in deploying them in critical applications like sensitive tasks. Early Exit (EE) DNNs overcome the latency issues by allowing samples to exit from intermediary layers if they attain 'high' confidence scores on the predicted class. We use Selective Prediction (SP) to overcome this issue by checking the 'hardness' of untrustworthy samples.
- Score: 14.00844847268286
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inference latency and trustworthiness of Deep Neural Networks (DNNs) are the bottlenecks in deploying them in critical applications like sensitive tasks. Early Exit (EE) DNNs overcome the latency issues by allowing samples to exit from intermediary layers if they attain 'high' confidence scores on the predicted class. However, the DNNs are known to exhibit overconfidence, which can lead to many samples exiting early and render EE strategies untrustworthy. We use Selective Prediction (SP) to overcome this issue by checking the 'hardness' of the samples rather than just relying on the confidence score alone. We propose SPEED, a novel approach that uses Deferral Classifiers (DCs) at each layer to check the hardness of samples before performing EEs. Specifically, the DCs identify if a sample is hard to predict at an intermediary layer, leading to hallucination, and defer it to an expert. Early detection of hard samples for inference prevents the wastage of computational resources and improves trust by deferring the hard samples to the expert. We demonstrate that EE aided with SP improves both accuracy and latency. Our method minimizes the risk of wrong prediction by $50\%$ with a speedup of $2.05\times$ as compared to the final layer. The anonymized source code is available at https://github.com/Div290/SPEED
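The deferral-then-exit control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random linear layers, the sigmoid deferral classifiers, and both thresholds are assumptions made purely for a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical stand-ins for the paper's components: each "layer" maps
# features to class logits, and each Deferral Classifier (DC) scores
# sample hardness. SPEED trains these; here they are random projections.
NUM_LAYERS, DIM, CLASSES = 4, 8, 3
layers = [rng.normal(size=(DIM, CLASSES)) for _ in range(NUM_LAYERS)]
dcs = [rng.normal(size=DIM) for _ in range(NUM_LAYERS)]

def speed_inference(x, conf_threshold=0.8, hardness_threshold=0.9):
    """SPEED-style inference sketch: at each intermediary layer a DC
    first checks hardness; hard samples are deferred to an expert
    immediately (saving the remaining layers' compute), while easy,
    high-confidence samples exit early with a prediction."""
    for i, (w, dc) in enumerate(zip(layers, dcs)):
        hardness = 1 / (1 + np.exp(-x @ dc))      # DC's hardness score
        if hardness > hardness_threshold:
            return "defer-to-expert", i           # avoid wasted compute
        probs = softmax(x @ w)
        if probs.max() >= conf_threshold:
            return int(probs.argmax()), i         # early exit
    return int(probs.argmax()), NUM_LAYERS - 1    # final-layer fallback

decision, exit_layer = speed_inference(rng.normal(size=DIM))
```

The key design point reflected here is ordering: the hardness check runs before the confidence check, so an overconfident exit classifier never gets to act on a sample the DC has flagged as hard.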
Related papers
- Agentic Test-Time Scaling for WebAgents [65.5178428849495]
We present Confidence-Aware Test-Time Scaling (CATTS), which uses vote-derived uncertainty to allocate compute only when decisions are genuinely contentious. CATTS improves performance on WebArena-Lite and GoBrowse by up to 9.1% over ReAct while using up to 2.3x fewer tokens than uniform scaling.
arXiv Detail & Related papers (2026-02-12T18:58:30Z)
- Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs [72.82403830490084]
We argue that the decoding rule should be calibrated by correctness, not confidence alone. We propose simple strategies that achieve this goal: Greedy-Threshold makes sampling greedy at very low confidence steps. Together, our findings challenge prevailing views about decoding under uncertainty and show gains across math and general reasoning benchmarks.
arXiv Detail & Related papers (2025-10-07T14:46:12Z)
- Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability [14.00844847268286]
Early-Exit Deep Neural Networks enable adaptive inference by allowing prediction at intermediary layers. Our framework demonstrates consistent improvements in speedup (1.70-2.10x) with a minimal performance drop (2%) as compared to full model performance.
arXiv Detail & Related papers (2025-09-28T06:05:24Z)
- Cautious Next Token Prediction [62.74127603725369]
We propose a new training-free decoding strategy, dubbed Cautious Next Token Prediction (CNTP). In the decoding process, if the model has comparatively high prediction entropy at a certain step, we sample multiple trials starting from that step independently and stop when encountering any punctuation. We show that our proposed CNTP approach consistently outperforms existing standard decoding strategies by a clear margin.
arXiv Detail & Related papers (2025-07-03T05:49:18Z)
- Improving Prediction Certainty Estimation for Reliable Early Exiting via Null Space Projection [16.838728310658105]
We propose a novel early exiting method based on the Certainty-Aware Probability (CAP) score. We show that our method can achieve an average speed-up ratio of 2.19x across all tasks with negligible performance degradation.
arXiv Detail & Related papers (2025-06-08T05:08:34Z)
- BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts [5.402030962296633]
Early Exit techniques have emerged as a means to reduce inference latency in Deep Neural Networks (DNNs). We propose a new decision criterion, BEEM, in which exit classifiers are treated as experts and their confidence scores are aggregated. We show that our method enhances the performance of state-of-the-art EE methods, achieving improvements in speed-up by a factor of 1.5x to 2.1x.
arXiv Detail & Related papers (2025-02-02T10:35:19Z)
- Favour: FAst Variance Operator for Uncertainty Rating [0.034530027457862]
Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions.
By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference.
Previous work proposed propagating the first and second moments of the posterior directly through the network.
This method is even slower than sampling, so the propagated variance needs to be approximated.
Our contribution is a more principled variance propagation framework.
arXiv Detail & Related papers (2023-11-21T22:53:20Z)
- Knowing When to Stop: Delay-Adaptive Spiking Neural Network Classifiers with Reliability Guarantees [36.14499894307206]
Spiking neural networks (SNNs) process time-series data via internal event-driven neural dynamics.
We introduce a novel delay-adaptive SNN-based inference methodology that provides guaranteed reliability for the decisions produced at input-dependent stopping times.
arXiv Detail & Related papers (2023-05-18T22:11:04Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- ZigZag: Universal Sampling-free Uncertainty Estimation Through Two-Step Inference [54.17205151960878]
We introduce a sampling-free approach that is generic and easy to deploy.
We produce reliable uncertainty estimates on par with state-of-the-art methods at a significantly lower computational cost.
arXiv Detail & Related papers (2022-11-21T13:23:09Z)
- Iterative Pseudo-Labeling with Deep Feature Annotation and Confidence-Based Sampling [127.46527972920383]
Training deep neural networks is challenging when large and annotated datasets are unavailable.
We improve a recent iterative pseudo-labeling technique, Deep Feature Annotation (DeepFA), by selecting the most confident unsupervised samples to iteratively train a deep neural network.
We first ascertain the best configuration for the baseline -- a self-trained deep neural network -- and then evaluate our confidence-based DeepFA for different confidence thresholds.
arXiv Detail & Related papers (2021-09-06T20:02:13Z)
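The confidence-based sampling idea in the last entry above can be sketched as follows. This is a toy illustration, not DeepFA's implementation: the synthetic probabilities and the 0.8 threshold are assumptions; DeepFA would obtain class probabilities from a network trained on deep-feature annotations.

```python
import numpy as np

# Synthetic pseudo-label probabilities for three unlabeled samples
# over three classes; only sufficiently confident rows should feed
# the next self-training round.
pseudo_probs = np.array([
    [0.97, 0.02, 0.01],   # confident -> kept
    [0.40, 0.35, 0.25],   # ambiguous -> discarded
    [0.10, 0.85, 0.05],   # confident -> kept
])

def select_confident(probs, threshold=0.8):
    """Return (indices, pseudo_labels) for the samples whose top-class
    probability clears the confidence threshold."""
    conf = probs.max(axis=1)              # top-class probability per sample
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

idx, labels = select_confident(pseudo_probs)
# idx -> [0, 2]; labels -> [0, 1]
```

Sweeping the threshold, as the abstract describes, trades pseudo-label coverage against pseudo-label accuracy: a higher threshold keeps fewer but cleaner samples for the next training iteration.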
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.