Related papers: Improving Prediction Certainty Estimation for Reliable Early Exiting via Null Space Projection

URL: http://arxiv.org/abs/2506.17249v1
Date: Sun, 08 Jun 2025 05:08:34 GMT
Title: Improving Prediction Certainty Estimation for Reliable Early Exiting via Null Space Projection
Authors: Jianing He, Qi Zhang, Duoqian Miao, Yi Kun, Shufeng Hao, Hongyun Zhang, Zhihua Wei
Abstract summary: We propose a novel early exiting method based on the Certainty-Aware Probability (CAP) score. We show that our method can achieve an average speed-up ratio of 2.19x across all tasks with negligible performance degradation.
Score: 16.838728310658105
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Early exiting has demonstrated great potential in accelerating the inference of pre-trained language models (PLMs) by enabling easy samples to exit at shallow layers, eliminating the need for executing deeper layers. However, existing early exiting methods primarily rely on class-relevant logits to formulate their exiting signals for estimating prediction certainty, neglecting the detrimental influence of class-irrelevant information in the features on prediction certainty. This leads to an overestimation of prediction certainty, causing premature exiting of samples with incorrect early predictions. To remedy this, we define an NSP score to estimate prediction certainty by considering the proportion of class-irrelevant information in the features. On this basis, we propose a novel early exiting method based on the Certainty-Aware Probability (CAP) score, which integrates insights from both logits and the NSP score to enhance prediction certainty estimation, thus enabling more reliable exiting decisions. The experimental results on the GLUE benchmark show that our method can achieve an average speed-up ratio of 2.19x across all tasks with negligible performance degradation, surpassing the state-of-the-art (SOTA) ConsistentEE by 28%, yielding a better trade-off between task performance and inference efficiency. The code is available at https://github.com/He-Jianing/NSP.git.
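The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is at the repository linked above); the function names, the norm-ratio definition of the class-irrelevant proportion, the multiplicative combination with the softmax probability, and the threshold value are all assumptions made for illustration. The sketch treats the component of a feature vector that lies in the null space of the classifier weight matrix as class-irrelevant, since that component cannot change the logits.

```python
import numpy as np


def null_space_ratio(h, W):
    """Fraction of the norm of feature h that lies in the null space of W.

    Hypothetical NSP-style score: W has shape (num_classes, hidden_dim),
    and the null-space part of h has no effect on the logits W @ h.
    """
    # pinv(W) @ W is the orthogonal projector onto the row space of W.
    row_projector = np.linalg.pinv(W) @ W
    h_null = h - row_projector @ h  # component of h that W ignores
    return np.linalg.norm(h_null) / (np.linalg.norm(h) + 1e-12)


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def should_exit(h, W, b, threshold=0.8):
    """Decide whether to exit at this layer (assumed combination rule).

    The score multiplies the maximum softmax probability by the share of
    the feature that is NOT class-irrelevant. This is an illustrative
    stand-in, not the CAP formula from the paper.
    """
    p_max = softmax(W @ h + b).max()
    score = p_max * (1.0 - null_space_ratio(h, W))
    return bool(score >= threshold), float(score)


if __name__ == "__main__":
    W = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    b = np.zeros(2)
    # Same logits in both cases; the second feature is mostly class-irrelevant.
    print(should_exit(np.array([5.0, 0.0, 0.0]), W, b))
    print(should_exit(np.array([5.0, 0.0, 50.0]), W, b))
```

In this toy setup both features produce identical logits, so a purely logit-based signal would treat them the same, while the combined score is much lower for the feature dominated by its null-space component.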

Related papers

Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step. We propose a practical Primal-Dual algorithm to tackle it and aim to demonstrate that it exhibits competitive average performance in time series benchmarks, while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z)
DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks [43.967626080432275]
We propose a novel Distance-Enhanced Early Exiting framework for BERT (DE$^3$-BERT). We implement a hybrid exiting strategy that supplements classic entropy-based local information with distance-based global information. Experiments on the GLUE benchmark demonstrate that DE$^3$-BERT consistently outperforms state-of-the-art models.
arXiv Detail & Related papers (2024-02-03T15:51:17Z)
SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process [76.98721879039559]
We propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty. Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective. We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time.
arXiv Detail & Related papers (2023-10-25T03:33:45Z)
Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification [59.81904428056924]
We introduce SMASH: a Score MAtching estimator for learning marked STPPs with uncertainty quantification. Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching. The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
arXiv Detail & Related papers (2023-10-25T02:37:51Z)
LMD: Light-weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds [3.927702899922668]
Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics. Uncertainty estimation is a crucial component for down-stream tasks and deep neural networks remain error-prone even for predictions with high confidence. We propose LidarMetaDetect, a light-weight post-processing scheme for prediction quality estimation. Our experiments show a significant increase of statistical reliability in separating true from false predictions.
arXiv Detail & Related papers (2023-06-13T15:13:29Z)
Variational Inference with Coverage Guarantees in Simulation-Based Inference [18.818573945984873]
We propose Conformalized Amortized Neural Variational Inference (CANVI). CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation.
arXiv Detail & Related papers (2023-05-23T17:24:04Z)
Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy. The authors propose to quantify uncertainty during forecasting using Bayesian approximation, which captures uncertainty that deterministic approaches miss. The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor in achieving good performance in semi-supervised learning. This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions. ADS significantly improves state-of-the-art SSL methods when used as a plug-in.
arXiv Detail & Related papers (2021-12-15T15:17:02Z)
Evaluation of Machine Learning Techniques for Forecast Uncertainty Quantification [0.13999481573773068]
Ensemble forecasting is, so far, the most successful approach to produce relevant forecasts along with an estimation of their uncertainty. The main limitations of ensemble forecasting are the high computational cost and the difficulty of capturing and quantifying different sources of uncertainty. In this work, proof-of-concept model experiments are conducted to examine the performance of ANNs trained to predict a corrected state of the system and the state uncertainty using only a single deterministic forecast as input.
arXiv Detail & Related papers (2021-11-29T16:52:17Z)
Neural Predictive Monitoring under Partial Observability [4.1316328854247155]
We present a learning-based method for predictive monitoring (PM) that produces accurate and reliable reachability predictions despite partial observability (PO). Our method results in highly accurate reachability predictions and error detection, as well as tight prediction regions with guaranteed coverage.
arXiv Detail & Related papers (2021-08-16T15:08:20Z)
Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks. First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU. Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.