Related papers: SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks

URL: http://arxiv.org/abs/2601.22711v1
Date: Fri, 30 Jan 2026 08:32:33 GMT
Title: SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks
Authors: Matteo Gambella, Fabrizio Pittorino, Giuliano Casale, Manuel Roveri
Abstract summary: We introduce SQUAD, the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning. We also introduce QUEST, a Neural Architecture Search method to select early-exit learners with optimized hierarchical diversity. This consensus-driven approach yields statistically robust early exits, improving the test accuracy up to 5.95% compared to state-of-the-art dynamic solutions.
Score: 8.530214413698966
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. However, standard approaches typically rely on single-model confidence thresholds, which are frequently unreliable due to inherent calibration issues. To address this, we introduce SQUAD (Scalable Quorum Adaptive Decisions), the first inference scheme that integrates early-exit mechanisms with distributed ensemble learning, improving uncertainty estimation while reducing the inference time. Unlike traditional methods that depend on individual confidence scores, SQUAD employs a quorum-based stopping criterion on early-exit learners by collecting intermediate predictions incrementally in order of computational complexity until a consensus is reached and halting the computation at that exit if the consensus is statistically significant. To maximize the efficacy of this voting mechanism, we also introduce QUEST (Quorum Search Technique), a Neural Architecture Search method to select early-exit learners with optimized hierarchical diversity, ensuring learners are complementary at every intermediate layer. This consensus-driven approach yields statistically robust early exits, improving the test accuracy up to 5.95% compared to state-of-the-art dynamic solutions with a comparable computational cost and reducing the inference latency up to 70.60% compared to static ensembles while maintaining a good accuracy.
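The quorum-based stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the statistical test, so the code below assumes a one-sided binomial test of the plurality vote against uniform random voting, with no correction for repeated testing. The names `quorum_exit`, `binomial_tail`, `votes_per_exit`, and `alpha` are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def quorum_exit(votes_per_exit, num_classes, alpha=0.05):
    """Return (exit_index, label) for the first exit with a significant consensus.

    votes_per_exit[e] lists the class predicted at exit e by each learner,
    ordered from the cheapest to the most expensive learner.
    Assumption: significance is a binomial test against chance-level voting.
    """
    label = None
    for exit_index, votes in enumerate(votes_per_exit):
        counts = Counter()
        # Collect votes one learner at a time, cheapest first.
        for n, vote in enumerate(votes, start=1):
            counts[vote] += 1
            label, k = counts.most_common(1)[0]
            # Halt as soon as the plurality is unlikely under random voting.
            if binomial_tail(k, n, 1.0 / num_classes) <= alpha:
                return exit_index, label
    # No exit reached significance: use the plurality at the last exit.
    return len(votes_per_exit) - 1, label


# Learners disagree at exit 0 and agree at exit 1 (10 classes assumed).
print(quorum_exit([[3, 7, 1], [3, 3, 3]], num_classes=10))
```

In this toy run the three learners vote for different classes at the first exit, so no consensus forms; at the second exit two agreeing votes already pass the assumed test, and the third learner is never evaluated.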

Related papers

Towards Anytime-Valid Statistical Watermarking [63.02116925616554]
We develop the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-19T18:32:26Z)
Confidence-gated training for efficient early-exit neural networks [49.78598138251519]
Early-exit neural networks reduce inference cost by enabling confident predictions at intermediate layers. We propose Confidence-Gated Training (CGT), a paradigm that conditionally propagates gradients from deeper exits only when preceding exits fail.
arXiv Detail & Related papers (2025-09-22T15:18:21Z)
Non-Asymptotic Uncertainty Quantification in High-Dimensional Learning [5.318766629972959]
Uncertainty quantification is a crucial but challenging task in many high-dimensional regression or learning problems. We develop a new data-driven approach for UQ in regression that applies both to classical regression approaches as well as to neural networks.
arXiv Detail & Related papers (2024-07-18T16:42:10Z)
Towards Calibrated Deep Clustering Network [60.71776081164377]
In deep clustering, the estimated confidence for a sample belonging to a particular cluster greatly exceeds its actual prediction accuracy. We propose a novel dual head (calibration head and clustering head) deep clustering model that can effectively calibrate the estimated confidence and the actual accuracy. The proposed calibrated deep clustering model not only surpasses the state-of-the-art deep clustering methods by 5x on average in terms of expected calibration error, but also significantly outperforms them in terms of clustering accuracy.
arXiv Detail & Related papers (2024-03-04T11:23:40Z)
Early stopping by correlating online indicators in neural networks [0.24578723416255746]
We propose a novel technique to identify overfitting phenomena when training the learner. Our proposal exploits the correlation over time in a collection of online indicators. As opposed to previous approaches focused on a single criterion, we take advantage of subsidiarities between independent assessments.
arXiv Detail & Related papers (2024-02-04T14:57:20Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
BayesNetCNN: incorporating uncertainty in neural networks for image-based classification tasks [0.29005223064604074]
We propose a method to convert a standard neural network into a Bayesian neural network. We estimate the variability of predictions by sampling different networks similar to the original one at each forward pass. We test our model in a large cohort of brain images from Alzheimer's Disease patients.
arXiv Detail & Related papers (2022-09-27T01:07:19Z)
A novel Deep Learning approach for one-step Conformal Prediction approximation [0.7646713951724009]
Conformal Prediction (CP) is a versatile solution that guarantees a maximum error rate given minimal constraints. We propose a novel conformal loss function that approximates the traditionally two-step CP approach in a single step.
arXiv Detail & Related papers (2022-07-25T17:46:09Z)
Resource-Constrained Edge AI with Early Exit Prediction [5.060405696893342]
We propose an early exit prediction mechanism to reduce the on-device computation overhead in a device-edge co-inference system. Specifically, we design a low-complexity module, namely the Exit Predictor, to guide some distinctly "hard" samples to bypass the computation of the early exits. Considering the varying communication bandwidth, we extend the early exit prediction mechanism for latency-aware edge inference.
arXiv Detail & Related papers (2022-06-15T03:14:21Z)
Quantifying Uncertainty in Deep Spatiotemporal Forecasting [67.77102283276409]
We describe two types of forecasting problems: regular grid-based and graph-based. We analyze UQ methods from both the Bayesian and the frequentist point of view, casting them in a unified framework via statistical decision theory. Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical-computational trade-offs for different UQ methods.
arXiv Detail & Related papers (2021-05-25T14:35:46Z)
CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning. We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model. Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.