COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism
- URL: http://arxiv.org/abs/2412.13236v1
- Date: Tue, 17 Dec 2024 16:24:55 GMT
- Title: COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism
- Authors: Jianing He, Qi Zhang, Hongyun Zhang, Xuanjing Huang, Usman Naseem, Duoqian Miao,
- Abstract summary: Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs)<n>We propose a novel Consistency-Oriented Signal-based Early Exiting (COSEE) framework, which leverages a calibrated sample weighting mechanism.<n>Experiments on the GLUE benchmark demonstrate the effectiveness of our COSEE across multiple exiting signals and backbones, yielding a better trade-off between performance and efficiency.
- Score: 32.015402521706825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs) by dynamically adjusting the number of executed layers for each sample. However, in most existing works, easy and hard samples are treated equally by each classifier during training, which neglects the test-time early exiting behavior, leading to inconsistency between training and testing. Although some methods have tackled this issue under a fixed speed-up ratio, the challenge of flexibly adjusting the speed-up ratio while maintaining consistency between training and testing is still under-explored. To bridge the gap, we propose a novel Consistency-Oriented Signal-based Early Exiting (COSEE) framework, which leverages a calibrated sample weighting mechanism to enable each classifier to emphasize the samples that are more likely to exit at that classifier under various acceleration scenarios. Extensive experiments on the GLUE benchmark demonstrate the effectiveness of our COSEE across multiple exiting signals and backbones, yielding a better trade-off between performance and efficiency.
Related papers
- Combating Noisy Labels through Fostering Self- and Neighbor-Consistency [120.4394402099635]
Label noise is pervasive in various real-world scenarios, posing challenges in supervised deep learning.<n>We propose a noise-robust method named Jo-SNC (textbfJoint sample selection and model regularization based on textbfSelf- and textbfNeighbor-textbfConsistency)<n>We design a self-adaptive, data-driven thresholding scheme to adjust per-class selection thresholds.
arXiv Detail & Related papers (2026-01-19T07:55:29Z) - KLASS: KL-Guided Fast Inference in Masked Diffusion Models [31.082562509845985]
Masked diffusion models have demonstrated competitive results on various tasks including language generation.<n>We introduce KL-Adaptive Stability Sampling' (KLASS), a fast yet effective sampling method that exploits token-level KL divergence to identify stable, high-confidence predictions.<n>On reasoning benchmarks, KLASS achieves up to $2.78times$ wall-clock speedups while improving performance over standard greedy decoding.
arXiv Detail & Related papers (2025-11-07T19:05:36Z) - Unleashing the Potential of All Test Samples: Mean-Shift Guided Test-Time Adaptation [18.82879703518279]
Existing training-free test-time adaptation methods operate strictly within CLIP's original feature space.<n>We propose MS-TTA, a training-free approach that enhances feature representations beyond CLIP's space using a single-step k-nearest neighbors (kNN) Mean-Shift.
arXiv Detail & Related papers (2025-07-01T06:22:00Z) - Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering [18.543769006014383]
Diffusion models often exhibit inconsistent sample quality due to variations inherent in their sampling trajectories.<n>We introduce CFG-Rejection, an efficient, plug-and-play strategy that filters low-quality samples at an early stage of the denoising process.<n>We validate the effectiveness of CFG-Rejection in image generation through extensive experiments.
arXiv Detail & Related papers (2025-05-29T11:08:24Z) - Enhancing Sample Selection by Cutting Mislabeled Easy Examples [62.13094877228772]
We show that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance.
We propose Early Cutting, which employs the model's later training state to re-select the confident subset identified early in training.
arXiv Detail & Related papers (2025-02-12T09:12:45Z) - Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation [3.8959351616076745]
Flow matching has emerged as a promising framework for training generative models.
We introduce a self-corrected flow distillation method that integrates consistency models and adversarial training.
This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling.
arXiv Detail & Related papers (2024-12-22T07:48:49Z) - Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders [101.42201747763178]
Unlearnable examples (UEs) seek to maximize testing error by making subtle modifications to training examples that are correctly labeled.
Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method.
arXiv Detail & Related papers (2024-05-02T16:49:25Z) - Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion [11.689517005768046]
Out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution.
We propose a novel framework, Multitesting-based Layer-wise Out-of-Distribution (OOD) Detection.
Our scheme effectively enhances the performance of out-of-distribution detection when compared to baseline methods.
arXiv Detail & Related papers (2024-03-16T04:35:04Z) - Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Generalized Robust Test-Time Adaptation in Continuous Dynamic Scenarios [18.527640606971563]
Test-time adaptation (TTA) adapts pre-trained models to test distributions during the inference phase exclusively employing unlabeled test data streams.
We propose a Generalized Robust Test-Time Adaptation (GRoTTA) method to effectively address the difficult problem.
arXiv Detail & Related papers (2023-10-07T07:13:49Z) - Learning to Weight Samples for Dynamic Early-exiting Networks [35.03752825893429]
Early exiting is an effective paradigm for improving the inference efficiency of deep networks.
Our work proposes to adopt a weight prediction network to weight the loss of different training samples at each exit.
We show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency.
arXiv Detail & Related papers (2022-09-17T10:46:32Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic
Weight Consolidation in Neural Machine Translation [15.581515781839656]
Autoregressive models trained with maximum likelihood estimation suffer from exposure bias.
We propose using Elastic Weight Consolidation as trade-off between mitigating exposure bias and retaining output quality.
Experiments on two IWSLT'14 translation tasks demonstrate that our approach alleviates catastrophic forgetting and significantly improves BLEU.
arXiv Detail & Related papers (2021-09-13T20:37:58Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency)
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.