A Simple Hash-Based Early Exiting Approach For Language Understanding
and Generation
- URL: http://arxiv.org/abs/2203.01670v1
- Date: Thu, 3 Mar 2022 12:02:05 GMT
- Title: A Simple Hash-Based Early Exiting Approach For Language Understanding
and Generation
- Authors: Tianxiang Sun, Xiangyang Liu, Wei Zhu, Zhichao Geng, Lingling Wu,
Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu
- Abstract summary: Early exiting allows instances to exit at different layers according to the estimation of difficulty.
We propose a Hash-based Early Exiting approach (HashEE) that replaces the learn-to-exit modules with hash functions to assign each token to a fixed exiting layer.
Experimental results on classification, regression, and generation tasks demonstrate that HashEE can achieve higher performance with fewer FLOPs and less inference time.
- Score: 77.85086491395981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early exiting allows instances to exit at different layers according to the
estimation of difficulty. Previous works usually adopt heuristic metrics such
as the entropy of internal outputs to measure instance difficulty, which
suffers from poor generalization and requires careful threshold tuning. In contrast, learning to exit, i.e., learning to predict instance difficulty, is a more appealing approach.
Though some effort has been devoted to employing such "learn-to-exit" modules,
it is still unknown whether and how well the instance difficulty can be
learned. In response, we first conduct experiments on the learnability of
instance difficulty and find that modern neural models perform poorly at
predicting it. Based on this observation, we propose
a simple-yet-effective Hash-based Early Exiting approach (HashEE) that replaces
the learn-to-exit modules with hash functions to assign each token to a fixed
exiting layer. Unlike previous methods, HashEE requires neither internal
classifiers nor extra parameters, and is therefore more efficient. Experimental
results on classification, regression, and generation tasks demonstrate that
HashEE achieves higher performance with fewer FLOPs and less inference time
than previous state-of-the-art early exiting methods.
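The core idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the authors' released implementation: the multiplicative hash, the `HashEarlyExitEncoder` class, and all hyperparameters are assumptions chosen only to show how a fixed hash can assign each token to an exiting layer.

```python
# Minimal sketch of hash-based token-to-layer assignment for early exiting.
# Assumptions (not from the paper): a toy multiplicative hash on token ids,
# a generic Transformer encoder, and copied-forward hidden states for
# tokens that have already exited.
import torch
import torch.nn as nn


def hash_exit_layer(token_ids: torch.Tensor, num_layers: int) -> torch.Tensor:
    """Map each token id to a fixed exiting layer via a cheap hash.

    Any deterministic hash works; a multiplicative hash is used here
    purely for illustration.
    """
    return ((token_ids * 2654435761) % 2**32) % num_layers + 1


class HashEarlyExitEncoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, num_layers=6):
        super().__init__()
        self.num_layers = num_layers
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        exit_layers = hash_exit_layer(token_ids, self.num_layers)  # (B, T)
        hidden = self.embed(token_ids)
        for depth, layer in enumerate(self.layers, start=1):
            updated = layer(hidden)
            # Tokens whose assigned exit layer has passed keep their old
            # (already "exited") representation; the rest are updated.
            still_active = (exit_layers >= depth).unsqueeze(-1)
            hidden = torch.where(still_active, updated, hidden)
        return hidden


# Example usage with random token ids.
model = HashEarlyExitEncoder()
out = model(torch.randint(0, 30522, (2, 8)))  # (2, 8, 256)
```

In a real implementation the speedup comes from actually skipping the attention and feed-forward computation for exited tokens; the dense `torch.where` above only mimics the assignment logic, and the hash requires no learned parameters or internal classifiers.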
Related papers
- Machine Unlearning in Forgettability Sequence [22.497699136603877]
We identify key factors affecting unlearning difficulty and the performance of unlearning algorithms.
We propose a general unlearning framework, dubbed RSU, which consists of a Ranking module and a SeqUnlearn module.
arXiv Detail & Related papers (2024-10-09T01:12:07Z)
- Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, one line of methods replays the data of previously experienced tasks when learning new tasks.
However, this is often impractical given memory constraints or data privacy concerns.
As an alternative, data-free data replay methods invert samples from the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z)
- ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference [21.24566458648584]
We propose ConsistentEE, an early exiting method consistent in training and inference.
A policy network is added to decide whether an instance should exit or continue.
We incorporate the memorized layer into the reward function design, which allows "easy" instances to focus more on acceleration.
arXiv Detail & Related papers (2023-12-19T06:16:13Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition [5.977483447975081]
We propose Difficulty-Net, which learns to predict the difficulty of classes using the model's performance in a meta-learning framework.
We introduce two key concepts, namely the relative difficulty and the driver loss.
Experiments on popular long-tailed datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-09-07T07:04:08Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Hard Example Guided Hashing for Image Retrieval [3.606866431185676]
There are two main factors limiting the learning of hard examples: weak extraction of key features and a shortage of hard examples.
In this paper, we present a novel end-to-end model that extracts key features from hard examples and obtains hash codes carrying accurate semantic information.
Experimental results on CIFAR-10 and NUS-WIDE demonstrate that our model outperforms mainstream hashing-based image retrieval methods.
arXiv Detail & Related papers (2021-12-27T08:24:10Z)
- One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective [86.48094395282546]
A deep hashing model typically has two main learning objectives: to make the learned binary hash codes discriminative and to minimize a quantization error.
We propose a novel deep hashing model with only a single learning objective.
Our model is highly effective, outperforming the state-of-the-art multi-loss hashing models on three large-scale instance retrieval benchmarks.
arXiv Detail & Related papers (2021-09-29T14:27:51Z)
- Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination [82.52105963476703]
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise.
First-order guarantees are relatively well understood in statistical and online learning.
We show that the logarithmic loss and an information-theoretic quantity called the triangular discrimination play a fundamental role in obtaining first-order guarantees.
arXiv Detail & Related papers (2021-07-05T19:20:34Z)
- Contrastive Learning with Hard Negative Samples [80.12117639845678]
We develop a new family of unsupervised sampling methods for selecting hard negative samples.
A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible.
The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead.
arXiv Detail & Related papers (2020-10-09T14:18:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.