EBJR: Energy-Based Joint Reasoning for Adaptive Inference
- URL: http://arxiv.org/abs/2110.10343v1
- Date: Wed, 20 Oct 2021 02:33:31 GMT
- Title: EBJR: Energy-Based Joint Reasoning for Adaptive Inference
- Authors: Mohammad Akbari, Amin Banitalebi-Dehkordi, Yong Zhang
- Abstract summary: State-of-the-art deep learning models have achieved significant performance levels on various benchmarks.
Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency.
This paper presents a new method of jointly using the large accurate models together with the small fast ones.
- Score: 10.447353952054492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art deep learning models have achieved significant performance
levels on various benchmarks. However, this excellent performance comes at the
cost of heavy computation. Lightweight architectures, on the
other hand, achieve moderate accuracies, but at a much more desirable latency.
This paper presents a new method of jointly using the large accurate models
together with the small fast ones. To this end, we propose an Energy-Based
Joint Reasoning (EBJR) framework that adaptively distributes the samples
between shallow and deep models to achieve an accuracy close to the deep model,
but latency close to the shallow one. Our method is applicable to
out-of-the-box pre-trained models as it requires neither an architecture change
nor re-training. Moreover, it is easy to use and deploy, especially for cloud
services. Through a comprehensive set of experiments on different down-stream
tasks, we show that our method outperforms strong state-of-the-art approaches
by a considerable margin. In addition, we propose specialized EBJR, an
extension of our method where we create a smaller specialized side model that
performs the target task only partially, but yields an even higher accuracy and
faster inference. We verify the strengths of our methods with both theoretical
and experimental evaluations.
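As a concrete illustration of the adaptive routing idea described in the abstract, below is a minimal sketch of an energy-based router that sends each sample to either the small or the large model. It assumes the energy score is the negative logsumexp of the small model's logits and that the routing threshold is tuned on held-out data; the function names, threshold value, and use of PyTorch are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch of energy-based adaptive routing between a small (shallow) and a
# large (deep) model, in the spirit of EBJR. The energy function and threshold are
# illustrative assumptions, not the paper's exact formulation.
import torch


@torch.no_grad()
def ebjr_predict(x, small_model, large_model, energy_threshold=-8.0):
    """Route each sample to the small or large model based on an energy score."""
    small_logits = small_model(x)                    # fast pass on every sample
    energy = -torch.logsumexp(small_logits, dim=-1)  # low energy ~ confident small model
    use_large = energy > energy_threshold            # uncertain samples go to the large model

    preds = small_logits.argmax(dim=-1)
    if use_large.any():
        large_logits = large_model(x[use_large])     # slow pass only where needed
        preds[use_large] = large_logits.argmax(dim=-1)
    return preds
```

Because the large model runs only on the samples the small model is uncertain about, the average latency stays close to the small model's while accuracy approaches the large model's.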
Related papers
- See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.
We take the first step to unify all approaches by dissecting them from a decomposition perspective.
We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in finetuned MoE models.
We theoretically prove that prioritizing the pruning of experts with a smaller change in their routers' l2 norm from the pretrained model guarantees the preservation of test accuracy (a toy sketch of this criterion appears after this list).
Although our theoretical analysis is centered on binary classification tasks on simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
- Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices [5.696239274365031]
We propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy.
We obtained a model that maintains real-time performance while delivering high accuracy on edge devices.
arXiv Detail & Related papers (2024-05-20T06:03:55Z)
- Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives [17.10165955576643]
Current state-of-the-art empirical techniques offer sub-optimal performance on practical, non-decomposable performance objectives.
We propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models.
We find that the proposed SelMix fine-tuning significantly improves performance on various practical non-decomposable objectives across benchmarks.
arXiv Detail & Related papers (2024-03-27T06:55:23Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training methods explore different training algorithms or settings on multiple sub-models with the same model architecture.
We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model.
arXiv Detail & Related papers (2023-01-18T21:47:05Z)
- Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG).
We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
- Follow Your Path: a Progressive Method for Knowledge Distillation [23.709919521355936]
We propose ProKT, a new model-agnostic method by projecting the supervision signals of a teacher model into the student's parameter space.
Experiments on both image and text datasets show that our proposed ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
arXiv Detail & Related papers (2021-07-20T07:44:33Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Performance of Hyperbolic Geometry Models on Top-N Recommendation Tasks [72.62702932371148]
We introduce a simple autoencoder based on hyperbolic geometry for solving the standard collaborative filtering problem.
In contrast to many modern deep learning techniques, we build our solution using only a single hidden layer.
arXiv Detail & Related papers (2020-08-15T13:21:10Z)
- Speedy Performance Estimation for Neural Architecture Search [47.683124540824515]
We propose to estimate the final test performance based on a simple measure of training speed.
Our estimator is theoretically motivated by the connection between generalisation and training speed.
arXiv Detail & Related papers (2020-06-08T11:48:09Z)
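As referenced in the expert-pruning entry above, the stated criterion (prune the experts whose router weights changed least, in l2 norm, from the pretrained model) can be sketched as follows. The array shapes, variable names, and pruning fraction are illustrative assumptions, not details taken from that paper.

```python
# Toy sketch of the router-change pruning criterion: rank experts by how much their
# router weights moved during fine-tuning (l2 norm of the change) and prune the
# experts whose routers changed the least.
import numpy as np


def experts_to_prune(router_pretrained, router_finetuned, prune_fraction=0.5):
    """router_*: arrays of shape (num_experts, hidden_dim) with per-expert router weights."""
    change = np.linalg.norm(router_finetuned - router_pretrained, axis=1)  # per-expert l2 change
    num_prune = int(len(change) * prune_fraction)
    return np.argsort(change)[:num_prune]  # indices of experts with the smallest router change
```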