EBJR: Energy-Based Joint Reasoning for Adaptive Inference
- URL: http://arxiv.org/abs/2110.10343v1
- Date: Wed, 20 Oct 2021 02:33:31 GMT
- Title: EBJR: Energy-Based Joint Reasoning for Adaptive Inference
- Authors: Mohammad Akbari, Amin Banitalebi-Dehkordi, Yong Zhang
- Abstract summary: State-of-the-art deep learning models have achieved significant performance levels on various benchmarks.
Light-weight architectures, on the other hand, achieve moderate accuracies, but at a much more desirable latency.
This paper presents a new method of jointly using the large accurate models together with the small fast ones.
- Score: 10.447353952054492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art deep learning models have achieved significant performance
levels on various benchmarks. However, this excellent performance comes at the
cost of heavy computation. Lightweight architectures, on the
other hand, achieve moderate accuracies, but at a much more desirable latency.
This paper presents a new method of jointly using the large accurate models
together with the small fast ones. To this end, we propose an Energy-Based
Joint Reasoning (EBJR) framework that adaptively distributes the samples
between shallow and deep models to achieve an accuracy close to the deep model,
but latency close to the shallow one. Our method is applicable to
out-of-the-box pre-trained models as it requires neither an architecture change
nor re-training. Moreover, it is easy to use and deploy, especially for cloud
services. Through a comprehensive set of experiments on different down-stream
tasks, we show that our method outperforms strong state-of-the-art approaches
by a considerable margin. In addition, we propose specialized EBJR, an
extension of our method where we create a smaller specialized side model that
performs the target task only partially, but yields an even higher accuracy and
faster inference. We verify the strengths of our methods with both theoretical
and experimental evaluations.
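As a concrete illustration of the adaptive routing idea described in the abstract, below is a minimal sketch of an energy-based router that sends each sample to either the small or the large model. It assumes the energy score is the negative logsumexp of the small model's logits and that the routing threshold is tuned on held-out data; the function names, threshold value, and use of PyTorch are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch of energy-based adaptive routing between a small (shallow) and a
# large (deep) model, in the spirit of EBJR. The energy function and threshold are
# illustrative assumptions, not the paper's exact formulation.
import torch


@torch.no_grad()
def ebjr_predict(x, small_model, large_model, energy_threshold=-8.0):
    """Route each sample to the small or large model based on an energy score."""
    small_logits = small_model(x)                    # fast pass on every sample
    energy = -torch.logsumexp(small_logits, dim=-1)  # low energy ~ confident small model
    use_large = energy > energy_threshold            # uncertain samples go to the large model

    preds = small_logits.argmax(dim=-1)
    if use_large.any():
        large_logits = large_model(x[use_large])     # slow pass only where needed
        preds[use_large] = large_logits.argmax(dim=-1)
    return preds
```

Because the large model runs only on the samples the small model is uncertain about, the average latency stays close to the small model's while accuracy approaches the large model's.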
Related papers
- See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.
We take the first step to unify all approaches by dissecting them from a decomposition perspective.
We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in finetuned MoE models.
We theoretically prove that prioritizing the pruning of experts with a smaller change in their routers' l2 norm from the pretrained model guarantees the preservation of test accuracy (a toy sketch of this criterion appears after this list).
Although our theoretical analysis is centered on binary classification tasks on simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
- Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices [5.696239274365031]
We propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy.
We obtained a model that maintains real-time performance while delivering high accuracy on edge devices.
arXiv Detail & Related papers (2024-05-20T06:03:55Z)
- Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives [17.10165955576643]
Current state-of-the-art empirical techniques offer sub-optimal performance on practical, non-decomposable performance objectives.
We propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models.
We find that the proposed SelMix fine-tuning significantly improves performance on various practical non-decomposable objectives across benchmarks.
arXiv Detail & Related papers (2024-03-27T06:55:23Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble [22.065904428696353]
Recent ensemble training methods explore different training algorithms or settings on multiple sub-models with the same model architecture.
We propose Heterogeneously Compressed Ensemble (HCE), where we build an efficient ensemble with the pruned and quantized variants from a pretrained DNN model.
arXiv Detail & Related papers (2023-01-18T21:47:05Z)
- Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG).
We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
- Follow Your Path: a Progressive Method for Knowledge Distillation [23.709919521355936]
We propose ProKT, a new model-agnostic method by projecting the supervision signals of a teacher model into the student's parameter space.
Experiments on both image and text datasets show that our proposed ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
arXiv Detail & Related papers (2021-07-20T07:44:33Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
- Performance of Hyperbolic Geometry Models on Top-N Recommendation Tasks [72.62702932371148]
We introduce a simple autoencoder based on hyperbolic geometry for solving the standard collaborative filtering problem.
In contrast to many modern deep learning techniques, we build our solution using only a single hidden layer.
arXiv Detail & Related papers (2020-08-15T13:21:10Z)
- Speedy Performance Estimation for Neural Architecture Search [47.683124540824515]
We propose to estimate the final test performance based on a simple measure of training speed.
Our estimator is theoretically motivated by the connection between generalisation and training speed.
arXiv Detail & Related papers (2020-06-08T11:48:09Z)
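As referenced in the expert-pruning entry above, the stated criterion (prune the experts whose router weights changed least, in l2 norm, from the pretrained model) can be sketched as follows. The array shapes, variable names, and pruning fraction are illustrative assumptions, not details taken from that paper.

```python
# Toy sketch of the router-change pruning criterion: rank experts by how much their
# router weights moved during fine-tuning (l2 norm of the change) and prune the
# experts whose routers changed the least.
import numpy as np


def experts_to_prune(router_pretrained, router_finetuned, prune_fraction=0.5):
    """router_*: arrays of shape (num_experts, hidden_dim) with per-expert router weights."""
    change = np.linalg.norm(router_finetuned - router_pretrained, axis=1)  # per-expert l2 change
    num_prune = int(len(change) * prune_fraction)
    return np.argsort(change)[:num_prune]  # indices of experts with the smallest router change
```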