Related papers: Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

URL: http://arxiv.org/abs/2410.12165v2
Date: Sun, 20 Oct 2024 17:59:09 GMT
Title: Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution
Authors: Timothy Wei, Hsien Xin Peng, Elaine Xu, Bryan Zhao, Lei Ding, Diji Yang,
Abstract summary: We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain. Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone.
Score: 1.8029479474051309
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As Artificial Intelligence models, such as Large Video-Language models (VLMs), grow in size, their deployment in real-world applications becomes increasingly challenging due to hardware limitations and computational costs. To address this, we design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain and selectively offload inference to the large model in the cloud. Experimental results on the action classification task show that our framework not only requires less computational overhead, but also improves accuracy compared to using a large model alone. Our framework provides a scalable and adaptable solution for action classification in resource-constrained environments, with potential applications beyond healthcare. Noteworthy, while DMD-generated data is used for optimizing performance and resource usage in our pipeline, we expect the concept of DMD to further support future research on knowledge alignment across multiple models.

Related papers

ScaleDL: Towards Scalable and Efficient Runtime Prediction for Distributed Deep Learning Workloads [14.876533021201539]
ScaleDL is a novel runtime prediction framework for deep neural networks (DNNs)<n>It combines nonlinear layer-wise modeling with graph neural network (GNN)-based cross-layer interaction mechanism.<n>Experiments show that ScaleDL enhances runtime prediction accuracy and generalizability, achieving 6 times lower MRE and 5 times lower RMSE compared to baseline models.
arXiv Detail & Related papers (2025-11-06T08:05:55Z)
Compute-Optimal Scaling for Value-Based Deep RL [99.680827753493]
We investigate compute scaling for online, value-based deep RL.<n>Our analysis reveals a nuanced interplay between model size, batch size, and UTD.<n>We provide a mental model for understanding this phenomenon and build guidelines for choosing batch size and UTD.
arXiv Detail & Related papers (2025-08-20T17:54:21Z)
Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance [15.877771709013743]
In this work, we propose a novel finetuning framework to ternary-weight embedding models. To apply ternarization to pre-trained embedding models, we introduce self-taught knowledge distillation to finalize the ternary-weights of the linear layers. With extensive experiments on public text and vision datasets, we demonstrated that without sacrificing effectiveness, the ternarized model consumes low memory usage.
arXiv Detail & Related papers (2024-11-23T03:44:56Z)
Adaptable Embeddings Network (AEN) [49.1574468325115]
We introduce Adaptable Embeddings Networks (AEN), a novel dual-encoder architecture using Kernel Density Estimation (KDE) AEN allows for runtime adaptation of classification criteria without retraining and is non-autoregressive. The architecture's ability to preprocess and cache condition embeddings makes it ideal for edge computing applications and real-time monitoring systems.
arXiv Detail & Related papers (2024-11-21T02:15:52Z)
Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique [46.266960248570086]
We introduce an innovative offline recording strategy that avoids the simultaneous loading of both teacher and student models. This approach feeds a multitude of augmented samples into the teacher model, recording both the data augmentation parameters and the corresponding logit outputs. Experimental results demonstrate that the proposed distillation strategy enables the student model to achieve performance comparable to state-of-the-art models.
arXiv Detail & Related papers (2024-09-03T16:12:12Z)
Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models. LLMs exhibit impressive zero-shot ability, however, require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z)
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a novel sandbox suite tailored for integrated data-model co-development. This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models. We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning. Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation. Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation [47.35179593006409]
We propose ECLM, an edge-cloud collaborative learning framework for rapid model adaptation for dynamic edge environments. We show that ECLM significantly improves model performance (e.g., 18.89% accuracy increase) and resource efficiency (e.g. 7.12x communication cost reduction) in adapting models to dynamic edge environments.
arXiv Detail & Related papers (2023-11-18T14:10:09Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
Real-time Human Detection Model for Edge Devices [0.0]
Convolutional Neural Networks (CNNs) have replaced traditional feature extraction and machine learning models in detection and classification tasks. Lightweight CNN models have been recently introduced for real-time tasks. This paper suggests a CNN-based lightweight model that can fit on a limited edge device such as Raspberry Pi.
arXiv Detail & Related papers (2021-11-20T18:42:17Z)
OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless Compression [49.10945855716001]
We propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch. Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets.
arXiv Detail & Related papers (2021-11-02T15:18:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.