A Multi-task Supervised Compression Model for Split Computing
- URL: http://arxiv.org/abs/2501.01420v2
- Date: Sun, 19 Jan 2025 06:56:18 GMT
- Title: A Multi-task Supervised Compression Model for Split Computing
- Authors: Yoshitomo Matsubara, Matteo Mendula, Marco Levorato
- Abstract summary: Split computing is a promising approach for deploying deep learning models on resource-constrained edge computing systems.
We propose Ladon, the first multi-task-head supervised compression model for multi-task split computing.
Our models reduced end-to-end latency (by up to 95.4%) and energy consumption of mobile devices (by up to 88.2%) in multi-task split computing scenarios.
- Score: 4.234757989234096
- Abstract: Split computing ($\neq$ split learning) is a promising approach for deploying deep learning models on resource-constrained edge computing systems, where weak sensor (mobile) devices are wirelessly connected to stronger edge servers through channels with limited communication capacity. State-of-the-art work on split computing presents methods for single tasks such as image classification, object detection, or semantic segmentation. Applying existing methods to multi-task problems degrades model accuracy and/or significantly increases runtime latency. In this study, we propose Ladon, the first multi-task-head supervised compression model for multi-task split computing. Experimental results show that the multi-task supervised compression model either outperformed or rivaled strong lightweight baseline models in terms of predictive performance on the ILSVRC 2012, COCO 2017, and PASCAL VOC 2012 datasets while learning compressed representations at its early layers. Furthermore, our models reduced end-to-end latency (by up to 95.4%) and energy consumption of mobile devices (by up to 88.2%) in multi-task split computing scenarios.
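The split-computing idea can be illustrated with a minimal PyTorch sketch (hypothetical module names and layer sizes, not the released Ladon code): a lightweight encoder runs on the mobile device and emits a compact bottleneck tensor, which is sent over the wireless channel to the edge server, where a shared decoder feeds multiple task-specific heads.

```python
import torch
import torch.nn as nn

class MultiTaskSplitModel(nn.Module):
    """Minimal split-computing sketch: a lightweight on-device encoder
    produces a compressed bottleneck; the edge server decodes it and
    runs multiple task heads. Illustrative only, not the Ladon code."""

    def __init__(self, bottleneck_ch=12, num_classes=1000, num_seg_classes=21):
        super().__init__()
        # Device side: cheap encoder ending in a narrow bottleneck.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, bottleneck_ch, 3, stride=2, padding=1),
        )
        # Server side: shared decoder plus per-task heads.
        self.decoder = nn.Sequential(
            nn.Conv2d(bottleneck_ch, 64, 3, padding=1), nn.ReLU(),
        )
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)

    def forward(self, x):
        z = self.encoder(x)        # runs on the mobile device
        z = torch.round(z)         # crude stand-in for quantization
        # z would be entropy-coded and transmitted to the server here.
        h = self.decoder(z)        # runs on the edge server
        return self.cls_head(h), self.seg_head(h)

model = MultiTaskSplitModel()
logits, seg_map = model(torch.randn(1, 3, 224, 224))
```

In a real system the bottleneck would be quantized and entropy-coded before transmission, so on-device cost is driven by the encoder size and the bottleneck bitrate.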
Related papers
- Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition [2.6820435361998918]
We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy.
Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality.
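As a generic illustration of post-training Tucker factorization (a TensorLy sketch with assumed ranks, not the paper's code), a 4-D convolution kernel is approximated by a small core and per-mode factor matrices:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Stand-in for a pretrained 3x3 conv kernel: (out_ch, in_ch, kH, kW).
weight = np.random.randn(128, 64, 3, 3)

# Tucker decomposition with reduced channel ranks; spatial modes keep rank 3.
core, factors = tucker(tl.tensor(weight), rank=[32, 16, 3, 3])

# In a network, the original conv can then be replaced by a 1x1 conv
# (input factor), a small 3x3 core conv, and a 1x1 conv (output factor).
approx = tl.tucker_to_tensor((core, factors))
rel_err = np.linalg.norm(approx - weight) / np.linalg.norm(weight)
print(f"relative reconstruction error: {rel_err:.3f}")
```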
arXiv Detail & Related papers (2024-04-15T11:36:31Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
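The general flavor of multitask pruning can be sketched as follows (an illustrative agreement rule over per-task importance scores, not necessarily DiSparse's exact criterion): each task ranks the weights, and a weight is kept if any task considers it important.

```python
import torch

def consensus_keep_mask(per_task_scores, sparsity=0.5):
    """Generic multitask pruning sketch: keep a weight if ANY task ranks
    it above its own top-(1 - sparsity) cutoff. Illustrative only."""
    keep = torch.zeros_like(per_task_scores[0], dtype=torch.bool)
    for scores in per_task_scores:
        k = int(scores.numel() * (1 - sparsity))
        cutoff = torch.topk(scores.flatten(), k).values.min()
        keep |= scores >= cutoff
    return keep

# Example: two tasks score the same 4x4 weight tensor differently.
scores_a = torch.rand(4, 4)
scores_b = torch.rand(4, 4)
mask = consensus_keep_mask([scores_a, scores_b], sparsity=0.75)
print(mask.float().mean())  # fraction of weights kept (>= 0.25 here)
```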
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Feature Compression for Rate Constrained Object Detection on the Edge [20.18227104333772]
An emerging approach to solve this problem is to offload the computation of neural networks to computing resources at an edge server.
In this work, we consider a "split computation" system to offload a part of the computation of the YOLO object detection model.
We train the feature compression and decompression module together with the YOLO model to optimize the object detection accuracy under a rate constraint.
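Such rate-constrained training is typically a Lagrangian of task loss and estimated bitrate; a minimal sketch follows (generic rate-distortion objective with a crude bitrate proxy, not the paper's entropy model):

```python
import torch

def rate_constrained_loss(task_loss, latent, lam=0.01):
    """Generic rate-constrained objective: task loss plus a bitrate proxy.
    Real systems estimate bits with a learned entropy model; the L1 norm
    of the latent is a crude stand-in that encourages compressibility."""
    rate_proxy = latent.abs().mean()
    return task_loss + lam * rate_proxy

# Usage inside a training step (detection_loss from, e.g., a YOLO head):
latent = torch.randn(8, 12, 52, 52, requires_grad=True)
detection_loss = torch.tensor(1.7, requires_grad=True)
loss = rate_constrained_loss(detection_loss, latent, lam=0.01)
loss.backward()
```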
arXiv Detail & Related papers (2022-04-15T03:39:30Z)
- SC2 Benchmark: Supervised Compression for Split Computing [21.7175821221294]
This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria.
We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models.
Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.
arXiv Detail & Related papers (2022-03-16T18:43:18Z)
- Federated Action Recognition on Heterogeneous Embedded Devices [16.88104153104136]
In this work, we enable clients with limited computing power to perform action recognition, a computationally heavy task.
We first perform model compression at the central server through knowledge distillation on a large dataset.
Fine-tuning is required because the limited data in smaller datasets is not adequate for action recognition models to learn complex temporal features.
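The distillation step above usually takes the standard temperature-scaled form; a minimal sketch of that objective (generic knowledge distillation, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Standard KD objective: soft teacher targets (temperature T) blended
    with ordinary cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(16, 400), torch.randn(16, 400),
                         torch.randint(0, 400, (16,)))
```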
arXiv Detail & Related papers (2021-07-18T02:33:24Z)
- Multi-task Over-the-Air Federated Learning: A Non-Orthogonal Transmission Approach [52.85647632037537]
We propose a multi-task over-the-air federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of an edge server (ES).
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
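Over-the-air aggregation exploits the superposition property of the wireless channel: when devices transmit their pre-equalized updates simultaneously, the receiver obtains their noisy sum in a single channel use. A minimal simulation sketch (generic analog aggregation, not MOAFL's non-orthogonal scheme in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
num_devices, dim = 8, 1000

# Each device holds a local model update.
updates = rng.normal(size=(num_devices, dim))

# All devices transmit at once: the channel superimposes the signals
# plus noise, so the server receives a noisy sum per entry.
noise = rng.normal(scale=0.05, size=dim)
received = updates.sum(axis=0) + noise

# The server's estimate of the average update.
avg_estimate = received / num_devices
print(np.abs(avg_estimate - updates.mean(axis=0)).max())
```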
arXiv Detail & Related papers (2021-06-27T13:09:32Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance on a wide range of applications and datasets.
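The tensor-train factorization underlying such a module is easy to demonstrate (a generic TensorLy sketch with assumed ranks, unrelated to the paper's exact ConvLSTM design): a high-order tensor is expressed as a chain of small 3-way cores, shrinking the parameter count.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tensor_train

# A 4-way tensor standing in for, e.g., stacked spatio-temporal features.
X = tl.tensor(np.random.randn(8, 8, 8, 8))

# Tensor-train decomposition: the rank list has length ndim + 1,
# and the boundary ranks must be 1.
factors = tensor_train(X, rank=[1, 4, 4, 4, 1])

approx = tl.tt_to_tensor(factors)
rel_err = np.linalg.norm(approx - X) / np.linalg.norm(X)
tt_params = sum(int(np.prod(f.shape)) for f in factors)
print(f"params: {X.size} -> {tt_params}, relative error {rel_err:.3f}")
```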
arXiv Detail & Related papers (2020-02-21T05:00:01Z)