SC2 Benchmark: Supervised Compression for Split Computing
- URL: http://arxiv.org/abs/2203.08875v2
- Date: Wed, 14 Jun 2023 17:59:07 GMT
- Title: SC2 Benchmark: Supervised Compression for Split Computing
- Authors: Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
- Abstract summary: This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria.
We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models.
Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.
- Score: 21.7175821221294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing demand for deep learning models on mobile devices,
splitting neural network computation between the device and a more powerful
edge server has become an attractive solution. However, existing split
computing approaches often underperform compared to a naive baseline of remote
computation on compressed data. Recent studies propose learning compressed
representations that contain more relevant information for supervised
downstream tasks, showing improved tradeoffs between compressed data size and
supervised performance. However, existing evaluation metrics only provide an
incomplete picture of split computing. This study introduces supervised
compression for split computing (SC2) and proposes new evaluation criteria:
minimizing computation on the mobile device, minimizing transmitted data size,
and maximizing model accuracy. We conduct a comprehensive benchmark study using
10 baseline methods, three computer vision tasks, and over 180 trained models,
and discuss various aspects of SC2. We also release sc2bench, a Python package
for future research on SC2. Our proposed metrics and package will help
researchers better understand the tradeoffs of supervised compression in split
computing.
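The three SC2 criteria above (minimal on-device computation, minimal transmitted data, maximal accuracy) can be illustrated with a toy split model. This is a minimal numpy sketch, not the sc2bench API: the network, split point, and float16 payload encoding are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical two-layer network split into a device-side "head" and a
# server-side "tail". The SC2 criteria are then: (1) head compute,
# (2) transmitted payload size, (3) end-task output quality.
rng = np.random.default_rng(0)
W_head = rng.standard_normal((32, 8))   # device-side layer (32 -> 8 bottleneck)
W_tail = rng.standard_normal((8, 2))    # server-side layer (8 -> 2 classes)

def run_split(x):
    z = np.maximum(x @ W_head, 0.0)           # head: runs on the mobile device
    payload = z.astype(np.float16).tobytes()  # compressed representation sent over the network
    z_recv = np.frombuffer(payload, dtype=np.float16).reshape(z.shape)
    logits = z_recv.astype(np.float32) @ W_tail  # tail: runs on the edge server
    return logits, len(payload)

x = rng.standard_normal((4, 32))
logits, nbytes = run_split(x)
head_macs = x.shape[0] * W_head.shape[0] * W_head.shape[1]  # crude on-device compute proxy
print(nbytes, head_macs, logits.shape)
```

Narrowing the bottleneck or quantizing more aggressively shrinks the payload (criterion 2) at the cost of task accuracy (criterion 3), which is exactly the tradeoff the benchmark measures.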
Related papers
- A Multi-task Supervised Compression Model for Split Computing [4.234757989234096]
Split computing is a promising approach to deploying deep learning models on resource-constrained edge computing systems.
We propose Ladon, the first multi-task-head supervised compression model for multi-task split computing.
Our models reduced end-to-end latency (by up to 95.4%) and energy consumption of mobile devices (by up to 88.2%) in multi-task split computing scenarios.
arXiv Detail & Related papers (2025-01-02T18:59:05Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
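TopK, as referenced above, keeps only the largest-magnitude entries of a tensor. A minimal numpy sketch of the general scheme (details of the cited paper may differ):

```python
import numpy as np

def topk_compress(t, k):
    # Keep the k largest-magnitude entries, zero out the rest — a common
    # sparsification scheme for activations and gradients.
    flat = t.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |values|
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(t.shape)

a = np.array([0.1, -3.0, 0.5, 2.0])
print(topk_compress(a, 2))  # keeps -3.0 and 2.0, zeros the rest
```

The finding quoted above suggests that if such sparsification is used during training, applying the same compression at inference time keeps the model in the regime it was trained for.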
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for reducing the communication and computation costs of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Feature Compression for Rate Constrained Object Detection on the Edge [20.18227104333772]
An emerging approach to the resource constraints of edge devices is to offload the computation of neural networks to computing resources at an edge server.
In this work, we consider a "split computation" system to offload a part of the computation of the YOLO object detection model.
We train the feature compression and decompression module together with the YOLO model to optimize the object detection accuracy under a rate constraint.
arXiv Detail & Related papers (2022-04-15T03:39:30Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage.
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently.
It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
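The knowledge-distillation side of this idea, at its simplest, matches a student's intermediate features to a teacher's. A minimal numpy sketch of feature-level distillation (shapes and naming are hypothetical; the cited paper's method is more elaborate):

```python
import numpy as np

def feature_distillation_loss(student_feat, teacher_feat):
    # Mean-squared error between intermediate representations: the compressed
    # bottleneck is trained to reproduce the teacher's features rather than
    # raw pixels, so only task-relevant information needs to be transmitted.
    return float(np.mean((student_feat - teacher_feat) ** 2))

s = np.ones((2, 4))
t = np.zeros((2, 4))
print(feature_distillation_loss(s, t))
```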
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
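A single power step of this kind of low-rank compression can be sketched in a few lines of numpy (rank-1 case for illustration; PowerGossip's actual protocol and orthogonalization details differ):

```python
import numpy as np

def power_compress(delta, q):
    # One power-iteration step: approximate the model difference `delta`
    # with a rank-1 factorization p @ q_new.T, so workers exchange the two
    # small vectors p and q_new instead of the full matrix.
    p = delta @ q
    p = p / (np.linalg.norm(p) + 1e-12)  # normalize the left factor
    q_new = delta.T @ p
    return p, q_new

# For an exactly rank-1 difference, one step recovers it.
delta = np.outer([1.0, 0.0], [0.0, 2.0])
p, q_new = power_compress(delta, np.array([1.0, 1.0]))
print(np.outer(p, q_new))
```

Reusing the previous step's `q` as the next starting vector is what lets repeated rounds transfer more information per bit.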
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Split Computing for Complex Object Detectors: Challenges and Preliminary Results [8.291242737118482]
We discuss the challenges in developing split computing methods for powerful R-CNN object detectors trained on a large dataset, COCO 2017.
We show that naive split computing methods would not reduce inference time.
This is the first study to inject small bottlenecks to such object detectors and unveil the potential of a split computing approach.
arXiv Detail & Related papers (2020-07-27T05:03:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.