SC2 Benchmark: Supervised Compression for Split Computing
- URL: http://arxiv.org/abs/2203.08875v2
- Date: Wed, 14 Jun 2023 17:59:07 GMT
- Title: SC2 Benchmark: Supervised Compression for Split Computing
- Authors: Yoshitomo Matsubara, Ruihan Yang, Marco Levorato, Stephan Mandt
- Abstract summary: This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria.
We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models.
Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.
- Score: 21.7175821221294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing demand for deep learning models on mobile devices,
splitting neural network computation between the device and a more powerful
edge server has become an attractive solution. However, existing split
computing approaches often underperform compared to a naive baseline of remote
computation on compressed data. Recent studies propose learning compressed
representations that contain more relevant information for supervised
downstream tasks, showing improved tradeoffs between compressed data size and
supervised performance. However, existing evaluation metrics only provide an
incomplete picture of split computing. This study introduces supervised
compression for split computing (SC2) and proposes new evaluation criteria:
minimizing computation on the mobile device, minimizing transmitted data size,
and maximizing model accuracy. We conduct a comprehensive benchmark study using
10 baseline methods, three computer vision tasks, and over 180 trained models,
and discuss various aspects of SC2. We also release sc2bench, a Python package
for future research on SC2. Our proposed metrics and package will help
researchers better understand the tradeoffs of supervised compression in split
computing.
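The three SC2 criteria above (minimal on-device computation, minimal transmitted data, maximal accuracy) can be illustrated with a toy split model. This is a minimal numpy sketch, not the sc2bench API: the network, split point, and float16 payload encoding are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical two-layer network split into a device-side "head" and a
# server-side "tail". The SC2 criteria are then: (1) head compute,
# (2) transmitted payload size, (3) end-task output quality.
rng = np.random.default_rng(0)
W_head = rng.standard_normal((32, 8))   # device-side layer (32 -> 8 bottleneck)
W_tail = rng.standard_normal((8, 2))    # server-side layer (8 -> 2 classes)

def run_split(x):
    z = np.maximum(x @ W_head, 0.0)           # head: runs on the mobile device
    payload = z.astype(np.float16).tobytes()  # compressed representation sent over the network
    z_recv = np.frombuffer(payload, dtype=np.float16).reshape(z.shape)
    logits = z_recv.astype(np.float32) @ W_tail  # tail: runs on the edge server
    return logits, len(payload)

x = rng.standard_normal((4, 32))
logits, nbytes = run_split(x)
head_macs = x.shape[0] * W_head.shape[0] * W_head.shape[1]  # crude on-device compute proxy
print(nbytes, head_macs, logits.shape)
```

Narrowing the bottleneck or quantizing more aggressively shrinks the payload (criterion 2) at the cost of task accuracy (criterion 3), which is exactly the tradeoff the benchmark measures.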
Related papers
- A Multi-task Supervised Compression Model for Split Computing [4.234757989234096]
Split computing is a promising approach to deploying deep learning models on resource-constrained edge computing systems.
We propose Ladon, the first multi-task-head supervised compression model for multi-task split computing.
Our models reduced end-to-end latency (by up to 95.4%) and energy consumption of mobile devices (by up to 88.2%) in multi-task split computing scenarios.
arXiv Detail & Related papers (2025-01-02T18:59:05Z)
- Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
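TopK, as referenced above, keeps only the largest-magnitude entries of a tensor. A minimal numpy sketch of the general scheme (details of the cited paper may differ):

```python
import numpy as np

def topk_compress(t, k):
    # Keep the k largest-magnitude entries, zero out the rest — a common
    # sparsification scheme for activations and gradients.
    flat = t.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest |values|
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(t.shape)

a = np.array([0.1, -3.0, 0.5, 2.0])
print(topk_compress(a, 2))  # keeps -3.0 and 2.0, zeros the rest
```

The finding quoted above suggests that if such sparsification is used during training, applying the same compression at inference time keeps the model in the regime it was trained for.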
arXiv Detail & Related papers (2024-01-15T15:54:54Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for reducing the communication and computation costs of distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Feature Compression for Rate Constrained Object Detection on the Edge [20.18227104333772]
An emerging approach to the resource constraints of edge devices is to offload the computation of neural networks to computing resources at an edge server.
In this work, we consider a "split computation" system to offload a part of the computation of the YOLO object detection model.
We train the feature compression and decompression module together with the YOLO model to optimize the object detection accuracy under a rate constraint.
arXiv Detail & Related papers (2022-04-15T03:39:30Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Supervised Compression for Resource-constrained Edge Computing Systems [26.676557573171618]
Full-scale deep neural networks are often too resource-intensive in terms of energy and storage.
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently.
It achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency.
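The knowledge-distillation side of this idea, at its simplest, matches a student's intermediate features to a teacher's. A minimal numpy sketch of feature-level distillation (shapes and naming are hypothetical; the cited paper's method is more elaborate):

```python
import numpy as np

def feature_distillation_loss(student_feat, teacher_feat):
    # Mean-squared error between intermediate representations: the compressed
    # bottleneck is trained to reproduce the teacher's features rather than
    # raw pixels, so only task-relevant information needs to be transmitted.
    return float(np.mean((student_feat - teacher_feat) ** 2))

s = np.ones((2, 4))
t = np.zeros((2, 4))
print(feature_distillation_loss(s, t))
```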
arXiv Detail & Related papers (2021-08-21T11:10:29Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
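A single power step of this kind of low-rank compression can be sketched in a few lines of numpy (rank-1 case for illustration; PowerGossip's actual protocol and orthogonalization details differ):

```python
import numpy as np

def power_compress(delta, q):
    # One power-iteration step: approximate the model difference `delta`
    # with a rank-1 factorization p @ q_new.T, so workers exchange the two
    # small vectors p and q_new instead of the full matrix.
    p = delta @ q
    p = p / (np.linalg.norm(p) + 1e-12)  # normalize the left factor
    q_new = delta.T @ p
    return p, q_new

# For an exactly rank-1 difference, one step recovers it.
delta = np.outer([1.0, 0.0], [0.0, 2.0])
p, q_new = power_compress(delta, np.array([1.0, 1.0]))
print(np.outer(p, q_new))
```

Reusing the previous step's `q` as the next starting vector is what lets repeated rounds transfer more information per bit.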
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
- Split Computing for Complex Object Detectors: Challenges and Preliminary Results [8.291242737118482]
We discuss the challenges in developing split computing methods for powerful R-CNN object detectors trained on a large dataset, COCO 2017.
We show that naive split computing methods would not reduce inference time.
This is the first study to inject small bottlenecks to such object detectors and unveil the potential of a split computing approach.
arXiv Detail & Related papers (2020-07-27T05:03:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.