A Principled Hierarchical Deep Learning Approach to Joint Image
Compression and Classification
- URL: http://arxiv.org/abs/2310.19675v1
- Date: Mon, 30 Oct 2023 15:52:18 GMT
- Title: A Principled Hierarchical Deep Learning Approach to Joint Image
Compression and Classification
- Authors: Siyu Qi, Achintha Wijesinghe, Lahiru D. Chamain, Zhi Ding
- Abstract summary: This work proposes a three-step joint learning strategy to guide encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations.
Tests show that our proposed method achieves accuracy improvements of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional E2E cross-entropy training.
- Score: 27.934109301041595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Among applications of deep learning (DL) involving low-cost
sensors, remote image classification involves a physical channel that
separates edge sensors from cloud classifiers. Traditional DL models must
be divided between an encoder for the sensor and the decoder plus
classifier at the edge server. An important challenge is to effectively
train such distributed models when the connecting channels have limited
rate/capacity. Our goal is to optimize DL models such that the encoder
latent requires low channel bandwidth while still delivering the feature
information needed for high classification accuracy. This work proposes a
three-step joint learning strategy that guides encoders to extract
features that are compact, discriminative, and amenable to common
augmentations/transformations. We optimize the latent dimension through an
initial screening phase before end-to-end (E2E) training. To obtain an
adjustable bit rate via a single pre-deployed encoder, we apply
entropy-based quantization and/or manual truncation to the latent
representations. Tests show that our proposed method achieves accuracy
improvements of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over
conventional E2E cross-entropy training.
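As a concrete illustration of the adjustable-rate idea, below is a minimal PyTorch sketch of an encoder/classifier split with manual latent truncation and uniform scalar quantization applied at inference time. All names and layer sizes are hypothetical, and the uniform quantizer is only a stand-in for the paper's entropy-based quantization; the three-step training strategy itself is not reproduced here.

```python
import torch
import torch.nn as nn

class SplitClassifier(nn.Module):
    """Illustrative split model: the encoder would run on the edge
    sensor, the classifier on the server. Names and sizes are
    hypothetical, not taken from the paper."""
    def __init__(self, latent_dim=64, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x, keep_dims=None, n_bits=None):
        z = self.encoder(x)
        # Manual truncation: transmit only the first keep_dims latent
        # entries; the server zero-fills the rest.
        if keep_dims is not None:
            z = torch.cat([z[:, :keep_dims],
                           torch.zeros_like(z[:, keep_dims:])], dim=1)
        # Uniform scalar quantization to n_bits per latent entry
        # (a simple stand-in for entropy-based quantization).
        if n_bits is not None:
            z_min, z_max = z.min(), z.max()
            step = ((z_max - z_min) / (2 ** n_bits - 1)).clamp(min=1e-12)
            z = torch.round((z - z_min) / step) * step + z_min
        return self.classifier(z)

model = SplitClassifier()
x = torch.randn(4, 3, 32, 32)                   # CIFAR-sized batch
logits_full = model(x)                          # full-rate latent
logits_low = model(x, keep_dims=16, n_bits=4)   # reduced-rate latent
```

Because a single pre-deployed encoder serves every operating point, only keep_dims and n_bits need to change to trade bandwidth against accuracy.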
Related papers
- 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders [53.297697898510194]
We propose a joint modeling scheme where four decoders share the same encoder -- we refer to this as 4D modeling.
To efficiently train the 4D model, we introduce a two-stage training strategy that stabilizes multitask learning.
In addition, we propose three novel one-pass beam search algorithms by combining three decoders.
arXiv Detail & Related papers (2024-06-05T05:18:20Z)
- Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy that self-selects the number of hidden layers in the encoder, conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z)
- Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation [17.980800481385195]
We present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation.
Results confirm that our method yields up to a 65% reduction in MACs and a 2x speed-up with less than a 0.3 dB drop in BD-PSNR.
arXiv Detail & Related papers (2023-12-05T09:26:09Z)
- ADS_UNet: A Nested UNet for Histopathology Image Segmentation [1.213915839836187]
We propose ADS UNet, a stage-wise additive training algorithm that incorporates resource-efficient deep supervision in shallower layers.
We demonstrate that ADS_UNet outperforms state-of-the-art Transformer-based models by 1.08 and 0.6 points on the CRAG and BCSS datasets, respectively.
arXiv Detail & Related papers (2023-04-10T13:08:48Z)
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linearly separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet, respectively (a minimal linear-probe sketch appears after this list).
arXiv Detail & Related papers (2023-03-17T04:20:47Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Prompt Tuning for Parameter-efficient Medical Image Segmentation [79.09285179181225]
We propose and investigate several contributions to achieve a parameter-efficient but effective adaptation for semantic segmentation on two medical imaging datasets.
We pre-train this architecture with a dedicated dense self-supervision scheme based on assignments to online-generated prototypes.
We demonstrate that the resulting neural network model is able to narrow the gap between fully fine-tuned and parameter-efficiently adapted models.
arXiv Detail & Related papers (2022-11-16T21:55:05Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation (see the decorrelation sketch after this list).
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Lightweight Compression of Intermediate Neural Network Features for Collaborative Intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a lightweight device such as a mobile phone or edge device.
This paper presents a novel lightweight compression technique designed specifically to quantize and compress the features output by the intermediate layer of a split DNN.
arXiv Detail & Related papers (2021-05-15T00:10:12Z)
- Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection and Slot Filling [6.234581622120001]
We propose a novel Transformer encoder-based architecture with syntactical knowledge encoded for intent detection and slot filling.
We encode syntactic knowledge into the Transformer encoder by jointly training it to predict syntactic parse ancestors and the part-of-speech tag of each token via multi-task learning.
arXiv Detail & Related papers (2020-12-21T21:25:11Z)
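For the DDAE entry above, the linear-evaluation protocol can be sketched as follows: freeze a pretrained backbone, collect the activations of one intermediate layer, and fit only a linear classifier on top. Here backbone, layer, and loader are placeholders for whatever pretrained model and data pipeline are being probed; this is a generic probe, not the cited paper's exact setup.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_features(backbone, layer, loader, device="cpu"):
    """Collect one intermediate layer's activations from a frozen
    backbone via a forward hook (assumes the layer outputs a tensor)."""
    feats, labels, cache = [], [], {}
    hook = layer.register_forward_hook(
        lambda module, inputs, output: cache.__setitem__("z", output.flatten(1)))
    backbone.eval().to(device)
    for x, y in loader:
        backbone(x.to(device))
        feats.append(cache["z"].cpu())
        labels.append(y)
    hook.remove()
    return torch.cat(feats), torch.cat(labels)

def linear_probe(train_z, train_y, num_classes, epochs=200, lr=1e-2):
    """Train only a linear classifier on the frozen features."""
    clf = nn.Linear(train_z.shape[1], num_classes)
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(clf(train_z), train_y)
        loss.backward()
        opt.step()
    return clf
```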
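Similarly, the bottleneck-redundancy entry above can be illustrated with a generic decorrelation penalty on the bottleneck activations, added to the reconstruction loss. This covariance-style penalty is written for illustration and may differ from the cited paper's exact formulation.

```python
import torch

def redundancy_penalty(z, eps=1e-8):
    """Mean squared off-diagonal entry of the feature correlation
    matrix for bottleneck activations z of shape (batch, dim); zero
    when features are pairwise decorrelated."""
    z = z - z.mean(dim=0, keepdim=True)
    z = z / (z.std(dim=0, keepdim=True) + eps)
    corr = (z.T @ z) / z.shape[0]                  # (dim, dim)
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).mean()

# Usage: total_loss = reconstruction_loss + lam * redundancy_penalty(z)
```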