End-to-end Compression Towards Machine Vision: Network Architecture
Design and Optimization
- URL: http://arxiv.org/abs/2107.00328v1
- Date: Thu, 1 Jul 2021 09:36:32 GMT
- Title: End-to-end Compression Towards Machine Vision: Network Architecture
Design and Optimization
- Authors: Shurun Wang, Zhao Wang, Shiqi Wang, Yan Ye
- Abstract summary: We show that the design and optimization of network architecture could be further improved for compression towards machine vision.
We propose an inverted bottleneck structure for end-to-end compression towards machine vision.
We show that the proposed scheme achieves significant BD-rate savings in terms of analysis performance.
- Score: 21.64194147628761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The research of visual signal compression has a long history. Fueled by deep
learning, exciting progress has been made recently. Despite achieving better
compression performance, existing end-to-end compression algorithms are still
designed towards better signal quality in terms of rate-distortion
optimization. In this paper, we show that the design and optimization of
network architecture could be further improved for compression towards machine
vision. We propose an inverted bottleneck structure for end-to-end compression
towards machine vision, which specifically accounts for efficient
representation of the semantic information. Moreover, we quest the capability
of optimization by incorporating the analytics accuracy into the optimization
process, and the optimality is further explored with generalized rate-accuracy
optimization in an iterative manner. We use object detection as a showcase for
end-to-end compression towards machine vision, and extensive experiments show
that the proposed scheme achieves significant BD-rate savings in terms of
analysis performance. Moreover, the promise of the scheme is also demonstrated
with strong generalization capability towards other machine vision tasks, due
to the enabling of signal-level reconstruction.
Related papers
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for
Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z) - Learning Accurate Performance Predictors for Ultrafast Automated Model
Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z) - Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z) - Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z) - Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z) - Thousand to One: Semantic Prior Modeling for Conceptual Coding [26.41657489930382]
We propose an end-to-end semantic prior-based conceptual coding scheme towards extremely low image compression.
We employ semantic segmentation maps as structural guidance for extracting deep semantic prior.
A cross-channel entropy model is proposed to further exploit the inter-channel correlation of the spatially independent semantic prior.
arXiv Detail & Related papers (2021-03-12T08:02:07Z) - Towards Analysis-friendly Face Representation with Scalable Feature and
Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z) - Learning End-to-End Lossy Image Compression: A Benchmark [90.35363142246806]
We first conduct a comprehensive literature survey of learned image compression methods.
We describe milestones in cutting-edge learned image-compression methods, review a broad range of existing works, and provide insights into their historical development routes.
By introducing a coarse-to-fine hyperprior model for entropy estimation and signal reconstruction, we achieve improved rate-distortion performance.
arXiv Detail & Related papers (2020-02-10T13:13:43Z) - End-to-End Facial Deep Learning Feature Compression with Teacher-Student
Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.