Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective
- URL: http://arxiv.org/abs/2106.08512v1
- Date: Wed, 16 Jun 2021 01:44:32 GMT
- Title: Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective
- Authors: Yueyu Hu, Wenhan Yang, Haofeng Huang, Jiaying Liu
- Abstract summary: We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
- Score: 69.99087941471882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual analytics have played an increasingly critical role in the Internet of
Things, where massive visual signals have to be compressed and fed into
machines. But facing such big data and constrained bandwidth capacity, existing
image/video compression methods lead to very low-quality representations, while
existing feature compression techniques fail to support diversified visual
analytics applications/tasks with low-bit-rate representations. In this paper,
we raise and study the novel problem of supporting multiple machine vision
analytics tasks with the compressed visual representation, namely, the
information compression problem in analytics taxonomy. By utilizing the
intrinsic transferability among different tasks, our framework successfully
constructs compact and expressive representations at low bit-rates to support a
diversified set of machine vision tasks, including both high-level
semantic-related tasks and mid-level geometry analytic tasks. In order to
impose compactness in the representations, we propose a codebook-based
hyperprior, which helps map the representation into a low-dimensional manifold.
As it well fits the signal structure of the deep visual feature, it facilitates
more accurate entropy estimation, and results in higher compression efficiency.
With the proposed framework and the codebook-based hyperprior, we further
investigate the relationship of different task features owning different levels
of abstraction granularity. Experimental results demonstrate that with the
proposed scheme, a set of diversified tasks can be supported at a significantly
lower bit-rate, compared with existing compression schemes.
Related papers
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for
Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z) - Vision-Enhanced Semantic Entity Recognition in Document Images via
Visually-Asymmetric Consistency Learning [19.28860833813788]
Existing models commonly train a visual encoder with weak cross-modal supervision signals.
We propose a novel textbfVisually-textbfAsymmetric cotextbfNsistentextbfCy textbfLearning (textscVancl) approach to capture fine-grained visual and layout features.
arXiv Detail & Related papers (2023-10-23T10:37:22Z) - Machine Perception-Driven Image Compression: A Layered Generative
Approach [32.23554195427311]
layered generative image compression model is proposed to achieve high human vision-oriented image reconstructed quality.
Task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks.
Joint optimization schedule is adopted to acquire best balance point among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z) - Semantic-aware Texture-Structure Feature Collaboration for Underwater
Image Enhancement [58.075720488942125]
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics.
We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model.
We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z) - Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z) - Single Image Deraining via Scale-space Invariant Attention Neural
Network [58.5284246878277]
We tackle the notion of scale that deals with visual changes in appearance of rain steaks with respect to the camera.
We propose to represent the multi-scale correlation in convolutional feature domain, which is more compact and robust than that in pixel domain.
In this way, we summarize the most activated presence of feature maps as the salient features.
arXiv Detail & Related papers (2020-06-09T04:59:26Z) - Towards Analysis-friendly Face Representation with Scalable Feature and
Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z) - End-to-End Facial Deep Learning Feature Compression with Teacher-Student
Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.