Related papers: Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective

Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective

URL: http://arxiv.org/abs/2106.08512v1
Date: Wed, 16 Jun 2021 01:44:32 GMT
Title: Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective
Authors: Yueyu Hu, Wenhan Yang, Haofeng Huang, Jiaying Liu
Abstract summary: We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation. By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates. In order to impose compactness in the representations, we propose a codebook-based hyperprior.
Score: 69.99087941471882
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual analytics have played an increasingly critical role in the Internet of Things, where massive visual signals have to be compressed and fed into machines. But facing such big data and constrained bandwidth capacity, existing image/video compression methods lead to very low-quality representations, while existing feature compression techniques fail to support diversified visual analytics applications/tasks with low-bit-rate representations. In this paper, we raise and study the novel problem of supporting multiple machine vision analytics tasks with the compressed visual representation, namely, the information compression problem in analytics taxonomy. By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates to support a diversified set of machine vision tasks, including both high-level semantic-related tasks and mid-level geometry analytic tasks. In order to impose compactness in the representations, we propose a codebook-based hyperprior, which helps map the representation into a low-dimensional manifold. As it well fits the signal structure of the deep visual feature, it facilitates more accurate entropy estimation, and results in higher compression efficiency. With the proposed framework and the codebook-based hyperprior, we further investigate the relationship of different task features owning different levels of abstraction granularity. Experimental results demonstrate that with the proposed scheme, a set of diversified tasks can be supported at a significantly lower bit-rate, compared with existing compression schemes.

Related papers

"Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
Contextual Reinforcement in Multimodal Token Compression for Large Language Models [0.0]
token compression remains a critical challenge for scaling models to handle increasingly complex and diverse datasets. A novel mechanism based on contextual reinforcement is introduced, dynamically adjusting token importance through interdependencies and semantic relevance. This approach enables substantial reductions in token usage while preserving the quality and coherence of information representation.
arXiv Detail & Related papers (2025-01-28T02:44:31Z)
Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers. Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z)
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning [19.28860833813788]
Existing models commonly train a visual encoder with weak cross-modal supervision signals. We propose a novel textbfVisually-textbfAsymmetric cotextbfNsistentextbfCy textbfLearning (textscVancl) approach to capture fine-grained visual and layout features.
arXiv Detail & Related papers (2023-10-23T10:37:22Z)
Machine Perception-Driven Image Compression: A Layered Generative Approach [32.23554195427311]
layered generative image compression model is proposed to achieve high human vision-oriented image reconstructed quality. Task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks. Joint optimization schedule is adopted to acquire best balance point among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
Semantic-aware Texture-Structure Feature Collaboration for Underwater Image Enhancement [58.075720488942125]
Underwater image enhancement has become an attractive topic as a significant technology in marine engineering and aquatic robotics. We develop an efficient and compact enhancement network in collaboration with a high-level semantic-aware pretrained model. We also apply the proposed algorithm to the underwater salient object detection task to reveal the favorable semantic-aware ability for high-level vision tasks.
arXiv Detail & Related papers (2022-11-19T07:50:34Z)
Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression. This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
Single Image Deraining via Scale-space Invariant Attention Neural Network [58.5284246878277]
We tackle the notion of scale that deals with visual changes in appearance of rain steaks with respect to the camera. We propose to represent the multi-scale correlation in convolutional feature domain, which is more compact and robust than that in pixel domain. In this way, we summarize the most activated presence of feature maps as the salient features.
arXiv Detail & Related papers (2020-06-09T04:59:26Z)
Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way. Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction. To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks. In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost. We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.