Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective
- URL: http://arxiv.org/abs/2106.08512v1
- Date: Wed, 16 Jun 2021 01:44:32 GMT
- Title: Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective
- Authors: Yueyu Hu, Wenhan Yang, Haofeng Huang, Jiaying Liu
- Abstract summary: We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
- Score: 69.99087941471882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual analytics have played an increasingly critical role in the Internet of
Things, where massive visual signals have to be compressed and fed into
machines. But facing such big data and constrained bandwidth capacity, existing
image/video compression methods lead to very low-quality representations, while
existing feature compression techniques fail to support diversified visual
analytics applications/tasks with low-bit-rate representations. In this paper,
we raise and study the novel problem of supporting multiple machine vision
analytics tasks with the compressed visual representation, namely, the
information compression problem in analytics taxonomy. By utilizing the
intrinsic transferability among different tasks, our framework successfully
constructs compact and expressive representations at low bit-rates to support a
diversified set of machine vision tasks, including both high-level
semantic-related tasks and mid-level geometry analytic tasks. In order to
impose compactness in the representations, we propose a codebook-based
hyperprior, which helps map the representation into a low-dimensional manifold.
As it well fits the signal structure of the deep visual feature, it facilitates
more accurate entropy estimation, and results in higher compression efficiency.
With the proposed framework and the codebook-based hyperprior, we further
investigate the relationship among task features with different levels
of abstraction granularity. Experimental results demonstrate that with the
proposed scheme, a set of diversified tasks can be supported at a significantly
lower bit-rate, compared with existing compression schemes.
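As a rough illustration of why a codebook can make a representation compact, the sketch below uses plain nearest-neighbour vector quantization: each feature vector is replaced by the index of its closest codeword, and the index costs about -log2(p) bits under a probability table. This is not the authors' codebook-based hyperprior; the codebook, the probabilities, the feature values, and the function names are all hypothetical toy choices.

```python
import math

def nearest_codeword(vector, codebook):
    """Return (index, codeword) of the entry closest to `vector`
    in squared Euclidean distance."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )
    return best_index, codebook[best_index]

def index_bits(index, probabilities):
    """Ideal code length in bits for `index` under `probabilities`."""
    return -math.log2(probabilities[index])

# Hypothetical toy codebook of four 2-D codewords and an assumed
# probability for each index (a real system would learn both).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [2.0, -1.0]]
probabilities = [0.5, 0.25, 0.125, 0.125]

# Hypothetical feature vectors to be compressed.
features = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

# Each feature is represented only by the index of its nearest codeword.
indices = [nearest_codeword(f, codebook)[0] for f in features]

# Estimated bit cost of transmitting the indices with an ideal entropy coder.
total_bits = sum(index_bits(i, probabilities) for i in indices)

print(indices, total_bits)
```

In this toy setting the three features are described by three small integers instead of six floating-point values, and more probable codewords cost fewer bits, which is the general intuition behind pairing a codebook with an entropy model.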
Related papers
- Contextual Reinforcement in Multimodal Token Compression for Large Language Models [0.0]
Token compression remains a critical challenge for scaling models to handle increasingly complex and diverse datasets.
A novel mechanism based on contextual reinforcement is introduced, dynamically adjusting token importance through interdependencies and semantic relevance.
This approach enables substantial reductions in token usage while preserving the quality and coherence of information representation.
arXiv Detail & Related papers (2025-01-28T02:44:31Z)
- Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories.
Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance.
We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
arXiv Detail & Related papers (2024-12-26T05:41:31Z)
- Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision [39.50768518548343]
We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
arXiv Detail & Related papers (2023-12-25T05:57:23Z)
- Machine Perception-Driven Image Compression: A Layered Generative Approach [32.23554195427311]
A layered generative image compression model is proposed to achieve high-quality image reconstruction oriented to human vision.
A task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks.
A joint optimization schedule is adopted to find the best balance among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to an extent, the separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Single Image Deraining via Scale-space Invariant Attention Neural Network [58.5284246878277]
We tackle the notion of scale that deals with visual changes in the appearance of rain streaks with respect to the camera.
We propose to represent the multi-scale correlation in convolutional feature domain, which is more compact and robust than that in pixel domain.
In this way, we summarize the most activated presence of feature maps as the salient features.
arXiv Detail & Related papers (2020-06-09T04:59:26Z)
- Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and the enhancement layer is further filled with feature-level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
- End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)