Scalable Face Image Coding via StyleGAN Prior: Towards Compression for
Human-Machine Collaborative Vision
- URL: http://arxiv.org/abs/2312.15622v1
- Date: Mon, 25 Dec 2023 05:57:23 GMT
- Authors: Qi Mao, Chongyu Wang, Meng Wang, Shiqi Wang, Ruijie Chen, Libiao Jin,
Siwei Ma
- Abstract summary: We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision.
Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers.
Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
- Score: 39.50768518548343
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The accelerated proliferation of visual content and the rapid development of
machine vision technologies bring significant challenges in delivering visual
data on a gigantic scale, which shall be effectively represented to satisfy
both human and machine requirements. In this work, we investigate how
hierarchical representations derived from the advanced generative prior
facilitate constructing an efficient scalable coding paradigm for human-machine
collaborative vision. Our key insight is that by exploiting the StyleGAN prior,
we can learn three-layered representations encoding hierarchical semantics,
which are elaborately designed into the basic, middle, and enhanced layers,
supporting machine intelligence and human visual perception in a progressive
fashion. With the aim of achieving efficient compression, we propose the
layer-wise scalable entropy transformer to reduce the redundancy between
layers. Based on the multi-task scalable rate-distortion objective, the
proposed scheme is jointly optimized to achieve optimal machine analysis
performance, human perception experience, and compression ratio. We validate
the proposed paradigm's feasibility in face image compression. Extensive
qualitative and quantitative experimental results demonstrate the superiority
of the proposed paradigm over the latest compression standard, Versatile Video
Coding (VVC), in terms of both machine analysis and human perception at
extremely low bitrates ($<0.01$ bpp), offering new insights for human-machine
collaborative compression.
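To put the $<0.01$ bpp operating point in context, the short sketch below converts a bits-per-pixel rate into a total bit and byte budget. The 1024x1024 resolution is an assumption chosen for illustration (the abstract does not state the image size), and the helper names are hypothetical.

```python
def bits_for_bpp(width: int, height: int, bpp: float) -> float:
    """Total bit budget for one image coded at `bpp` bits per pixel."""
    return width * height * bpp


def bpp_from_bytes(num_bytes: int, width: int, height: int) -> float:
    """Bits per pixel achieved by a bitstream of `num_bytes` bytes."""
    return 8 * num_bytes / (width * height)


# Assumed resolution for illustration only; not taken from the abstract.
WIDTH, HEIGHT = 1024, 1024

budget_bits = bits_for_bpp(WIDTH, HEIGHT, 0.01)
budget_bytes = budget_bits / 8

print(f"0.01 bpp at {WIDTH}x{HEIGHT}: {budget_bits:.2f} bits "
      f"(about {budget_bytes:.0f} bytes)")
print(f"A 1310-byte bitstream gives {bpp_from_bytes(1310, WIDTH, HEIGHT):.5f} bpp")
```

Under this assumed resolution, staying below 0.01 bpp means the whole image, across all layers, has to fit in roughly 1.3 kB.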
Related papers
- Unifying Generation and Compression: Ultra-low bitrate Image Coding Via
Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression.
A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization.
Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
- Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient
Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z)
- Machine Perception-Driven Image Compression: A Layered Generative
Approach [32.23554195427311]
A layered generative image compression model is proposed to achieve high human vision-oriented reconstructed image quality.
A task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks.
A joint optimization schedule is adopted to acquire the best balance point among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
- Video Coding for Machine: Compact Visual Representation Compression for
Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to an extent, the separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Revisit Visual Representation in Analytics Taxonomy: A Compression
Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z)
- Towards Analysis-friendly Face Representation with Scalable Feature and
Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
- End-to-End Facial Deep Learning Feature Compression with Teacher-Student
Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding
Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)