Related papers: Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

URL: http://arxiv.org/abs/2312.15622v1
Date: Mon, 25 Dec 2023 05:57:23 GMT
Title: Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision
Authors: Qi Mao, Chongyu Wang, Meng Wang, Shiqi Wang, Ruijie Chen, Libiao Jin, Siwei Ma
Abstract summary: We investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers. Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio.
Score: 39.50768518548343
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The accelerated proliferation of visual content and the rapid development of machine vision technologies bring significant challenges in delivering visual data on a gigantic scale, which must be effectively represented to satisfy both human and machine requirements. In this work, we investigate how hierarchical representations derived from the advanced generative prior facilitate constructing an efficient scalable coding paradigm for human-machine collaborative vision. Our key insight is that by exploiting the StyleGAN prior, we can learn three-layered representations encoding hierarchical semantics, which are elaborately designed into the basic, middle, and enhanced layers, supporting machine intelligence and human visual perception in a progressive fashion. With the aim of achieving efficient compression, we propose the layer-wise scalable entropy transformer to reduce the redundancy between layers. Based on the multi-task scalable rate-distortion objective, the proposed scheme is jointly optimized to achieve optimal machine analysis performance, human perception experience, and compression ratio. We validate the proposed paradigm's feasibility in face image compression. Extensive qualitative and quantitative experimental results demonstrate the superiority of the proposed paradigm over the latest compression standard Versatile Video Coding (VVC) in terms of both machine analysis and human perception at extremely low bitrates ($<0.01$ bpp), offering new insights for human-machine collaborative compression.

Related papers

Guided Diffusion for the Extension of Machine Vision to Human Visual Perception [0.0]
We propose a method for extending machine vision to human visual perception using guided diffusion. Guided diffusion acts as a bridge between machine vision and human perception, enabling transitions between them without any additional overhead.
arXiv Detail & Related papers (2025-03-23T03:04:26Z)
Unified Coding for Both Human Perception and Generalized Machine Analytics with CLIP Supervision [44.5080084219247]
This paper introduces multimodal pre-training models and incorporates adaptive multi-objective optimization tailored to support both human visual perception and machine vision simultaneously with a single bitstream. The proposed Unified and Generalized Image Coding for Machine (UG-ICM) is capable of achieving remarkable improvements in various unseen machine analytics tasks.
arXiv Detail & Related papers (2025-01-08T15:48:30Z)
An Efficient Adaptive Compression Method for Human Perception and Machine Vision Tasks [27.318182211122558]
We introduce an efficient adaptive compression (EAC) method tailored for both human perception and multiple machine vision tasks. Our method enhances performance for multiple machine vision tasks while maintaining the quality of human vision.
arXiv Detail & Related papers (2025-01-08T08:03:49Z)
Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task [47.7670923159071]
This study introduces an innovative semantics DISentanglement and COmposition VERsatile codec (DISCOVER) to simultaneously enhance human-eye perception and machine vision tasks. The approach derives a set of labels per task through multimodal large models, to which grounding models are then applied for precise localization, enabling a comprehensive understanding and disentanglement of image components at the encoder side. At the decoding stage, a comprehensive reconstruction of the image is achieved by leveraging these encoded components alongside priors from generative models, thereby optimizing performance for both human visual perception and machine-based analytical tasks.
arXiv Detail & Related papers (2024-12-24T04:32:36Z)
Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer [35.500720262253054]
This paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization. Experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception.
arXiv Detail & Related papers (2024-03-06T14:27:02Z)
Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC). ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents. Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z)
Machine Perception-Driven Image Compression: A Layered Generative Approach [32.23554195427311]
A layered generative image compression model is proposed to achieve high human vision-oriented image reconstruction quality. A task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks. A joint optimization schedule is adopted to acquire the best balance point among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to an extent, the separate research tracks of video/image compression and feature compression. This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation. By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates. In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z)
Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way. Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction. To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks. In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost. We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models. By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels. Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.