SLIC: A Learned Image Codec Using Structure and Color
- URL: http://arxiv.org/abs/2401.17246v1
- Date: Tue, 30 Jan 2024 18:39:54 GMT
- Title: SLIC: A Learned Image Codec Using Structure and Color
- Authors: Srivatsa Prativadibhayankaram, Mahadev Prasad Panda, Thomas Richter,
Heiko Sparenberg, Siegfried Fößel, André Kaup
- Abstract summary: We propose a structure and color based learned image codec (SLIC) in which the task of compression is split into that of luminance and chrominance.
The deep learning model is built with a novel multi-scale architecture for Y and UV channels.
Various experiments are carried out to study and analyze the performance of the proposed model.
- Score: 0.41232474244672235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the structure and color based learned image codec (SLIC) in which
the task of compression is split into that of luminance and chrominance. The
deep learning model is built with a novel multi-scale architecture for Y and UV
channels in the encoder, where the features from various stages are combined to
obtain the latent representation. An autoregressive context model is employed
for backward adaptation and a hyperprior block for forward adaptation. Various
experiments are carried out to study and analyze the performance of the
proposed model, and to compare it with other image codecs. We also illustrate
the advantages of our method through the visualization of channel impulse
responses, latent channels and various ablation studies. The model achieves
Bjøntegaard delta bitrate gains of 7.5% and 4.66% in terms of MS-SSIM and
CIEDE2000 metrics with respect to other state-of-the-art reference codecs.
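The luminance/chrominance split that underlies SLIC can be illustrated independently of any particular codec. The following minimal sketch (assuming BT.601 luma coefficients; not the authors' implementation) separates an RGB image into the luminance and chrominance planes that the two encoder branches would consume:

```python
import numpy as np

def rgb_to_yuv(rgb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an RGB image (H, W, 3, floats in [0, 1]) into a luma plane Y and a
    chroma pair UV using BT.601 coefficients, mirroring a Y/UV branch split."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance branch input
    u = 0.492 * (b - y)                    # chrominance, scaled (B - Y)
    v = 0.877 * (r - y)                    # chrominance, scaled (R - Y)
    return y, np.stack([u, v], axis=-1)

# Each plane would then feed its own encoder branch; here we only check shapes.
img = np.random.rand(8, 8, 3)
y, uv = rgb_to_yuv(img)
```

Note that for an achromatic pixel (R = G = B) both chroma channels are exactly zero, which is what lets the chrominance branch spend far fewer bits than the luminance branch.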
Related papers
- A Study on the Effect of Color Spaces in Learned Image Compression [14.39599746127334]
We present a comparison between color spaces namely YUV, LAB, RGB and their effect on learned image compression.
We use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches - one for the luminance component (Y or L) and another for the chrominance components (UV or AB).
arXiv Detail & Related papers (2024-06-19T17:05:28Z)
- ColorVideoVDP: A visual difference predictor for image, video and display distortions [51.29162719944865]
The metric is built on novel psychophysical models of chromatic contrast sensitivity and cross-channel contrast masking.
It accounts for the viewing conditions, geometric, and photometric characteristics of the display.
It was trained to predict common video streaming distortions and 8 new distortion types related to AR/VR displays.
arXiv Detail & Related papers (2024-01-21T13:16:33Z)
- Learning Vision from Models Rivals Learning Vision from Data [54.43596959598465]
We introduce SynCLR, a novel approach for learning visual representations exclusively from synthetic images and synthetic captions.
We synthesize a large dataset of image captions using LLMs, then use an off-the-shelf text-to-image model to generate multiple images corresponding to each synthetic caption.
We perform visual representation learning on these synthetic images via contrastive learning, treating images sharing the same caption as positive pairs.
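The positive-pair construction described above can be sketched as a multi-positive InfoNCE loss over caption groups. This is a generic illustration, not the SynCLR implementation; the function name, temperature, and data are assumptions:

```python
import numpy as np

def multi_positive_infonce(z: np.ndarray, caption_ids: np.ndarray,
                           tau: float = 0.1) -> float:
    """Contrastive loss where embeddings sharing a caption id are positives.
    z: (N, D) L2-normalized embeddings; caption_ids: (N,) integer group ids."""
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))  # row log-softmax
    pos = caption_ids[:, None] == caption_ids[None, :]
    np.fill_diagonal(pos, False)
    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = np.where(pos, logp, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return float(-per_anchor.mean())

# Six embeddings, two synthetic captions with three images each (illustrative).
z = np.random.randn(6, 16)
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = multi_positive_infonce(z, np.array([0, 0, 0, 1, 1, 1]))
```

The key design point is the positive mask: instead of a single augmented view, every image generated from the same caption counts as a positive for the anchor.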
arXiv Detail & Related papers (2023-12-28T18:59:55Z)
- Color Learning for Image Compression [1.2330326247154968]
We propose a novel deep learning model architecture, where the task of image compression is divided into two sub-tasks.
The model has two separate branches to process the luminance and chrominance components.
We demonstrate the benefits of our approach and compare the performance to other codecs.
arXiv Detail & Related papers (2023-06-30T08:16:48Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
- Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
- Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks [15.308823742699039]
We propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv)
To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced.
Experimental results show that the proposed framework trained with variable-rate objective function outperforms the standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
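The summary above does not state the variable-rate objective itself; a common form of such a multi-rate objective averages the rate-distortion Lagrangian R + λ·D over several operating points so that one model covers a range of bit rates. The sketch below is illustrative only, and all names and numbers are hypothetical:

```python
import numpy as np

def variable_rate_loss(rate: np.ndarray, dist: np.ndarray,
                       lambdas: np.ndarray) -> float:
    """Average the Lagrangian R + lambda*D over K operating points.
    rate, dist: per-lambda rate (bits) and distortion (e.g. MSE), shape (K,)."""
    return float(np.mean(rate + lambdas * dist))

# Hypothetical operating points: larger lambda trades more bits for less distortion.
lambdas = np.array([0.01, 0.05, 0.25])
rate = np.array([0.2, 0.5, 1.1])    # bits per pixel (illustrative)
dist = np.array([40.0, 15.0, 4.0])  # MSE (illustrative)
loss = variable_rate_loss(rate, dist, lambdas)
```

Training on several λ values at once is what lets a single set of weights serve multiple target bit rates instead of training one model per rate point.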
arXiv Detail & Related papers (2020-12-31T06:26:56Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.