High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation
- URL: http://arxiv.org/abs/2204.11448v1
- Date: Mon, 25 Apr 2022 05:40:57 GMT
- Title: High-Efficiency Lossy Image Coding Through Adaptive Neighborhood Information Aggregation
- Authors: Ming Lu and Zhan Ma
- Abstract summary: Lossy image coding (LIC) with superior efficiency on both compression performance and throughput is challenging.
Our method achieves superior compression performance, surpassing the VVC Intra with $\approx$15% BD-rate improvement averaged across the Kodak, CLIC and Tecnick datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Questing for lossy image coding (LIC) with superior efficiency on both
compression performance and computation throughput is challenging. The vital
factor behind this is how to intelligently exploit Adaptive Neighborhood Information
Aggregation (ANIA) in the transform and entropy coding modules. To this end, an
Integrated Convolution and Self-Attention (ICSA) unit is first proposed to form a
content-adaptive transform that dynamically characterizes and embeds neighborhood
information conditioned on the input. Then a Multistage Context Model (MCM) is
developed to execute context prediction stage by stage, using only the necessary
neighborhood elements for accurate and parallel entropy probability estimation.
Both ICSA and MCM are stacked under a Variational Auto-Encoder (VAE)
architecture to derive rate-distortion optimized compact representation of
input image via end-to-end training. Our method achieves superior
compression performance, surpassing the VVC Intra with $\approx$15% BD-rate
improvement averaged across the Kodak, CLIC and Tecnick datasets, and also
demonstrates $\approx$10$\times$ speedup of image decoding when compared with
other notable learned LIC approaches. All materials are made publicly
accessible at https://njuvision.github.io/TinyLIC for reproducible research.
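To illustrate why a multistage context model decodes faster than a fully serial autoregressive one, here is a minimal NumPy sketch of stagewise parallel scheduling. The (i + j) mod S stage pattern, grid size, and function names are assumptions for illustration only, not the actual schedule or code of TinyLIC.

```python
import numpy as np

def multistage_schedule(h, w, num_stages=4):
    """Assign each latent position (i, j) to a decoding stage.

    Positions within one stage are entropy-coded in parallel; their
    probability models may condition on all positions from earlier
    stages. The (i + j) mod S pattern here is a generic example,
    not the schedule from the paper.
    """
    return np.add.outer(np.arange(h), np.arange(w)) % num_stages

def decode_order(stages):
    """Yield, per stage, the positions that can be decoded in parallel."""
    for s in range(int(stages.max()) + 1):
        yield np.argwhere(stages == s)

# A 4x4 latent grid: 4 parallel stages instead of 16 serial
# autoregressive steps.
stages = multistage_schedule(4, 4, num_stages=4)
groups = list(decode_order(stages))
```

Because every position falls into exactly one stage, the per-stage groups partition the grid, and the number of sequential decoding steps drops from h*w to the number of stages.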
Related papers
- InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression [114.03378443007074]
Current tokenizers rigidly compress all content at a fixed rate, leading to redundancy or information loss.
This paper introduces InfoTok, a principled framework for adaptive video tokenization.
We develop a transformer-based adaptive compressor that enables adaptive tokenization.
arXiv Detail & Related papers (2025-12-18T17:13:59Z)
- Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost.
This work rethinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation.
Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z)
- DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs [124.52164183968145]
We present DyMU, an efficient, training-free framework that reduces the computational burden of vision-language models (VLMs)
Our approach comprises two key components. First, Dynamic Token Merging (DToMe) reduces the number of visual token embeddings by merging similar tokens based on image complexity.
Second, Virtual Token Unmerging (VTU) simulates the expected token sequence for large language models (LLMs) by efficiently reconstructing the attention dynamics of a full sequence.
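As a rough sketch of similarity-based token merging, the following greedy NumPy example collapses near-duplicate token vectors into their mean. This is a simplified stand-in, not DToMe itself: the greedy grouping, fixed threshold, and averaging rule are assumptions, whereas DToMe sets its merge budget from image complexity.

```python
import numpy as np

def merge_similar_tokens(tokens, threshold=0.9):
    """Greedily merge token vectors whose cosine similarity exceeds
    a threshold, replacing each group by its mean vector."""
    norms = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    used = np.zeros(len(tokens), dtype=bool)
    merged = []
    for i in range(len(tokens)):
        if used[i]:
            continue
        group = [i]
        for j in range(i + 1, len(tokens)):
            # Cosine similarity of unit-normalized vectors.
            if not used[j] and float(norms[i] @ norms[j]) > threshold:
                used[j] = True
                group.append(j)
        used[i] = True
        merged.append(tokens[group].mean(axis=0))
    return np.stack(merged)

# Two near-duplicate tokens collapse into one; the distinct token
# survives, so 3 tokens become 2.
tokens = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]])
merged = merge_similar_tokens(tokens)
```

The shortened token sequence is what reduces the downstream compute of the language model; VTU then reconstructs the full-sequence attention pattern so the LLM sees statistics consistent with the unmerged input.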
arXiv Detail & Related papers (2025-04-23T18:38:18Z)
- ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships.
Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands.
We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z)
- CALLIC: Content Adaptive Learning for Lossless Image Compression [64.47244912937204]
CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.
We propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations.
During encoding, we decompose pre-trained layers, including depth-wise convolutions, using low-rank matrices, and then adapt the incremental weights to the testing image via Rate-guided Progressive Fine-Tuning (RPFT).
RPFT fine-tunes on gradually more patches, sorted in descending order of estimated entropy, optimizing the learning process and reducing adaptation time.
arXiv Detail & Related papers (2024-12-23T10:41:18Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z)
- AICT: An Adaptive Image Compression Transformer [18.05997169440533]
We propose a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT).
The proposed ICT can capture both global and local contexts from the latent representations.
We leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation.
arXiv Detail & Related papers (2023-07-12T11:32:02Z)
- Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression [1.9249287163937978]
We introduce the Efficient Contextformer (eContextformer) - a transformer-based autoregressive context model for learned image compression.
It fuses patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling.
It achieves 145x lower model complexity and 210x faster decoding speed, and higher average bit savings on the Kodak, CLIC, and Tecnick datasets.
arXiv Detail & Related papers (2023-06-25T16:29:51Z)
- AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation [51.82915587228898]
We introduce AdaptiveClick -- a transformer-based, mask-adaptive segmentation framework for Interactive Image Segmentation (IIS).
The key ingredient of our method is the Click-Aware Mask-adaptive transformer Decoder (CAMD), which enhances the interaction between click and image features.
With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-07T13:47:35Z)
- Learned Image Compression with Mixed Transformer-CNN Architectures [21.53261818914534]
We propose an efficient parallel Transformer-CNN Mixture (TCM) block with a controllable complexity.
Inspired by the recent progress of entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient swin-transformer-based attention.
Experimental results demonstrate our proposed method achieves state-of-the-art rate-distortion performances.
arXiv Detail & Related papers (2023-03-27T08:19:01Z)
- FastMIM: Expediting Masked Image Modeling Pre-training for Vision [65.47756720190155]
FastMIM is a framework for pre-training vision backbones with low-resolution input images.
It reconstructs Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.
It can achieve 83.8%/84.1% top-1 accuracy on ImageNet-1K with ViT-B/Swin-B as backbones.
arXiv Detail & Related papers (2022-12-13T14:09:32Z)
- VLDeformer: Learning Visual-Semantic Embeddings by Vision-Language Transformer Decomposing [7.890230091463883]
Vision-language transformers (VL transformers) have shown impressive accuracy in cross-modal retrieval.
We propose a novel Vision-language Transformer Decomposing (VLDeformer) to modify the VL transformer as an individual encoder for a single image or text.
arXiv Detail & Related papers (2021-10-20T09:00:51Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.