Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size
- URL: http://arxiv.org/abs/2503.10832v1
- Date: Thu, 13 Mar 2025 19:31:18 GMT
- Title: Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size
- Authors: Parisa Boodaghi Malidarreh, Jillur Rahman Saurav, Thuong Le Hoai Pham, Amir Hajighasemi, Anahita Samadi, Saurabh Shrinivas Maydeo, Mohammad Sadegh Nasr, Jacob M. Luber,
- Abstract summary: Vector Quantization (VQ) techniques face challenges in codebook utilization, limiting reconstruction fidelity in image modeling.<n>We introduce a Dual Codebook mechanism that effectively addresses this limitation by partitioning the representation into complementary global and local components.<n>Our approach achieves significant FID improvements across diverse image domains, particularly excelling in scene and face reconstruction tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vector Quantization (VQ) techniques face significant challenges in codebook utilization, limiting reconstruction fidelity in image modeling. We introduce a Dual Codebook mechanism that effectively addresses this limitation by partitioning the representation into complementary global and local components. The global codebook employs a lightweight transformer for concurrent updates of all code vectors, while the local codebook maintains precise feature representation through deterministic selection. This complementary approach is trained from scratch without requiring pre-trained knowledge. Experimental evaluation across multiple standard benchmark datasets demonstrates state-of-the-art reconstruction quality while using a compact codebook of size 512 - half the size of previous methods that require pre-training. Our approach achieves significant FID improvements across diverse image domains, particularly excelling in scene and face reconstruction tasks. These results establish Dual Codebook VQ as an efficient paradigm for high-fidelity image reconstruction with significantly reduced computational requirements.
Related papers
- IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs.<n>We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models.<n>Our method outperforms state-of-the-art techniques across multiple benchmarks.
arXiv Detail & Related papers (2025-03-12T11:39:51Z) - SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation [73.98487014058286]
SemHiTok is a unified image tokenizer via Semantic-Guided Hierarchical codebook.
We show that SemHiTok achieves excellent rFID score at 256X256resolution compared to other unified tokenizers.
arXiv Detail & Related papers (2025-03-09T20:42:34Z) - Scalable Image Tokenization with Index Backpropagation Quantization [74.15447383432262]
Index Backpropagation Quantization (IBQ) is a new VQ method for the joint optimization of all codebook embeddings and the visual encoder.<n>IBQ enables scalable training of visual tokenizers and, for the first time, achieves a large-scale codebook with high dimension ($256$) and high utilization.
arXiv Detail & Related papers (2024-12-03T18:59:10Z) - Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% [35.710953589794855]
We propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000, achieving an utilization rate exceeding 99%.
We demonstrate the superior performance of our model over its counterparts across a variety of tasks, including image reconstruction, image classification, auto-regressive image generation using GPT, and image creation with diffusion- and flow-based generative models.
arXiv Detail & Related papers (2024-06-17T17:59:57Z) - HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression [51.04820313355164]
HyrbidFlow combines the continuous-feature-based and codebook-based streams to achieve both high perceptual quality and high fidelity under extreme lows.
Experimental results demonstrate superior performance across several datasets under extremely lows.
arXiv Detail & Related papers (2024-04-20T13:19:08Z) - Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling [15.132926378740882]
We propose a novel codebook transfer framework with part-of-speech, called VQCT, which aims to transfer a well-trained codebook from pretrained language models to VQIM.
Experimental results on four datasets show that our VQCT method achieves superior VQIM performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2024-03-15T07:24:13Z) - MOC-RVQ: Multilevel Codebook-Assisted Digital Generative Semantic Communication [43.17888320268593]
We propose a multilevel generative semantic communication system with a two-stage training framework.
In the first stage, we train a high-quality codebook, using a multi-head octonary codebook to compress the index range.
In the second stage, a noise reduction block (NRB) based on Swin Transformer is introduced, serving as a high-quality semantic knowledge base.
arXiv Detail & Related papers (2024-01-02T16:17:43Z) - Not All Image Regions Matter: Masked Vector Quantization for
Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) Stack model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z) - Towards Accurate Image Coding: Improved Autoregressive Image Generation
with Dynamic Vector Quantization [73.52943587514386]
Existing vector quantization (VQ) based autoregressive models follow a two-stage generation paradigm.
We propose a novel two-stage framework: (1) Dynamic-Quantization VAE (DQ-VAE) which encodes image regions into variable-length codes based their information densities for accurate representation.
arXiv Detail & Related papers (2023-05-19T14:56:05Z) - Hierarchical Residual Learning Based Vector Quantized Variational
Autoencoder for Image Reconstruction and Generation [19.92324010429006]
We propose a multi-layer variational autoencoder method, we call HR-VQVAE, that learns hierarchical discrete representations of the data.
We evaluate our method on the tasks of image reconstruction and generation.
arXiv Detail & Related papers (2022-08-09T06:04:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.