One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
- URL: http://arxiv.org/abs/2501.10064v1
- Date: Fri, 17 Jan 2025 09:29:33 GMT
- Title: One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
- Authors: Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi
- Abstract summary: We introduce One-D-Piece, a discrete image tokenizer for variable-length tokenization.
It introduces a regularization mechanism named "Tail Token Drop" into discrete one-dimensional image tokenizers.
We evaluate our tokenizer across multiple reconstruction quality metrics and find that it delivers significantly better perceptual quality than existing quality-controllable compression methods.
- Score: 1.7942265700058988
- Abstract: Current image tokenization methods require a large number of tokens to capture the information contained within images. Although the amount of information varies across images, most image tokenizers only support fixed-length tokenization, leading to inefficiency in token allocation. In this study, we introduce One-D-Piece, a discrete image tokenizer designed for variable-length tokenization, providing a quality-controllable mechanism. To enable a variable compression rate, we introduce a simple but effective regularization mechanism named "Tail Token Drop" into discrete one-dimensional image tokenizers. This method encourages critical information to concentrate at the head of the token sequence, enabling support for variable-length tokenization while preserving state-of-the-art reconstruction quality. We evaluate our tokenizer across multiple reconstruction quality metrics and find that it delivers significantly better perceptual quality than existing quality-controllable compression methods, including JPEG and WebP, at smaller byte sizes. Furthermore, we assess our tokenizer on various downstream computer vision tasks, including image classification, object detection, semantic segmentation, and depth estimation, confirming its adaptability to numerous applications compared to other variable-rate methods. Our approach demonstrates the versatility of variable-length discrete image tokenization, establishing a new paradigm in both compression efficiency and reconstruction performance. Finally, we validate the effectiveness of Tail Token Drop via a detailed analysis of the trained tokenizers.
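As a rough illustration of the Tail Token Drop idea described in the abstract, the sketch below randomly truncates the 1D token sequence during training before it reaches the decoder, so that reconstruction-critical information is pushed toward the head of the sequence. It is a minimal sketch rather than the authors' implementation: the encoder/decoder interfaces, the default sequence length, the mask-token padding, and the uniform sampling of the keep-length are all assumptions.

```python
import torch
import torch.nn as nn


class TailTokenDropWrapper(nn.Module):
    """Minimal sketch of training-time tail-token dropping for a 1D image tokenizer.

    Assumptions (not from the paper's code): the encoder maps an image to a
    sequence of `max_tokens` latent tokens, and the decoder can reconstruct
    from any prefix of that sequence, with the dropped tail replaced by a
    learned mask token.
    """

    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 max_tokens: int = 256, token_dim: int = 64):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.max_tokens = max_tokens
        self.mask_token = nn.Parameter(torch.zeros(1, 1, token_dim))

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        tokens = self.encoder(images)                     # (B, max_tokens, token_dim)
        if self.training:
            # Sample a random keep-length; tokens past it are replaced by the
            # mask token, which pushes important content toward the head.
            keep = torch.randint(1, self.max_tokens + 1, (1,)).item()
            if keep < self.max_tokens:
                tail = self.mask_token.expand(tokens.size(0),
                                              self.max_tokens - keep, -1)
                tokens = torch.cat([tokens[:, :keep], tail], dim=1)
        return self.decoder(tokens)                       # reconstructed images
```

At inference time, the same wrapper can decode from any prefix length, which is what gives the quality-controllable, variable-rate behavior described above.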
Related papers
- DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models [12.875837358532422]
We analyze the challenges associated with quantizing text-to-image diffusion models from a distributional perspective.
We propose Distribution-aware Group Quantization (DGQ), a method that adaptively handles pixel-wise and channel-wise outliers to preserve image quality.
Our method demonstrates remarkable performance on datasets such as MS-COCO and PartiPrompts.
arXiv Detail & Related papers (2025-01-08T06:30:31Z)
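The DGQ entry above describes group quantization that handles pixel-wise and channel-wise outliers to preserve image quality. The snippet below is only a generic sketch of group-wise uniform quantization that keeps a few high-magnitude channels in full precision; it is not the DGQ algorithm, and the group size, outlier fraction, and bit width are illustrative assumptions.

```python
import torch


def groupwise_quantize(x: torch.Tensor, group_size: int = 16,
                       n_bits: int = 8, outlier_frac: float = 0.01) -> torch.Tensor:
    """Generic sketch: quantize activations per channel group, keeping the
    channels with the largest magnitude (treated as outliers) unquantized.
    x: (batch, channels) activation tensor.
    """
    _, c = x.shape
    # Flag the highest-magnitude channels as outliers to keep in full precision.
    n_outliers = max(1, int(c * outlier_frac))
    outlier_idx = x.abs().amax(dim=0).topk(n_outliers).indices

    q = x.clone()
    levels = 2 ** n_bits - 1
    for start in range(0, c, group_size):
        sl = slice(start, min(start + group_size, c))
        grp = x[:, sl]
        scale = (grp.max() - grp.min()).clamp(min=1e-8) / levels
        zero = grp.min()
        q[:, sl] = torch.round((grp - zero) / scale) * scale + zero

    q[:, outlier_idx] = x[:, outlier_idx]  # restore outlier channels untouched
    return q
```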
- CAT: Content-Adaptive Image Tokenization [92.2116487267877]
We introduce Content-Adaptive Tokenizer (CAT), which adjusts representation capacity based on the image content and encodes simpler images into fewer tokens.
We design a caption-based evaluation system that leverages large language models (LLMs) to predict content complexity and determine the optimal compression ratio for a given image.
By optimizing token allocation, CAT improves the FID score over fixed-ratio baselines trained with the same FLOPs and boosts the inference throughput by 18.5%.
arXiv Detail & Related papers (2025-01-06T16:28:47Z)
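CAT ties an LLM-derived estimate of content complexity to the number of tokens spent on an image. The toy function below only illustrates that final mapping step: the complexity score is assumed to come from some external caption-plus-LLM evaluation, and the thresholds and token budgets are invented for the example rather than taken from the paper.

```python
def tokens_for_complexity(complexity: float) -> int:
    """Toy mapping from a content-complexity score in [0, 1] (assumed to be
    produced by an LLM-based evaluation of the image's caption) to a token
    budget. Thresholds and budgets are illustrative, not from the CAT paper."""
    if complexity < 0.33:
        return 64      # simple images: a coarse representation suffices
    if complexity < 0.66:
        return 144
    return 256         # complex images: full token budget


# Example: a moderately complex scene gets the mid-sized budget.
print(tokens_for_complexity(0.5))  # -> 144
```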
- Adaptive Length Image Tokenization via Recurrent Allocation [81.10081670396956]
Current vision systems assign fixed-length representations to images, regardless of the information content.
Motivated by this limitation, we propose an approach to learn variable-length token representations for 2D images.
arXiv Detail & Related papers (2024-11-04T18:58:01Z)
- ImageFolder: Autoregressive Image Generation with Folded Tokens [51.815319504939396]
Increasing token length is a common approach to improve image reconstruction quality.
However, there exists a trade-off between reconstruction and generation quality with respect to token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z)
- DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding [27.875207681547074]
Progressive image coding (PIC) aims to compress various qualities of images into a single bitstream.
Research on neural network (NN)-based PIC is in its early stages.
We propose an NN-based progressive coding method that, for the first time, utilizes learned quantization step sizes for each quantization layer.
arXiv Detail & Related papers (2024-08-22T06:32:53Z)
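DeepHQ learns quantization step sizes for each quantization layer. The module below is a minimal sketch of one way to make a per-layer quantization step trainable, using a straight-through estimator for the non-differentiable rounding; it does not reproduce the paper's hierarchical progressive quantizer, and the log-parameterization and STE are assumptions.

```python
import torch
import torch.nn as nn


class LearnedStepQuantizer(nn.Module):
    """Sketch: uniform quantizer whose step size is a trainable parameter.

    Rounding is non-differentiable, so gradients pass through a
    straight-through estimator (a common trick, assumed here rather than
    taken from DeepHQ).
    """

    def __init__(self, init_step: float = 1.0):
        super().__init__()
        # Parameterize log(step) so the step size stays positive during training.
        self.log_step = nn.Parameter(torch.tensor(float(init_step)).log())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        step = self.log_step.exp()
        hard = torch.round(x / step) * step
        # Straight-through: forward uses the rounded value, backward the identity.
        return x + (hard - x).detach()
```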
- Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [54.532578213126065]
Most document understanding methods preserve all tokens within sub-images and treat them equally.
This neglects their different informativeness and leads to a significant increase in the number of image tokens.
We propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play methodology to optimize token processing.
arXiv Detail & Related papers (2024-07-19T16:11:15Z)
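The entry above proposes a parameter-free way of compressing image tokens guided by token-level correlation. The function below sketches one simple variant, greedy removal of tokens whose cosine similarity to an already-kept token exceeds a threshold; the threshold and the greedy order are assumptions, not the paper's actual criterion.

```python
import torch
import torch.nn.functional as F


def drop_redundant_tokens(tokens: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Sketch: keep a token only if it is not too similar to any token kept so far.
    tokens: (num_tokens, dim). Returns the reduced (kept_tokens, dim) tensor.
    Parameter-free apart from the similarity threshold (an assumption here)."""
    normed = F.normalize(tokens, dim=-1)
    kept = [0]                                # always keep the first token
    for i in range(1, tokens.size(0)):
        sims = normed[i] @ normed[kept].T     # cosine similarity to kept tokens
        if sims.max() < threshold:
            kept.append(i)
    return tokens[kept]
```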
- Rate-Distortion-Cognition Controllable Versatile Neural Image Compression [47.72668401825835]
We propose a rate-distortion-cognition controllable versatile image compression method.
Our method yields satisfactory ICM performance and flexible rate-distortion-cognition control.
arXiv Detail & Related papers (2024-07-16T13:17:51Z)
- Probing Image Compression For Class-Incremental Learning [8.711266563753846]
Continual machine learning (ML) systems rely on storing representative samples, also known as exemplars, within a limited memory budget to maintain performance on previously learned data.
In this paper, we explore the use of image compression as a strategy to enhance the buffer's capacity, thereby increasing exemplar diversity.
We introduce a new framework to incorporate image compression for continual ML including a pre-processing data compression step and an efficient compression rate/algorithm selection method.
arXiv Detail & Related papers (2024-03-10T18:58:14Z)
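The framework above stores compressed exemplars so that more samples fit into a fixed replay buffer. As a hedged illustration of that idea (not the paper's rate/algorithm selection method), the snippet below JPEG-encodes an exemplar with Pillow at the highest quality that still fits a per-image byte budget.

```python
import io

from PIL import Image


def compress_exemplar(img: Image.Image, byte_budget: int = 4096) -> bytes:
    """Sketch: re-encode an exemplar as JPEG at the highest quality that fits
    the given per-image byte budget (the budget and quality sweep are
    assumptions, not the paper's rate/algorithm selection method)."""
    best = b""
    for quality in range(95, 4, -10):             # sweep quality downward
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=quality)
        best = buf.getvalue()
        if len(best) <= byte_budget:
            break                                 # first quality that fits wins
    return best


# Usage sketch: store compress_exemplar(img) in the replay buffer and decode
# with Image.open(io.BytesIO(data)) when rehearsing previously learned classes.
```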
- Discernible Image Compression [124.08063151879173]
This paper aims to produce compressed images by pursuing both appearance and perceptual consistency.
Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images.
Experiments on benchmarks demonstrate that images compressed by using the proposed method can also be well recognized by subsequent visual recognition and detection models.
arXiv Detail & Related papers (2020-02-17T07:35:08Z)
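The last entry pairs an encoder-decoder compressor with a pre-trained CNN that compares features of the original and compressed images, so compressed outputs stay recognizable to downstream models. The loss below is a generic sketch of such a feature-consistency term built on torchvision's VGG16 features; the backbone, cut-off layer, and equal weighting are assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights


class FeatureConsistencyLoss(nn.Module):
    """Sketch: penalize the distance between pre-trained CNN features of the
    original and the compressed/reconstructed image, on top of a pixel loss.
    VGG16 and the cut-off layer are assumptions for illustration; ImageNet
    input normalization is omitted for brevity."""

    def __init__(self, feature_layers: int = 16):
        super().__init__()
        backbone = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features
        self.features = backbone[:feature_layers].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)               # keep the feature extractor frozen
        self.mse = nn.MSELoss()

    def forward(self, original: torch.Tensor, compressed: torch.Tensor) -> torch.Tensor:
        pixel_term = self.mse(compressed, original)
        feature_term = self.mse(self.features(compressed), self.features(original))
        return pixel_term + feature_term
```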