One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
- URL: http://arxiv.org/abs/2501.10064v1
- Date: Fri, 17 Jan 2025 09:29:33 GMT
- Title: One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression
- Authors: Keita Miwa, Kento Sasaki, Hidehisa Arai, Tsubasa Takahashi, Yu Yamaguchi,
- Abstract summary: We introduce One-D-Piece, a discrete image tokenizer for variable-length tokenization. We introduce a regularization mechanism named "Tail Token Drop" into discrete one-dimensional image tokenizers. We evaluate our tokenizer across multiple reconstruction quality metrics and find that it delivers significantly better perceptual quality than existing quality-controllable compression methods.
- Score: 1.7942265700058988
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current image tokenization methods require a large number of tokens to capture the information contained within images. Although the amount of information varies across images, most image tokenizers only support fixed-length tokenization, leading to inefficiency in token allocation. In this study, we introduce One-D-Piece, a discrete image tokenizer designed for variable-length tokenization, achieving a quality-controllable mechanism. To enable a variable compression rate, we introduce a simple but effective regularization mechanism named "Tail Token Drop" into discrete one-dimensional image tokenizers. This method encourages critical information to concentrate at the head of the token sequence, enabling variable-length tokenization while preserving state-of-the-art reconstruction quality. We evaluate our tokenizer across multiple reconstruction quality metrics and find that it delivers significantly better perceptual quality than existing quality-controllable compression methods, including JPEG and WebP, at smaller byte sizes. Furthermore, we assess our tokenizer on various downstream computer vision tasks, including image classification, object detection, semantic segmentation, and depth estimation, confirming its adaptability to numerous applications compared to other variable-rate methods. Our approach demonstrates the versatility of variable-length discrete image tokenization, establishing a new paradigm in both compression efficiency and reconstruction performance. Finally, we validate the effectiveness of Tail Token Drop via detailed analysis of tokenizers.
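The abstract does not spell out how Tail Token Drop is implemented, so the following is only a minimal sketch under stated assumptions: that during training a random number of trailing tokens is discarded so the decoder must reconstruct from a prefix, and that the stored size of a token sequence is simply the number of kept tokens times the bits per codebook index. The function names `tail_token_drop` and `payload_bytes`, the 256-token sequence length, and the 4096-entry codebook are all hypothetical and not taken from the paper.

```python
import random


def tail_token_drop(tokens, min_keep=1, rng=random):
    """Return a random-length prefix of `tokens` (assumed form of the regularizer).

    The tail is discarded, so information needed for reconstruction
    is pushed toward the head of the sequence during training.
    """
    if not 1 <= min_keep <= len(tokens):
        raise ValueError("min_keep must be between 1 and len(tokens)")
    keep = rng.randint(min_keep, len(tokens))  # inclusive on both ends
    return tokens[:keep]


def payload_bytes(num_tokens, codebook_size):
    """Bytes needed to store `num_tokens` codebook indices without entropy coding."""
    bits_per_token = max(1, (codebook_size - 1).bit_length())
    return (num_tokens * bits_per_token + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 256 tokens drawn from a 4096-entry codebook.
    full = [rng.randrange(4096) for _ in range(256)]
    prefix = tail_token_drop(full, min_keep=8, rng=rng)
    print(len(prefix), "tokens kept,", payload_bytes(len(prefix), 4096), "bytes")
```

Under these assumptions, choosing a shorter prefix at inference time is what would trade reconstruction quality for a smaller byte size, which is the quality-controllable behavior the abstract compares against JPEG and WebP.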
Related papers
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z) - CAT: Content-Adaptive Image Tokenization [92.2116487267877]
We introduce Content-Adaptive Tokenizer (CAT), which adjusts representation capacity based on the image content and encodes simpler images into fewer tokens. We design a caption-based evaluation system that leverages large language models (LLMs) to predict content complexity and determine the optimal compression ratio for a given image. By optimizing token allocation, CAT improves the FID score over fixed-ratio baselines trained with the same FLOPs and boosts the inference throughput by 18.5%.
arXiv Detail & Related papers (2025-01-06T16:28:47Z) - ImageFolder: Autoregressive Image Generation with Folded Tokens [51.815319504939396]
Increasing token length is a common approach to improve the image reconstruction quality. There exists a trade-off between reconstruction and generation quality regarding token length. We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z) - DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding [27.875207681547074]
Progressive image coding (PIC) aims to compress various qualities of images into a single bitstream.
Research on neural network (NN)-based PIC is in its early stages.
We propose an NN-based progressive coding method that is the first to utilize quantization step sizes learned for each quantization layer.
arXiv Detail & Related papers (2024-08-22T06:32:53Z) - Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding [54.532578213126065]
Most document understanding methods preserve all tokens within sub-images and treat them equally.
This neglects their different informativeness and leads to a significant increase in the number of image tokens.
We propose Token-level Correlation-guided Compression, a parameter-free and plug-and-play methodology to optimize token processing.
arXiv Detail & Related papers (2024-07-19T16:11:15Z) - Rate-Distortion-Cognition Controllable Versatile Neural Image Compression [47.72668401825835]
We propose a rate-distortion-cognition controllable versatile image compression method.
Our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
arXiv Detail & Related papers (2024-07-16T13:17:51Z) - Probing Image Compression For Class-Incremental Learning [8.711266563753846]
Continual machine learning (ML) systems rely on storing representative samples, also known as exemplars, within a limited memory constraint to maintain the performance on previously learned data.
In this paper, we explore the use of image compression as a strategy to enhance the buffer's capacity, thereby increasing exemplar diversity.
We introduce a new framework to incorporate image compression for continual ML including a pre-processing data compression step and an efficient compression rate/algorithm selection method.
arXiv Detail & Related papers (2024-03-10T18:58:14Z) - Subobject-level Image Tokenization [60.80949852899857]
Patch-based image tokenization ignores the morphology of the visual world.
Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation.
We show that subobject tokenization enables faster convergence and better generalization while using fewer visual tokens.
arXiv Detail & Related papers (2024-02-22T06:47:44Z)