Related papers: GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting

URL: http://arxiv.org/abs/2501.15619v1
Date: Sun, 26 Jan 2025 17:56:11 GMT
Title: GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
Authors: Jiajun Dong, Chengkun Wang, Wenzhao Zheng, Lei Chen, Jiwen Lu, Yansong Tang
Abstract summary: We propose an effective image tokenizer with 2D Gaussian Splatting as a solution. In general, our framework integrates the local influence of 2D Gaussian distribution into the discrete space. Competitive reconstruction performances on CIFAR, Mini-ImageNet, and ImageNet-1K demonstrate the effectiveness of our framework.
Score: 64.84383010238908
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effective image tokenization is crucial for both multi-modal understanding and generation tasks due to the necessity of alignment with discrete text data. To this end, existing approaches utilize vector quantization (VQ) to project pixels onto a discrete codebook and reconstruct images from the discrete representation. However, compared with the continuous latent space, the limited discrete codebook space significantly restricts the representational ability of these image tokenizers. In this paper, we propose GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting as a solution. We first represent the encoded samples as multiple flexible featured 2D Gaussians characterized by positions, rotation angles, scaling factors, and feature coefficients. We adopt the standard quantization for the Gaussian features and then concatenate the quantization results with the other intrinsic Gaussian parameters before the corresponding splatting operation and the subsequent decoding module. In general, GaussianToken integrates the local influence of 2D Gaussian distribution into the discrete space and thus enhances the representation capability of the image tokenizer. Competitive reconstruction performance on CIFAR, Mini-ImageNet, and ImageNet-1K demonstrates the effectiveness of our framework. Our code is available at: https://github.com/ChrisDong-THU/GaussianToken.
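Illustrative sketch (not from the paper or its repository): the abstract describes quantizing the Gaussian feature coefficients, concatenating them with the continuous parameters (position, rotation angle, scale), and then splatting. The NumPy code below is one possible reading of those steps. The function names, the nearest-neighbour codebook lookup, the normalized [0, 1] coordinate grid, and the plain additive blending are all assumptions made for illustration; the authors' implementation may differ in each of these.

```python
import numpy as np

def quantize(features, codebook):
    """Nearest-neighbour vector quantization (assumed form of 'standard quantization')."""
    # Squared Euclidean distance between every feature and every code: shape (N, K).
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dist.argmin(axis=1)
    return idx, codebook[idx]

def splat(positions, angles, scales, features, height, width):
    """Render an (H, W, C) feature map as a sum of anisotropic 2D Gaussians.

    Additive, unnormalized blending is an assumption of this sketch.
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    grid = np.stack([xs, ys], axis=-1)                     # (H, W, 2), (x, y) order
    out = np.zeros((height, width, features.shape[1]))
    for mu, theta, s, f in zip(positions, angles, scales, features):
        c, sn = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -sn], [sn, c]])
        # Covariance is R diag(s^2) R^T, so its inverse is R diag(1/s^2) R^T.
        inv_cov = rot @ np.diag(1.0 / s ** 2) @ rot.T
        d = grid - mu                                      # offset of each pixel from the centre
        maha = np.einsum("hwi,ij,hwj->hw", d, inv_cov, d)  # squared Mahalanobis distance
        out += np.exp(-0.5 * maha)[..., None] * f          # local influence times feature
    return out

# Hypothetical sizes: 4 Gaussians, 3-dim features, codebook of 8 entries.
rng = np.random.default_rng(0)
positions = rng.uniform(0, 1, size=(4, 2))
angles = rng.uniform(0, np.pi, size=4)
scales = rng.uniform(0.05, 0.3, size=(4, 2))
features = rng.normal(size=(4, 3))
codebook = rng.normal(size=(8, 3))

indices, quantized = quantize(features, codebook)
# Each token pairs a discrete feature code with continuous geometry: (N, 3 + 2 + 1 + 2).
tokens = np.concatenate([quantized, positions, angles[:, None], scales], axis=1)
feature_map = splat(positions, angles, scales, quantized, height=16, width=16)
```

In this sketch only the feature coefficients pass through the discrete codebook, while position, rotation, and scale stay continuous; the resulting `feature_map` is what a decoder would consume.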

Related papers

Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images [12.274418254425019]
3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. We propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method.
arXiv Detail & Related papers (2025-03-20T16:56:13Z)
Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting [21.629316414488027]
We present Large Images are Gaussians (LIG), which delves deeper into the application of 2DGS for image representations.
arXiv Detail & Related papers (2025-02-13T07:48:56Z)
SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images [91.28365943547703]
A novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios. The proposed method achieves state-of-the-art performance in various 3D vision tasks.
arXiv Detail & Related papers (2024-11-27T05:52:28Z)
Image Understanding Makes for A Good Tokenizer for Image Generation [62.875788091204626]
We introduce a token-based IG framework, which relies on effective tokenizers to project images into token sequences. We show that tokenizers with strong IU capabilities achieve superior IG performance across a variety of metrics, datasets, tasks, and proposal networks.
arXiv Detail & Related papers (2024-11-07T03:55:23Z)
PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views [116.10577967146762]
PixelGaussian is an efficient framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Our method achieves state-of-the-art performance with good generalization to various numbers of views.
arXiv Detail & Related papers (2024-10-24T17:59:58Z)
GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution [29.49617080140511]
Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. We propose a novel ASSR method named GaussianSR that overcomes this limitation through 2D Gaussian Splatting (2DGS).
arXiv Detail & Related papers (2024-07-25T13:53:48Z)
Image-GS: Content-Adaptive Image Representation via 2D Gaussians [55.15950594752051]
We propose Image-GS, a content-adaptive image representation. Using anisotropic 2D Gaussians as the basis, Image-GS shows high memory efficiency, supports fast random access, and offers a natural level of detail stack. General efficiency and fidelity of Image-GS are validated against several recent neural image representations and industry-standard texture compressors. We hope this research offers insights for developing new applications that require adaptive quality and resource control, such as machine perception, asset streaming, and content generation.
arXiv Detail & Related papers (2024-07-02T00:45:21Z)
Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation [16.57158278095853]
Zero-shot neural scene segmentation serves as an effective way for scene understanding. Existing models, especially the efficient 3D Gaussian-based methods, struggle to produce compact segmentation results. Our work proposes the Feature Unprojection and Fusion module as the segmentation field. We show that our model surpasses baselines on the zero-shot semantic segmentation task, improving by 10% mIoU over the best baseline.
arXiv Detail & Related papers (2024-01-11T14:05:01Z)
Compact 3D Gaussian Representation for Radiance Field [14.729871192785696]
We propose a learnable mask strategy to reduce the number of 3D Gaussian points without sacrificing performance. We also propose a compact but effective representation of view-dependent color by employing a grid-based neural field. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
arXiv Detail & Related papers (2023-11-22T20:31:16Z)
NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation [139.8037697822064]
We present a non-parametric structured latent variable model for image generation, called NP-DRAW. It sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas.
arXiv Detail & Related papers (2021-06-25T05:17:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.