SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel
Storage
- URL: http://arxiv.org/abs/2303.11114v2
- Date: Mon, 11 Sep 2023 06:04:24 GMT
- Title: SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel
Storage
- Authors: Song Park and Sanghyuk Chun and Byeongho Heo and Wonjae Kim and
Sangdoo Yun
- Abstract summary: We propose a storage-efficient training strategy for vision classifiers for large-scale datasets.
Our token storage only needs 1% of the original JPEG-compressed raw pixels.
Our experimental results on ImageNet-1k show that our method outperforms other storage-efficient training methods by a large margin.
- Score: 52.317406324182215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We need billion-scale images to achieve more generalizable and
ground-breaking vision models, as well as massive dataset storage to ship the
images (e.g., the LAION-4B dataset needs 240TB of storage space). However, it
has become challenging to handle ever-growing dataset storage with limited
storage infrastructure. A number of storage-efficient training methods have
been proposed to tackle this problem, but they are rarely scalable or suffer
from severe performance degradation. In this paper, we propose a
storage-efficient training strategy for vision classifiers on large-scale
datasets (e.g., ImageNet) that uses only 1024 tokens per instance without the
raw pixel values; our token storage requires less than 1% of the space of the
original JPEG-compressed raw pixels. We also propose token augmentations and a
Stem-adaptor module so that our approach can use the same architectures as
pixel-based approaches with only minimal modifications to the stem layer,
together with carefully tuned optimization settings. Our experimental results
on ImageNet-1k show that our method outperforms other storage-efficient
training methods by a large margin. We further show the effectiveness of our
method in other practical scenarios: storage-efficient pre-training and
continual learning. Code is available at
https://github.com/naver-ai/seit
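For intuition, below is a minimal sketch of the stem-level change described above: the ViT's pixel patchify stem is replaced by a token-embedding lookup, so the rest of the architecture stays untouched. The module name, vocabulary size, and embedding width are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
# Illustrative sketch only -- not the authors' exact code (see the linked repo).
# Assumes each image has been pre-tokenized offline into 1024 discrete token ids
# (e.g., by an off-the-shelf VQ tokenizer) and those ids are stored instead of pixels.
import torch
import torch.nn as nn

class StemAdaptor(nn.Module):
    """Replaces a ViT's pixel patchify stem with a token-embedding lookup."""
    def __init__(self, vocab_size=8192, num_tokens=1024, embed_dim=768):
        super().__init__()
        self.codebook = nn.Embedding(vocab_size, embed_dim)    # token id -> vector
        self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, embed_dim))

    def forward(self, token_ids):            # token_ids: (B, 1024) int64
        x = self.codebook(token_ids)         # (B, 1024, embed_dim)
        return x + self.pos_embed            # the shape a ViT encoder expects from its stem

stem = StemAdaptor()
token_batch = torch.randint(0, 8192, (4, 1024))   # a mini-batch of stored token ids
vit_input = stem(token_batch)                     # (4, 1024, 768); feed to an unmodified encoder
```

Storing a short sequence of integer ids per image, rather than JPEG pixels, is what drives the under-1% storage figure quoted in the abstract.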
Related papers
- Scaling Training Data with Lossy Image Compression [8.05574597775852]
In computer vision, images are inherently analog, but are always stored in a digital format using a finite number of bits.
We propose a 'storage scaling law' that describes the joint evolution of test error with sample size and number of bits per image.
We prove that this law holds within a stylized model for image compression, and verify it empirically on two computer vision tasks.
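The summary does not state the law's functional form. Purely as an illustration of what a 'joint evolution' of test error with sample size n and bits per image b could look like, one might write an additive power-law form such as the following (the paper's actual law may differ):

```latex
% Hypothetical illustration only -- not the functional form derived in the paper.
% E = test error, n = number of training samples, b = bits stored per image.
E(n, b) \;\approx\; E_{\infty} + a\,n^{-\alpha} + c\,b^{-\beta}
```

Under such a form, E_infinity is an irreducible error floor and the exponents alpha and beta govern how quickly extra samples or extra bits per image reduce error, which would let one trade sample count against per-image compression under a fixed total budget of n*b bits.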
arXiv Detail & Related papers (2024-07-25T11:19:55Z)
- Associative Memories in the Feature Space [68.1903319310263]
We propose a class of memory models that only stores low-dimensional semantic embeddings, and uses them to retrieve similar, but not identical, memories.
We demonstrate a proof of concept of this method on a simple task on the MNIST dataset.
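Below is a minimal sketch of the retrieval idea: store only low-dimensional embeddings and return the closest stored memory rather than an exact copy. The random "memories", cosine similarity, and dimensions are assumptions for illustration, not the paper's model.

```python
# Minimal feature-space associative recall (illustrative; not the paper's model).
import numpy as np

rng = np.random.default_rng(0)
memory = rng.normal(size=(10_000, 64))                     # stored low-dimensional embeddings
memory /= np.linalg.norm(memory, axis=1, keepdims=True)    # unit-normalize for cosine similarity

def recall(query_embedding):
    """Return the index of the most similar stored memory (cosine similarity)."""
    q = query_embedding / np.linalg.norm(query_embedding)
    return int(np.argmax(memory @ q))                      # similar, but not identical, memory

print(recall(rng.normal(size=64)))
```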
arXiv Detail & Related papers (2024-02-16T16:37:48Z)
- Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models [37.574691902971296]
We propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models.
We show that our pipeline works well on standard datasets such as CIFAR-10, CIFAR-100, and ImageNet-1k.
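As a rough skeleton of such a pipeline (pretrained features followed by clustering), consider the sketch below; the ResNet backbone and k-means step are stand-ins for illustration, since the paper builds its clustering around the rate-reduction principle rather than plain k-means.

```python
# Illustrative skeleton: pretrained features -> clustering.
# The paper clusters via rate reduction; k-means is used here only as a stand-in.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from sklearn.cluster import KMeans

backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()                  # expose 2048-d features instead of logits
backbone.eval()

@torch.no_grad()
def embed(images):                                 # images: (B, 3, 224, 224), already preprocessed
    return backbone(images)

features = embed(torch.randn(32, 3, 224, 224))     # placeholder batch; real images would be
                                                   # preprocessed with ResNet50_Weights.DEFAULT.transforms()
labels = KMeans(n_clusters=10).fit_predict(features.numpy())
```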
arXiv Detail & Related papers (2023-06-08T15:20:27Z)
- Raw Image Reconstruction with Learned Compact Metadata [61.62454853089346]
We propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner.
We show how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective.
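One way to picture "a compact latent serving as metadata" is sketched below: a tiny encoder compresses the RAW frame into a few values stored alongside the rendered sRGB image, and a decoder reconstructs the RAW from the sRGB plus that latent, trained end-to-end. All module shapes and names are hypothetical stand-ins, and the adaptive bit allocation mentioned above is not modeled.

```python
# Hypothetical sketch: store a tiny latent as metadata, reconstruct RAW from sRGB + latent.
# Module names and sizes are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class MetadataEncoder(nn.Module):                  # RAW -> compact latent (the "metadata")
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(4, 16, 3, stride=4), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, latent_dim))
    def forward(self, raw):
        return self.net(raw)

class RawDecoder(nn.Module):                       # sRGB + latent -> reconstructed RAW
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Conv2d(3 + latent_dim, 4, 3, padding=1)
    def forward(self, srgb, latent):
        b, _, h, w = srgb.shape
        cond = latent[:, :, None, None].expand(b, -1, h, w)  # broadcast latent over pixels
        return self.net(torch.cat([srgb, cond], dim=1))

enc, dec = MetadataEncoder(), RawDecoder()
raw = torch.randn(2, 4, 64, 64)                    # 4-channel packed Bayer RAW (assumed)
srgb = torch.randn(2, 3, 64, 64)
loss = nn.functional.l1_loss(dec(srgb, enc(raw)), raw)   # both modules trained end-to-end
```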
arXiv Detail & Related papers (2023-02-25T05:29:45Z)
- μSplit: efficient image decomposition for microscopy data [50.794670705085835]
μSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory-efficient incorporation of large image context.
We apply μSplit to five decomposition tasks: one on a synthetic dataset and four derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z)
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory [66.035487142452]
We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with 6x reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 IPCs (images per class) on a single GPU.
arXiv Detail & Related papers (2022-11-19T04:46:03Z)
- Memory Efficient Meta-Learning with Large Images [62.70515410249566]
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain memory-intensive to train.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
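Below is a minimal sketch of one way to bound memory in such an episodic step: forward the full support set, but keep the autodiff graph for only a small random subset and detach the rest. This reflects one reading of the summary; LITE's exact scheme may differ in its details.

```python
# Illustrative memory-saving episodic step (a reading of the summary, not LITE verbatim):
# forward the whole support set, but retain activations/graph for a random subset only.
import torch

def support_embeddings(encoder, support_images, backprop_size=32):
    """Embed a large support set while storing activations for only `backprop_size` images."""
    n = len(support_images)
    perm = torch.randperm(n)
    keep, rest = perm[:backprop_size], perm[backprop_size:]

    tracked = encoder(support_images[keep])        # gradients flow through these only
    with torch.no_grad():                          # no graph (and no activations) kept here
        frozen = encoder(support_images[rest])

    # Reassemble in the original order so the support labels still line up.
    order = torch.argsort(torch.cat([keep, rest]))
    return torch.cat([tracked, frozen], dim=0)[order]

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 84 * 84, 64))
support = torch.randn(1000, 3, 84, 84)             # a large support set
embeddings = support_embeddings(encoder, support)  # (1000, 64); graph kept for 32 of them
```

In this sketch a fresh random subset is drawn at every step, so across steps every support image still contributes to the meta-gradient.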
arXiv Detail & Related papers (2021-07-02T14:37:13Z)