Replacing Labeled Real-image Datasets with Auto-generated Contours
- URL: http://arxiv.org/abs/2206.09132v1
- Date: Sat, 18 Jun 2022 06:43:38 GMT
- Title: Replacing Labeled Real-image Datasets with Auto-generated Contours
- Authors: Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora
Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio
Yokota
- Abstract summary: We show that the performance of formula-driven supervised learning can match or even exceed that of ImageNet-21k pre-training without the use of real images.
Images generated by formulas avoid the privacy/copyright issues, labeling cost and errors, and biases that real images suffer from.
- Score: 20.234550996148748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the present work, we show that the performance of formula-driven
supervised learning (FDSL) can match or even exceed that of ImageNet-21k
without the use of real images, human supervision, or self-supervision during the
pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained
on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and
FDSL shows 82.7% top-1 accuracy when pre-trained under the same conditions
(number of images, hyperparameters, and number of epochs). Images generated by
formulas avoid the privacy/copyright issues, labeling cost and errors, and
biases that real images suffer from, and thus have tremendous potential for
pre-training general models. To understand the performance of the synthetic
images, we tested two hypotheses, namely (i) object contours are what matter in
FDSL datasets and (ii) increasing the number of parameters used to create labels
improves performance in FDSL pre-training. To test the former hypothesis, we
constructed a dataset that consisted of simple object contour combinations. We
found that this dataset can match the performance of fractals. For the latter
hypothesis, we found that increasing the difficulty of the pre-training task
generally leads to better fine-tuning accuracy.
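To make hypothesis (i) concrete, the sketch below generates a labeled image purely from a formula: each class is a tuple of contour-generation parameters, and several such contours are drawn per image. The parameterization (vertex count and radius noise) is an illustrative stand-in for the paper's contour dataset, not its exact recipe.

```python
# Hedged sketch of formula-driven contour data: labels come from generating
# parameters, not from human annotation. Parameter mapping is hypothetical.
import numpy as np
from PIL import Image, ImageDraw

def radial_contour(num_vertices, radius, noise, center, rng):
    # Polygon whose vertex radii are randomly perturbed around a base radius.
    angles = np.linspace(0, 2 * np.pi, num_vertices, endpoint=False)
    radii = radius * (1.0 + noise * rng.uniform(-1, 1, num_vertices))
    xs = center[0] + radii * np.cos(angles)
    ys = center[1] + radii * np.sin(angles)
    return list(zip(xs, ys))

def render_class_sample(class_id, rng, size=256, contours_per_image=8):
    # The class index maps to generation parameters; those parameters ARE the label.
    num_vertices = 3 + class_id % 20        # hypothetical: 3..22 vertices per contour
    noise = 0.05 * (1 + class_id // 20)     # hypothetical: contour roughness per class
    img = Image.new("L", (size, size), 0)
    draw = ImageDraw.Draw(img)
    for _ in range(contours_per_image):
        center = rng.uniform(0.2 * size, 0.8 * size, size=2)
        radius = rng.uniform(0.05 * size, 0.25 * size)
        pts = radial_contour(num_vertices, radius, noise, center, rng)
        draw.polygon(pts, outline=255)      # contours only, no texture or fill
    return np.asarray(img), class_id

rng = np.random.default_rng(0)
image, label = render_class_sample(class_id=7, rng=rng)
```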
Related papers
- Scaling Backwards: Minimal Synthetic Pre-training? [52.78699562832907]
We show that pre-training is effective even with minimal synthetic images.
We find that a substantial reduction of synthetic images from 1k to 1 can even lead to an increase in pre-training performance.
We extend our method from synthetic images to real images to see if a single real image can show a similar pre-training effect.
arXiv Detail & Related papers (2024-08-01T16:20:02Z)
- ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
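A minimal sketch of the bi-path idea, assuming precomputed image and text embeddings: the image is contrasted against both the raw web text and the synthetic caption, and the two losses are mixed. The fixed mixing weight is a placeholder for ALIP's adaptive, per-sample gating.

```python
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    # Symmetric contrastive loss over a batch of paired embeddings.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def bi_path_loss(img_emb, raw_txt_emb, syn_txt_emb, raw_weight=0.5):
    # Mix supervision from raw web text and synthetic captions (weight is illustrative).
    loss_raw = info_nce(img_emb, raw_txt_emb)
    loss_syn = info_nce(img_emb, syn_txt_emb)
    return raw_weight * loss_raw + (1.0 - raw_weight) * loss_syn
```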
arXiv Detail & Related papers (2023-08-16T15:19:52Z)
- Pre-training Vision Transformers with Very Limited Synthesized Images [18.627567043226172]
Formula-driven supervised learning (FDSL) is a pre-training method that relies on synthetic images generated from mathematical formulae such as fractals.
Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks.
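As an illustration of fractal-based FDSL data, the sketch below renders an image with the chaos game over a randomly sampled iterated function system (IFS); in FDSL, the sampled affine parameters would also serve as the class label. The sampling ranges are illustrative, not a specific FractalDB configuration.

```python
import numpy as np

def sample_ifs(num_maps, rng):
    # Random affine maps (A, b); the sampled parameters define the class.
    return [(rng.uniform(-0.5, 0.5, (2, 2)), rng.uniform(-1, 1, 2))
            for _ in range(num_maps)]

def render_fractal(maps, rng, num_points=100_000, size=256):
    point = np.zeros(2)
    pts = []
    for _ in range(num_points):
        A, b = maps[rng.integers(len(maps))]   # chaos game: apply a random map
        point = A @ point + b
        pts.append(point)
    pts = np.array(pts[100:])                  # drop burn-in iterations
    pts -= pts.min(axis=0)
    pts /= pts.max(axis=0) + 1e-8              # normalize into the unit square
    ij = (pts * (size - 1)).astype(int)
    canvas = np.zeros((size, size), dtype=np.uint8)
    canvas[ij[:, 1], ij[:, 0]] = 255           # rasterize visited points
    return canvas

rng = np.random.default_rng(7)
maps = sample_ifs(num_maps=4, rng=rng)
image = render_fractal(maps, rng=rng)
```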
arXiv Detail & Related papers (2023-07-27T08:58:39Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
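A minimal sketch of the last step, assuming similarity scores between one synthesized image and the candidate training images are already computed: a temperature-scaled softmax spreads the attribution mass instead of committing to a single hard match. The calibration choice is illustrative, not the paper's exact procedure.

```python
import numpy as np

def soft_attribution(similarities, temperature=0.1):
    # Map raw similarities over training images to a distribution that expresses
    # attribution uncertainty (higher temperature = more spread).
    s = np.asarray(similarities, dtype=float) / temperature
    s -= s.max()                  # numerical stability before exponentiation
    w = np.exp(s)
    return w / w.sum()

# e.g. similarities between one generated image and four training images
scores = soft_attribution([0.91, 0.88, 0.40, 0.12])
```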
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves [18.5408134000081]
Formula-driven supervised learning has been shown to be an effective method for pre-training transformers.
When VisualAtom-21k is used for pre-training ViT-Base, the top-1 accuracy reaches 83.7% when fine-tuning on ImageNet-1k.
Unlike JFT-300M, which is a static dataset, the quality of synthetic datasets will continue to improve.
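The sketch below draws one sinusoidal-wave contour in the spirit of VisualAtom: the radius of a closed curve is modulated by superposed sine waves, and the wave frequencies would define the class label. The exact VisualAtom parameterization differs; this is only an illustration.

```python
import numpy as np
from PIL import Image, ImageDraw

def sinusoidal_contour(n1, n2, a1=0.2, a2=0.1, size=256, points=720):
    t = np.linspace(0, 2 * np.pi, points)
    r = 1.0 + a1 * np.sin(n1 * t) + a2 * np.sin(n2 * t)   # superposed waves
    r *= 0.35 * size / (1.0 + a1 + a2)                     # keep the curve on the canvas
    xs = size / 2 + r * np.cos(t)
    ys = size / 2 + r * np.sin(t)
    img = Image.new("L", (size, size), 0)
    ImageDraw.Draw(img).line(list(zip(xs, ys)), fill=255, width=1)
    return np.asarray(img)

atom = sinusoidal_contour(n1=5, n2=13)   # (n1, n2) would act as the class label
```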
arXiv Detail & Related papers (2023-03-02T09:47:28Z)
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training [103.99311611776697]
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.
CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial mask tokens.
After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks.
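A minimal sketch of one CIM-style training step, assuming patch embeddings and generator/enhancer modules are supplied by the caller: masked patches are replaced with generator proposals, and the enhancer is trained ELECTRA-style to detect the replacements. The generator's own training objective (and the alternative pixel-reconstruction variant) is omitted here.

```python
import torch
import torch.nn.functional as F

def cim_step(patches, generator, enhancer, mask_ratio=0.4):
    # patches: (B, N, D) embeddings of the clean image's patches (assumed precomputed).
    B, N, _ = patches.shape
    mask = torch.rand(B, N, device=patches.device) < mask_ratio
    with torch.no_grad():                                   # generator proposes plausible fills
        proposals = generator(patches, mask)                # (B, N, D), placeholder module
    corrupted = torch.where(mask.unsqueeze(-1), proposals, patches)
    replaced_logits = enhancer(corrupted)                   # (B, N) per-patch logits
    # Objective: the enhancer detects which patches were replaced.
    return F.binary_cross_entropy_with_logits(replaced_logits, mask.float())
```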
arXiv Detail & Related papers (2022-02-07T17:59:04Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
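A minimal sketch of the mapping described above: a small policy network turns a task embedding into discretized simulation parameters and can be updated with downstream accuracy as a (non-differentiable) reward. The dimensions and the REINFORCE-style update are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class TaskToSimPolicy(nn.Module):
    def __init__(self, task_dim=64, num_params=4, bins_per_param=5):
        super().__init__()
        self.num_params, self.bins = num_params, bins_per_param
        self.net = nn.Sequential(
            nn.Linear(task_dim, 128), nn.ReLU(),
            nn.Linear(128, num_params * bins_per_param))

    def forward(self, task_embedding):
        # One categorical distribution per (discretized) simulation parameter.
        logits = self.net(task_embedding).view(-1, self.num_params, self.bins)
        dist = torch.distributions.Categorical(logits=logits)
        sim_params = dist.sample()               # one bin index per parameter
        return sim_params, dist.log_prob(sim_params).sum(-1)

policy = TaskToSimPolicy()
task_emb = torch.randn(1, 64)                    # e.g. a Task2Vec-style task embedding
params, log_prob = policy(task_emb)
# REINFORCE-style update once downstream accuracy (reward) is measured:
# loss = -(reward - baseline) * log_prob
```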
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
However, due to the domain gap between ImageNet and ReID datasets, it usually requires a larger pre-training dataset to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.