PASS: An ImageNet replacement for self-supervised pretraining without humans
- URL: http://arxiv.org/abs/2109.13228v1
- Date: Mon, 27 Sep 2021 17:59:39 GMT
- Title: PASS: An ImageNet replacement for self-supervised pretraining without humans
- Authors: Yuki M. Asano, Christian Rupprecht, Andrew Zisserman, Andrea Vedaldi
- Abstract summary: We propose an unlabelled dataset PASS: Pictures without humAns for Self-Supervision.
PASS only contains images with CC-BY license and complete attribution metadata, addressing the copyright issue.
We show that PASS can be used for pretraining with methods such as MoCo-v2, SwAV and DINO.
PASS does not make existing datasets obsolete, as for instance it is insufficient for benchmarking. However, it shows that model pretraining is often possible while using safer data, and it also provides the basis for a more robust evaluation of pretraining methods.
- Score: 152.3252728876108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer vision has long relied on ImageNet and other large datasets of
images sampled from the Internet for pretraining models. However, these
datasets have ethical and technical shortcomings, such as containing personal
information taken without consent, unclear license usage, biases, and, in some
cases, even problematic image content. On the other hand, state-of-the-art
pretraining is nowadays obtained with unsupervised methods, meaning that
labelled datasets such as ImageNet may not be necessary, or perhaps not even
optimal, for model pretraining. We thus propose an unlabelled dataset PASS:
Pictures without humAns for Self-Supervision. PASS only contains images with
CC-BY license and complete attribution metadata, addressing the copyright
issue. Most importantly, it contains no images of people at all, and also
avoids other types of images that are problematic for data protection or
ethics. We show that PASS can be used for pretraining with methods such as
MoCo-v2, SwAV and DINO. In the transfer learning setting, it yields similar
downstream performances to ImageNet pretraining even on tasks that involve
humans, such as human pose estimation. PASS does not make existing datasets
obsolete, as for instance it is insufficient for benchmarking. However, it
shows that model pretraining is often possible while using safer data, and it
also provides the basis for a more robust evaluation of pretraining methods.
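
To make the transfer setting concrete, the sketch below freezes a self-supervised backbone and trains only a linear head, the standard linear-probe protocol. The checkpoint filename is a hypothetical placeholder for PASS-pretrained MoCo-v2 weights, not the authors' released file.

```python
# Minimal linear-probe transfer sketch; the checkpoint path below is a
# hypothetical placeholder for self-supervised (e.g. MoCo-v2) weights
# pretrained on PASS, not the authors' released file.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)                       # randomly initialised ResNet-50
state = torch.load("pass_mocov2_r50.pth", map_location="cpu")  # hypothetical file
backbone.load_state_dict(state, strict=False)                  # tolerate missing/extra SSL head keys

# Freeze the backbone and train only a new linear head on the target task.
for p in backbone.parameters():
    p.requires_grad = False
num_classes = 100                                              # example target-task size
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new head, trainable by default

optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=0.1, momentum=0.9)
```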
Related papers
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting injected content into the protected dataset.
Specifically, we modify the protected images by adding unique content via stealthy image warping functions.
By analyzing whether a model has memorized the injected content, we can detect models that illegally used the protected data.
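
A minimal sketch of a fixed, stealthy warping signature, assuming the warp is a smooth random flow field applied through a resampling grid; the function, its parameters, and the seed-based keying are illustrative, not the paper's exact construction.

```python
# Illustrative stealthy warp "signature": a fixed seed yields a fixed,
# barely perceptible elastic deformation of every protected image.
import torch
import torch.nn.functional as F

def warp_signature(img, strength=0.01, grid_size=4, seed=0):
    """Apply a small, fixed elastic warp to a CHW float image in [0, 1]."""
    c, h, w = img.shape
    g = torch.Generator().manual_seed(seed)  # fixed seed -> fixed signature
    flow = (torch.rand(1, 2, grid_size, grid_size, generator=g) - 0.5) * 2 * strength
    flow = F.interpolate(flow, size=(h, w), mode="bicubic", align_corners=True)
    # Base sampling grid in [-1, 1], perturbed by the smooth flow field.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0) + flow.permute(0, 2, 3, 1)
    return F.grid_sample(img.unsqueeze(0), grid, align_corners=True).squeeze(0)
```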
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
- DINOv2: Learning Robust Visual Features without Supervision [75.42921276202522]
This work shows that existing pretraining methods, especially self-supervised ones, can produce robust, general-purpose visual features if trained on enough curated data from diverse sources.
Most of the technical contributions aim at accelerating and stabilizing the training at scale.
In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature.
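
One step such an automatic pipeline plausibly includes is near-duplicate filtering in embedding space; the greedy filter below is an illustrative sketch, with the embedding source and the similarity threshold as assumptions rather than the paper's actual pipeline.

```python
# Illustrative near-duplicate filtering over precomputed image embeddings;
# the embedding model and threshold are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F

def dedup_by_embedding(embs: torch.Tensor, threshold: float = 0.95) -> list[int]:
    """Greedily keep images whose embedding is not too close to any kept one."""
    embs = F.normalize(embs, dim=1)  # cosine similarity becomes a dot product
    kept: list[int] = []
    for i in range(embs.size(0)):
        if not kept or (embs[i] @ embs[kept].T).max() < threshold:
            kept.append(i)
    return kept
```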
arXiv Detail & Related papers (2023-04-14T15:12:19Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable, protecting the privacy of its owners.
Experiments are conducted on six image classification datasets: three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Are Large-scale Datasets Necessary for Self-Supervised Pre-training? [29.49873710927313]
We consider a self-supervised pre-training scenario that only leverages the target task data.
Our study shows that denoising autoencoders, such as BEiT, are more robust to the type and size of the pre-training data.
On COCO, pre-training solely on COCO images yields detection and instance segmentation performance that surpasses supervised ImageNet pre-training in a comparable setting.
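
For intuition, the sketch below shows a simplified masked-patch denoising objective: hide random patches and reconstruct them, scoring the loss only on masked positions. It regresses raw pixels for brevity, whereas BEiT itself predicts discrete visual tokens, and `encoder` stands in for any model mapping patch sequences to patch reconstructions.

```python
# Simplified masked-patch denoising objective (pixel targets for brevity;
# BEiT itself predicts discrete visual tokens from a tokenizer).
import torch

def masked_patch_loss(encoder, imgs, patch=16, mask_ratio=0.4):
    """imgs: (B, 3, H, W). Mask random patches, reconstruct, score masked ones."""
    B, C, H, W = imgs.shape
    patches = imgs.unfold(2, patch, patch).unfold(3, patch, patch)  # (B, C, h, w, p, p)
    patches = patches.reshape(B, C, -1, patch * patch).permute(0, 2, 1, 3)
    patches = patches.reshape(B, -1, C * patch * patch)             # (B, N, D)
    mask = torch.rand(B, patches.size(1)) < mask_ratio              # which patches to hide
    corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)
    pred = encoder(corrupted)                                       # (B, N, D) reconstruction
    return ((pred - patches) ** 2)[mask].mean()                     # loss on masked patches only
```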
arXiv Detail & Related papers (2021-12-20T18:41:32Z)
- VTBR: Semantic-based Pretraining for Person Re-Identification [14.0819152482295]
We propose a pure semantic-based pretraining approach named VTBR.
We train convolutional networks from scratch on the captions of the FineGPR-C dataset and transfer them to downstream Re-ID tasks.
arXiv Detail & Related papers (2021-10-11T08:19:45Z)
- Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Model Training [50.308254937851814]
Personal data (e.g. images) could be exploited inappropriately to train deep neural network models without authorization.
By embedding a watermarking signature into user images via a specialized linear color transformation, any neural model trained on those images is imprinted with the signature.
This is the first work to protect users' personal data from unauthorized usage in neural network training.
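
A hedged sketch of the signature idea: each user applies a small, fixed linear mix in RGB space keyed by a per-user seed, so a model trained on the images can later be tested for the imprint. The matrix construction here is illustrative; the paper's transform family is more constrained.

```python
# Illustrative per-user linear colour-space signature; the exact transform
# family in the paper differs, and eps is an assumed imperceptibility budget.
import torch

def user_color_signature(img, user_seed, eps=0.02):
    """img: (3, H, W) float in [0, 1]. Apply identity + small random 3x3 colour mix."""
    g = torch.Generator().manual_seed(user_seed)        # one fixed matrix per user
    M = torch.eye(3) + eps * (torch.rand(3, 3, generator=g) - 0.5)
    out = torch.einsum("ij,jhw->ihw", M, img)           # per-pixel linear colour map
    return out.clamp(0.0, 1.0)
```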
arXiv Detail & Related papers (2021-09-18T22:10:37Z)
- ImageNet-21K Pretraining for the Masses [12.339884639594624]
ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks.
The ImageNet-21K dataset contains many more images and classes.
This paper aims to make high-quality efficient pretraining on ImageNet-21K available for everyone.
arXiv Detail & Related papers (2021-04-22T10:10:14Z)
- Learning Transferable Visual Models From Natural Language Supervision [13.866297967166089]
Learning directly from raw text about images is a promising alternative.
We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn.
State-of-the-art image representations are learned from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
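
The pre-training task amounts to an InfoNCE-style contrastive loss over a batch of paired embeddings; a minimal sketch, assuming image and text encoders have already produced the vectors.

```python
# Minimal CLIP-style contrastive objective over a batch of paired embeddings.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D). Matched pairs share the same row index."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.T / temperature                  # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)  # diagonal = true pairs
    # Symmetric cross-entropy: match images to captions and captions to images.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```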
arXiv Detail & Related papers (2021-02-26T19:04:58Z)
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning [128.6138588412508]
This paper presents VIsual VOcabulary pre-training (VIVO), which performs pre-training without caption annotations.
Our model can not only generate fluent image captions that describe novel objects, but also identify the locations of these objects.
arXiv Detail & Related papers (2020-09-28T23:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.