Large image datasets: A pyrrhic win for computer vision?
- URL: http://arxiv.org/abs/2006.16923v2
- Date: Fri, 24 Jul 2020 02:55:13 GMT
- Title: Large image datasets: A pyrrhic win for computer vision?
- Authors: Vinay Uday Prabhu, Abeba Birhane
- Abstract summary: We investigate problematic practices and consequences of large scale vision datasets.
We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets.
- Score: 2.627046865670577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we investigate problematic practices and consequences of large
scale vision datasets. We examine broad issues such as the question of consent
and justice as well as specific concerns such as the inclusion of verifiably
pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an
example, we perform a cross-sectional model-based quantitative census covering
factors such as age, gender, NSFW content scoring, class-wise accuracy,
human-cardinality-analysis, and the semanticity of the image class information
in order to statistically investigate the extent and subtleties of ethical
transgressions. We then use the census to help hand-curate a look-up-table of
images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of
verifiably pornographic: shot in a non-consensual setting (up-skirt), beach
voyeuristic, and exposed private parts. We survey the landscape of harm and
threats both society broadly and individuals face due to uncritical and
ill-considered dataset curation practices. We then propose possible courses of
correction and critique the pros and cons of these. We have duly open-sourced
all of the code and the census meta-datasets generated in this endeavor for the
computer vision community to build on. By unveiling the severity of the
threats, our hope is to motivate the constitution of mandatory Institutional
Review Boards (IRB) for large scale dataset curation processes.
Related papers
- Vision-Language Models under Cultural and Inclusive Considerations [53.614528867159706]
Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives.
Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case.
We create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind.
We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting.
arXiv Detail & Related papers (2024-07-08T17:50:00Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents by Not Safe For Work (NSFW) values computed from images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- Uncurated Image-Text Datasets: Shedding Light on Demographic Bias [21.421722941901123]
Even small but manually annotated datasets, such as MSCOCO, are affected by societal bias.
Our first contribution is to annotate part of the Google Conceptual Captions dataset, widely used for training vision-and-language models.
Our second contribution is to conduct a comprehensive analysis of the annotations, focusing on how different demographic groups are represented.
Our third contribution is to evaluate three prevailing vision-and-language tasks, showing that societal bias is a persistent problem in all of them.
arXiv Detail & Related papers (2023-04-06T02:33:51Z)
- ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted on six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z)
- Inferring Offensiveness In Images From Natural Language Supervision [20.294073012815854]
Large image datasets automatically scraped from the web may contain derogatory terms as categories and offensive images.
We show that pre-trained transformers themselves provide a methodology for the automated curation of large-scale vision datasets.
arXiv Detail & Related papers (2021-10-08T16:19:21Z)
- Improving Fractal Pre-training [0.76146285961466]
We propose an improved pre-training dataset based on dynamically-generated fractal images.
Our experiments demonstrate that fine-tuning a network pre-trained using fractals attains 92.7-98.1% of the accuracy of an ImageNet pre-trained network.
arXiv Detail & Related papers (2021-10-06T22:39:51Z)
- Multimodal datasets: misogyny, pornography, and malignant stereotypes [2.8682942808330703]
We examine the recently released LAION-400M dataset, which is a CLIP-filtered dataset of Image-Alt-text pairs parsed from the Common-Crawl dataset.
We found that the dataset contains troublesome and explicit image and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content.
arXiv Detail & Related papers (2021-10-05T11:47:27Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al., which contains objects in daily-life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
- Privacy-Preserving Image Classification in the Local Setting [17.375582978294105]
Local Differential Privacy (LDP) offers a promising solution: it allows data owners to randomly perturb their input before release, which gives the data plausible deniability.
In this paper, we consider a two-party image classification problem, in which data owners hold the image and the untrustworthy data user would like to fit a machine learning model with these images as input.
We propose a supervised image feature extractor, DCAConv, which produces an image representation with scalable domain size.
arXiv Detail & Related papers (2020-02-09T01:25:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.