Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine
Learning Algorithms
- URL: http://arxiv.org/abs/2205.09442v1
- Date: Thu, 19 May 2022 09:57:45 GMT
- Title: Oracle-MNIST: a Realistic Image Dataset for Benchmarking Machine
Learning Algorithms
- Authors: Mei Wang, Weihong Deng
- Abstract summary: We introduce the Oracle-MNIST dataset, comprising 28$\times$28 grayscale images of 30,222 ancient characters.
The training set consists of 27,222 images in total, and the test set contains 300 images per class.
- Score: 57.29464116557734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Oracle-MNIST dataset, comprising 28$\times$28 grayscale
images of 30,222 ancient characters from 10 categories, for benchmarking
pattern classification, with particular challenges posed by image noise and
distortion. The training set consists of 27,222 images in total, and the test
set contains 300 images per class. Oracle-MNIST shares the same data format
with the original MNIST dataset, allowing for direct compatibility with all
existing classifiers and systems, but it constitutes a more challenging
classification task than MNIST. The images of ancient characters suffer from 1)
extremely severe and distinctive noise caused by three thousand years of burial
and aging, and 2) dramatically varying writing styles of ancient Chinese
scribes, both of which make them realistic for machine learning research. The
dataset is freely available at https://github.com/wm-bupt/oracle-mnist.
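Because Oracle-MNIST uses the original MNIST file format, any standard IDX reader can load it. Below is a minimal Python sketch; the gzipped file names (e.g. `train-images-idx3-ubyte.gz`) are assumed to mirror the original MNIST naming convention and should be verified against the repository above.

```python
import gzip
import struct
import numpy as np

def load_idx_images(path):
    """Read a gzipped IDX3 image file (MNIST format) into a uint8 array."""
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, f"not an IDX3 image file: magic={magic}"
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(n, rows, cols)

def load_idx_labels(path):
    """Read a gzipped IDX1 label file (MNIST format) into a uint8 vector."""
    with gzip.open(path, "rb") as f:
        magic, n = struct.unpack(">II", f.read(8))
        assert magic == 2049, f"not an IDX1 label file: magic={magic}"
        return np.frombuffer(f.read(), dtype=np.uint8)

# File names assumed to follow the original MNIST distribution.
x_train = load_idx_images("train-images-idx3-ubyte.gz")
y_train = load_idx_labels("train-labels-idx1-ubyte.gz")
print(x_train.shape, y_train.shape)  # expected: (27222, 28, 28) (27222,)
```

From here the arrays drop directly into any existing MNIST pipeline, which is the compatibility property the abstract emphasizes.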
Related papers
- Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation [58.09421301921607]
We construct the first large-scale dataset for subject-driven image editing and generation.
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
arXiv Detail & Related papers (2024-06-13T16:40:39Z)
- IIITD-20K: Dense captioning for Text-Image ReID [5.858839403963778]
IIITD-20K comprises 20,000 unique identities captured in the wild.
Each image is densely captioned with a minimum of 26 words per description.
We perform elaborate experiments using state-of-the-art text-to-image ReID models and vision-language pre-trained models.
arXiv Detail & Related papers (2023-05-08T06:46:56Z)
- Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases [8.455991178281469]
We present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations between classes and backgrounds.
The resulting dataset is of high quality and contains approximately 152k images.
arXiv Detail & Related papers (2023-03-09T18:22:12Z)
- Bugs in the Data: How ImageNet Misrepresents Biodiversity [98.98950914663813]
We analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set.
We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled.
We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases.
arXiv Detail & Related papers (2022-08-24T17:55:48Z)
- PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining [68.84339672878066]
We introduce PyramidCLIP, which constructs an input pyramid with different semantic levels and aligns visual and linguistic elements hierarchically.
Experiments on three downstream tasks, including zero-shot image classification, zero-shot image-text retrieval and image object detection, verify the effectiveness of the proposed PyramidCLIP.
arXiv Detail & Related papers (2022-04-29T13:38:42Z)
- Data Efficient Language-supervised Zero-shot Recognition with Optimal Transport Distillation [43.03533959429743]
We propose OTTER, which uses online optimal transport to find soft image-text matches as labels for contrastive learning; a simplified sketch of this soft-matching step appears after this list.
Based on pretrained image and text encoders, models trained with OTTER achieve strong performance with only 3M image-text pairs.
arXiv Detail & Related papers (2021-12-17T11:27:26Z)
- MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification [59.10015984688104]
MedMNIST v2 is a large-scale MNIST-like dataset collection of standardized biomedical images.
The resulting dataset consists of 708,069 2D images and 10,214 3D images in total.
arXiv Detail & Related papers (2021-10-27T22:02:04Z)
- Learning to See by Looking at Noise [87.12788334473295]
We investigate a suite of image generation models that produce images from simple random processes.
These are then used as training data for a visual representation learner with a contrastive loss.
Our findings show that it is important for the noise to capture certain structural properties of real data, but that good performance can be achieved even with processes that are far from realistic; a toy generator of this kind is also sketched after this list.
arXiv Detail & Related papers (2021-06-10T17:56:46Z)
- The Semi-Supervised iNaturalist-Aves Challenge at FGVC7 Workshop [42.02670649470055]
This document describes the details and the motivation behind a new dataset we collected for the semi-supervised recognition challenge (Semi-Aves) at the FGVC7 workshop at CVPR 2020.
The dataset contains 1000 species of birds sampled from the iNat-2018 dataset for a total of nearly 150k images.
arXiv Detail & Related papers (2021-03-11T20:21:16Z)
- Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval [9.922132565411664]
We introduce the Google Landmarks dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval.
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos.
arXiv Detail & Related papers (2020-04-03T22:52:17Z)
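The OTTER entry above describes using online optimal transport to turn pairwise image-text similarities into soft matching targets for contrastive learning. The sketch below is an illustrative, simplified rendering of that idea via Sinkhorn normalization of a similarity matrix; the function name and the temperature and iteration-count parameters are assumptions for illustration, not OTTER's exact formulation.

```python
import numpy as np

def sinkhorn_soft_targets(sim, n_iters=5, temperature=0.1):
    """Turn an image-text similarity matrix into soft matching targets via
    Sinkhorn iterations (illustrative; not OTTER's exact formulation)."""
    # Exponentiate scaled similarities to get a positive coupling matrix.
    K = np.exp(sim / temperature)
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)  # normalize rows (images)
        K = K / K.sum(axis=0, keepdims=True)  # normalize columns (texts)
    # Renormalize rows so each image gets a soft label distribution over texts.
    return K / K.sum(axis=1, keepdims=True)

# Toy batch: 4 images vs. 4 captions with noisy similarities.
rng = np.random.default_rng(0)
sim = np.eye(4) + 0.3 * rng.standard_normal((4, 4))
targets = sinkhorn_soft_targets(sim)
print(targets.round(2))  # soft targets concentrate near the diagonal
```

Compared with the hard one-hot matching of standard contrastive losses, these soft targets spread probability mass to near-duplicate captions, which is the motivation the OTTER summary gives.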
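The "Learning to See by Looking at Noise" entry trains a contrastive learner on images sampled from simple random processes. As a toy illustration of one such process, the sketch below shapes white noise to a 1/f^alpha power spectrum; this is a plausible structured-noise family, not necessarily the paper's exact generator set.

```python
import numpy as np

def pink_noise_image(size=64, alpha=1.0, seed=None):
    """Sample a grayscale image from a simple random process: white noise
    shaped to a 1/f^alpha power spectrum (illustrative structured noise)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size, size))
    fx = np.fft.fftfreq(size)[:, None]
    fy = np.fft.fftfreq(size)[None, :]
    freq = np.sqrt(fx**2 + fy**2)
    freq[0, 0] = 1.0  # avoid division by zero at the DC component
    spectrum = np.fft.fft2(noise) / freq**alpha
    img = np.real(np.fft.ifft2(spectrum))
    # Normalize to [0, 1] for use as synthetic training data.
    return (img - img.min()) / (img.max() - img.min())

batch = np.stack([pink_noise_image(seed=i) for i in range(8)])
print(batch.shape)  # (8, 64, 64)
```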
This list is automatically generated from the titles and abstracts of the papers on this site.