Bamboo: Building Mega-Scale Vision Dataset Continually with
Human-Machine Synergy
- URL: http://arxiv.org/abs/2203.07845v1
- Date: Tue, 15 Mar 2022 13:01:00 GMT
- Title: Bamboo: Building Mega-Scale Vision Dataset Continually with
Human-Machine Synergy
- Authors: Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun
Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu
- Abstract summary: Large-scale datasets play a vital role in computer vision.
Existing datasets are either collected according to heuristic label systems or annotated blindly without differentiation among samples, making them inefficient and unscalable.
We advocate building a high-quality vision dataset that is actively and continually annotated on a comprehensive label system.
- Score: 69.07918114341298
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale datasets play a vital role in computer vision. Existing datasets
are either collected according to heuristic label systems or annotated blindly
without differentiation among samples, making them inefficient and unscalable. How
to systematically collect, annotate and build a mega-scale dataset remains an
open question. In this work, we advocate building a high-quality vision dataset
actively and continually on a comprehensive label system. Specifically, we
contribute Bamboo Dataset, a mega-scale and information-dense dataset for both
classification and detection. Bamboo aims to populate the comprehensive
categories with 69M image classification annotations and 170,586 object
bounding box annotations. Compared to ImageNet22K and Objects365, models
pre-trained on Bamboo achieve superior performance across various downstream
tasks (6.2% gains on classification and 2.1% gains on detection). In addition,
we provide valuable observations regarding large-scale pre-training from over
1,000 experiments. Due to its scalable nature on both label system and
annotation pipeline, Bamboo will continue to grow and benefit from the
collective efforts of the community, which we hope will pave the way for more
general vision models.
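The abstract describes the human-machine pipeline only at a high level. Below is a minimal sketch of the kind of active annotation round it advocates: the model labels what it is confident about, and humans verify only the most uncertain samples. This is an illustration under assumptions, not Bamboo's released code; the predict_proba interface and the ask_human callback are hypothetical stand-ins.

```python
# Minimal sketch of one active, human-in-the-loop annotation round.
# NOT Bamboo's released code: model.predict_proba and ask_human are
# hypothetical stand-ins for the machine and human sides of the loop.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-sample entropy of an (N, C) probability array; higher = more uncertain."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def annotation_round(model, images, budget, ask_human):
    """Label a pool of images with a fixed human-verification budget.

    model     -- any classifier exposing predict_proba(images) -> (N, C)
    images    -- the unlabeled pool for this round
    budget    -- number of samples routed to human annotators
    ask_human -- callback returning a verified class index for one image
    """
    probs = model.predict_proba(images)                 # machine pass
    order = np.argsort(-predictive_entropy(probs))      # hardest samples first
    uncertain, confident = order[:budget], order[budget:]

    labels = {}
    for i in uncertain:                                 # human verifies hard cases
        labels[int(i)] = ask_human(images[int(i)])
    for i in confident:                                 # machine keeps easy cases
        labels[int(i)] = int(probs[i].argmax())
    return labels                                       # feeds the next training round
```

Iterated over rounds, the verified labels retrain the model, so both the label system and the annotation pipeline can keep growing, which is the scalability the abstract emphasizes.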
Related papers
- OAM-TCD: A globally diverse dataset of high-resolution tree cover maps [8.336960607169175]
We present a novel open-access dataset for individual tree crown delineation (TCD) in high-resolution aerial imagery sourced from OpenAerialMap (OAM).
Our dataset, OAM-TCD, comprises 5,072 images at 2048x2048 px and 10 cm/px resolution, with human-labeled instance masks for over 280k individual trees and 56k tree groups.
Using our dataset, we train reference instance and semantic segmentation models that compare favorably to existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-16T14:11:29Z)
- Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amount of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized across multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z)
- Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement [0.0]
Large-scale datasets for single-label multi-class classification, such as ImageNet-1k, have been instrumental in advancing deep learning and computer vision.
We introduce a lightweight, user-friendly, and scalable framework that synergizes human and machine intelligence for efficient dataset validation and quality enhancement.
By using Multilabelfy on the ImageNetV2 dataset, we found that approximately 47.88% of the images contained at least two labels.
arXiv Detail & Related papers (2024-01-31T10:57:07Z)
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset [18.211840156134784]
This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment.
The dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community.
arXiv Detail & Related papers (2023-07-19T20:54:08Z)
- Large Scale Real-World Multi-Person Tracking [68.27438015329807]
This paper presents a new large-scale multi-person tracking dataset, PersonPath22.
It is over an order of magnitude larger than currently available high quality multi-object tracking datasets such as MOT17, HiEve, and MOT20.
arXiv Detail & Related papers (2022-11-03T23:03:13Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware
Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into five difficulty levels plus an unseen object set for evaluating different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average; a toy model-in-the-loop aggregation of this kind is sketched after this list.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
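The last entry's figure of 0.35 annotations per image implies that most images never reach a human at all. Below is a toy sketch of how model-in-the-loop label aggregation can achieve that: the classifier's prediction serves as a prior, and human votes are requested only while the label posterior remains uncertain. The symmetric worker-noise model and all names here are assumptions for illustration, not the paper's exact method.

```python
# Toy model-in-the-loop label aggregation (assumptions mine, not the
# paper's exact method): start from the classifier's predictive
# distribution and query humans only while the posterior is uncertain.
import numpy as np

def annotate_image(model_probs, get_worker_label, accuracy=0.85,
                   confidence=0.95, max_queries=3):
    """Return (label, num_human_annotations) for one image.

    model_probs      -- classifier's (C,) probability vector for the image
    get_worker_label -- callback returning one human-provided class index
    accuracy         -- assumed per-worker accuracy (symmetric noise model)
    """
    C = len(model_probs)
    posterior = model_probs.copy()
    queries = 0
    while posterior.max() < confidence and queries < max_queries:
        y = get_worker_label()
        queries += 1
        # Likelihood of the vote under the symmetric worker-noise model.
        like = np.full(C, (1.0 - accuracy) / (C - 1))
        like[y] = accuracy
        posterior *= like
        posterior /= posterior.sum()
    return int(posterior.argmax()), queries
```

When the model prior is already confident, an image consumes zero human annotations, so averaged over a large pool the human cost per image can fall well below one.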
This list is automatically generated from the titles and abstracts of the papers on this site.