Measures of Complexity for Large Scale Image Datasets
- URL: http://arxiv.org/abs/2008.04431v1
- Date: Mon, 10 Aug 2020 21:54:23 GMT
- Title: Measures of Complexity for Large Scale Image Datasets
- Authors: Ameet Annasaheb Rahane and Anbumani Subramanian
- Abstract summary: In this work, we build a series of relatively simple methods to measure the complexity of a dataset.
We present our analysis using four datasets from the autonomous driving research community - Cityscapes, IDD, BDD and Vistas.
Using entropy based metrics, we present a rank-order complexity of these datasets, which we compare with an established rank-order with respect to deep learning.
- Score: 0.3655021726150368
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large scale image datasets are a growing trend in the field of machine
learning. However, it is hard to quantitatively understand or specify how
various datasets compare to each other - i.e., if one dataset is more complex
or harder to "learn" with respect to a deep-learning based network. In this
work, we build a series of relatively computationally simple methods to measure
the complexity of a dataset. Furthermore, we present an approach to demonstrate
visualizations of high dimensional data, in order to assist with visual
comparison of datasets. We present our analysis using four datasets from the
autonomous driving research community - Cityscapes, IDD, BDD and Vistas. Using
entropy based metrics, we present a rank-order complexity of these datasets,
which we compare with an established rank-order with respect to deep learning.
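As a rough illustration of what an entropy based rank-order can look like, the sketch below computes the Shannon entropy of each image's pixel-value histogram, averages it per dataset, and sorts the datasets by that average. This is an assumption-laden toy, not the authors' method: the abstract does not say which entropy metrics are used, and the function names (`shannon_entropy`, `mean_entropy`, `rank_by_entropy`) and the tiny datasets are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    total = len(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy(images):
    """Average per-image entropy over a dataset (a list of flattened images)."""
    return sum(shannon_entropy(img) for img in images) / len(images)


def rank_by_entropy(datasets):
    """Return dataset names ordered from highest to lowest mean entropy."""
    return sorted(datasets, key=lambda name: mean_entropy(datasets[name]), reverse=True)


# Hypothetical toy "datasets": each image is a flat list of grayscale values.
toy_datasets = {
    "flat": [[0, 0, 0, 0], [7, 7, 7, 7]],
    "two_tone": [[0, 0, 255, 255], [0, 255, 0, 255]],
    "varied": [[0, 64, 128, 255], [1, 2, 3, 4]],
}

ranking = rank_by_entropy(toy_datasets)
print(ranking)
```

For real images, the flattened pixel lists would come from an image loader, and a histogram entropy of this kind captures only intensity diversity, not spatial or semantic structure.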
Related papers
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion [25.44529512862336]
MASSTAR is a multi-modal lArge-scale scene dataset with a verSatile Toolchain for surfAce pRediction and completion.
We develop a versatile and efficient toolchain for processing the raw 3D data from the environments.
We generate an example dataset composed of over a thousand scene-level models with partial real-world data.
arXiv Detail & Related papers (2024-03-18T11:35:18Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected as both 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- Joint Geometric and Topological Analysis of Hierarchical Datasets [7.098759778181621]
In this paper, we focus on high-dimensional data that are organized into several hierarchical datasets.
The main novelty in this work lies in the combination of two powerful data-analytic approaches: topological data analysis and geometric manifold learning.
We show that our new method gives rise to superior classification results compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-04-03T13:02:00Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-06-10T16:30:52Z)