Fantastic Data and How to Query Them
- URL: http://arxiv.org/abs/2201.05026v1
- Date: Thu, 13 Jan 2022 15:24:46 GMT
- Title: Fantastic Data and How to Query Them
- Authors: Trung-Kien Tran, Anh Le-Tuan, Manh Nguyen-Duc, Jicheng Yuan, Danh
Le-Phuoc
- Abstract summary: We present our vision about a unified framework for different datasets so that they can be integrated and easily queried.
We demonstrate this in our ongoing work to create a framework for datasets in Computer Vision and show its advantages in different scenarios.
- Score: 3.464871689508835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is commonly acknowledged that the availability of huge amounts of
(training) data is one of the most important factors behind many recent advances
in Artificial Intelligence (AI). However, datasets are often designed for
specific tasks in narrow AI sub-areas, and there is no unified way to manage and
access them. This not only creates unnecessary overheads when training or
deploying Machine Learning models but also limits the understanding of the
data, which is very important for data-centric AI. In this paper, we present
our vision about a unified framework for different datasets so that they can be
integrated and queried easily, e.g., using standard query languages. We
demonstrate this in our ongoing work to create a framework for datasets in
Computer Vision and show its advantages in different scenarios. Our
demonstration is available at https://vision.semkg.org.
Related papers
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- An Unbiased Look at Datasets for Visuo-Motor Pre-Training [20.094244564603184]
We show that dataset choice is just as important to this paradigm's success.
We observe that traditional vision datasets are surprisingly competitive options for visuo-motor representation learning.
We show that common simulation benchmarks are not a reliable proxy for real world performance.
arXiv Detail & Related papers (2023-10-13T17:59:02Z)
- VisionKG: Unleashing the Power of Visual Datasets via Knowledge Graph [2.3143591448419074]
Vision Knowledge Graph (VisionKG) is a novel resource that interlinks, organizes and manages visual datasets via knowledge graphs and Semantic Web technologies.
VisionKG currently contains 519 million RDF triples that describe approximately 40 million entities.
arXiv Detail & Related papers (2023-09-24T11:19:13Z)
- Curriculum-Based Imitation of Versatile Skills [15.97723808124603]
Learning skills by imitation is a promising concept for the intuitive teaching of robots.
A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations.
Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways.
arXiv Detail & Related papers (2023-04-11T12:10:41Z)
- Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z)
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
- Dataset Structural Index: Understanding a machine's perspective towards visual data [0.0]
I show two meta values that provide more information about a visual dataset, which can be used to optimize the data, design better architectures, and estimate which model would work best.
In the paper, I show many applications of DSI, one of which is how the same level of accuracy can be achieved with the same model architectures trained on a smaller amount of data.
arXiv Detail & Related papers (2021-10-05T06:40:16Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models for the generation of as many data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria to quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the resulting dataset manageable, we apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.