Diverse Complexity Measures for Dataset Curation in Self-driving
- URL: http://arxiv.org/abs/2101.06554v1
- Date: Sat, 16 Jan 2021 23:45:02 GMT
- Title: Diverse Complexity Measures for Dataset Curation in Self-driving
- Authors: Abbas Sadat, Sean Segal, Sergio Casas, James Tu, Bin Yang, Raquel Urtasun, Ersin Yumer
- Abstract summary: We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
- Score: 80.55417232642124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern self-driving autonomy systems heavily rely on deep learning. As a
consequence, their performance is influenced significantly by the quality and
richness of the training data. Data collecting platforms can generate many
hours of raw data on a daily basis; however, it is not feasible to label
everything. It is thus of key importance to have a mechanism to identify "what
to label". Active learning approaches identify examples to label, but their
interestingness is tied to a fixed model performing a particular task. These
assumptions are not valid in self-driving, where we have to solve a diverse set
of tasks (i.e., perception and motion forecasting) and our models evolve
frequently over time. In this paper we introduce a novel approach and propose a new
data selection method that exploits a diverse set of criteria that quantify
interestingness of traffic scenes. Our experiments on a wide range of tasks and
models show that the proposed curation pipeline is able to select datasets that
lead to better generalization and higher performance.
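The abstract does not list the specific criteria or how they are combined, so the following is only a minimal sketch of the general idea of model-independent, multi-criteria scene selection. It is not the authors' pipeline. The measure names (`num_actors`, `ego_speed_var`), the function `select_scenes`, and the round-robin combination rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def select_scenes(scenes: Dict[str, Dict[str, float]], budget: int) -> List[str]:
    """Pick up to `budget` scene ids by cycling through the criteria.

    `scenes` maps a scene id to its complexity measures (hypothetical names).
    Each criterion ranks all scenes from most to least complex; the criteria
    then take turns contributing their best not-yet-selected scene, so no
    single measure dominates the labeled set.
    """
    scene_ids = sorted(scenes)
    target = min(budget, len(scene_ids))
    if target <= 0:
        return []

    criteria = sorted({name for measures in scenes.values() for name in measures})
    if not criteria:
        # No measures available: fall back to a deterministic prefix.
        return scene_ids[:target]

    # One ranking per criterion; ties are broken by scene id for determinism.
    rankings = {
        c: sorted(scene_ids, key=lambda s: (-scenes[s].get(c, 0.0), s))
        for c in criteria
    }
    cursors = {c: 0 for c in criteria}
    selected: List[str] = []
    chosen = set()

    while len(selected) < target:
        for c in criteria:
            if len(selected) >= target:
                break
            ranking = rankings[c]
            # Skip scenes that another criterion already contributed.
            while cursors[c] < len(ranking) and ranking[cursors[c]] in chosen:
                cursors[c] += 1
            if cursors[c] < len(ranking):
                pick = ranking[cursors[c]]
                selected.append(pick)
                chosen.add(pick)
    return selected


# Toy pool of unlabeled scenes with made-up measure values.
example_scenes = {
    "a": {"num_actors": 30, "ego_speed_var": 0.1},
    "b": {"num_actors": 5, "ego_speed_var": 2.0},
    "c": {"num_actors": 25, "ego_speed_var": 1.5},
    "d": {"num_actors": 2, "ego_speed_var": 0.05},
}
print(select_scenes(example_scenes, budget=3))
```

Because the ranking uses only properties of the scenes themselves, the selection does not depend on any particular trained model or task, which is the property the abstract contrasts with active learning.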
Related papers
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
(2024-04-23T16:01:33Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
(2024-03-26T04:27:56Z)
- Exploring intra-task relations to improve meta-learning algorithms [1.223779595809275]
We aim to exploit external knowledge of task relations to improve training stability via effective mini-batching of tasks.
We hypothesize that selecting a diverse set of tasks in a mini-batch will lead to a better estimate of the full gradient and hence will lead to a reduction of noise in training.
(2023-12-27T15:33:52Z)
- Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning [19.43430577960824]
This paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance.
Using a variety of closed- and open-ended visual question answering tasks, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios.
(2023-09-12T20:51:07Z)
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weak Supervised Learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
(2023-03-31T07:06:24Z)
- Frugal Reinforcement-based Active Learning [12.18340575383456]
We propose a novel active learning approach for label-efficient training.
The proposed method is iterative and aims at minimizing a constrained objective function that mixes diversity, representativity and uncertainty criteria.
We also introduce a novel weighting mechanism based on reinforcement learning, which adaptively balances these criteria at each training iteration.
(2022-12-09T14:17:45Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
(2022-10-04T07:21:49Z)
- Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
(2021-06-02T11:39:25Z)
- Just Label What You Need: Fine-Grained Active Selection for Perception and Prediction through Partially Labeled Scenes [78.23907801786827]
We introduce generalizations that ensure that our approach is both cost-aware and allows for fine-grained selection of examples through partially labeled scenes.
Our experiments on a real-world, large-scale self-driving dataset suggest that fine-grained selection can improve the performance across perception, prediction, and downstream planning tasks.
(2021-04-08T17:57:41Z)