Active Data Discovery: Mining Unknown Data using Submodular Information Measures
- URL: http://arxiv.org/abs/2206.08566v1
- Date: Fri, 17 Jun 2022 05:52:18 GMT
- Title: Active Data Discovery: Mining Unknown Data using Submodular Information Measures
- Authors: Suraj Kothawade, Shivang Chopra, Saikat Ghosh, Rishabh Iyer
- Abstract summary: We provide an active data discovery framework which can mine unknown data slices and classes efficiently.
We show significant accuracy and labeling efficiency gains with our approach compared to existing state-of-the-art active learning approaches.
- Score: 1.7491858164568674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning is a common yet powerful framework for iteratively and adaptively sampling subsets of an unlabeled set, with a human in the loop, with the goal of achieving labeling efficiency. Most real-world datasets are imbalanced across classes or slices, and correspondingly, parts of the dataset are rare. As a result, there has been a lot of work on designing active learning approaches for mining these rare data instances. Most approaches assume access to a labeled seed set that already contains these rare data instances. However, in the event of more extreme rareness, it is reasonable to assume that these rare data instances (either classes or slices) may not even be present in the labeled seed set, and a critical need for the active learning paradigm is to efficiently discover them. In this work, we provide an active data discovery framework which can mine unknown data slices and classes efficiently using the submodular conditional gain and submodular conditional mutual information functions. We provide a general algorithmic framework which works in a number of scenarios, including image classification and object detection, and handles both rare classes and rare slices present in the unlabeled set. We show significant accuracy and labeling efficiency gains with our approach compared to existing state-of-the-art active learning approaches for actively discovering these rare classes and slices.
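The selection step behind this idea can be illustrated with a small sketch. Below is a minimal NumPy sketch, not the authors' released implementation: it greedily maximizes a facility-location style submodular conditional gain f(A | P), preferring unlabeled points that cover regions the already-labeled set P does not, which is one simple way such a framework can surface unknown slices or classes. The similarity matrix, function name, and budget handling are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): greedy maximization of a
# facility-location conditional gain
#   f(A | P) = sum_i max(0, max_{j in A} sim[i, j] - max_{k in P} sim[i, k]),
# which favors unlabeled points covering regions the labeled set P does not,
# a simple proxy for discovering unknown slices or classes.
def facility_location_cg_greedy(sim, labeled_idx, budget):
    """sim: (n, n) pairwise similarities over labeled + unlabeled points;
    labeled_idx: list of indices of already-labeled points;
    budget: number of points to select for labeling this round."""
    n = sim.shape[0]
    candidates = [i for i in range(n) if i not in set(labeled_idx)]
    # Coverage each point already receives from the labeled (known) set P.
    known_cover = sim[:, labeled_idx].max(axis=1) if labeled_idx else np.zeros(n)
    sel_cover = np.zeros(n)          # coverage from the selected set A
    selected = []
    for _ in range(min(budget, len(candidates))):
        best_gain, best_j = -np.inf, None
        for j in candidates:
            new_cover = np.maximum(sel_cover, sim[:, j])
            gain = (np.maximum(new_cover - known_cover, 0.0)
                    - np.maximum(sel_cover - known_cover, 0.0)).sum()
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        candidates.remove(best_j)
        sel_cover = np.maximum(sel_cover, sim[:, best_j])
    return selected  # indices to send to the human labeler this round
```

In the full framework, a submodular conditional mutual information function would additionally take a query set of exemplars so that, once a few instances of a rare class or slice have been found, selection can be targeted toward more of them; the sketch above only covers the "discover something new" direction.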
Related papers
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- Inconsistency-Based Data-Centric Active Open-Set Annotation [6.652785290214744]
NEAT is a data-centric active learning method that actively annotates open-set data.
NEAT achieves significantly better performance than state-of-the-art active learning methods for active open-set annotation.
arXiv Detail & Related papers (2024-01-10T04:18:02Z)
- Unsupervised Estimation of Ensemble Accuracy [0.0]
We present a method for estimating the joint power of several classifiers.
Unlike existing approaches, which focus on "diversity" measures, it does not rely on labels.
We demonstrate the method on popular large-scale face recognition datasets.
arXiv Detail & Related papers (2023-11-18T02:31:36Z)
- Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions [2.578242050187029]
Active learning aims to identify the most informative data from an unlabeled data pool that enables a model to reach the desired accuracy rapidly.
Most existing active learning methods have been evaluated in an ideal setting where only samples relevant to the target task exist in an unlabeled data pool.
We introduce new active learning benchmarks that include ambiguous, task-irrelevant out-of-distribution samples as well as in-distribution samples.
arXiv Detail & Related papers (2023-03-25T10:46:10Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes.
While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories.
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
arXiv Detail & Related papers (2022-03-11T19:28:19Z)
- TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information [16.34454526943999]
We propose TALISMAN, a novel framework for targeted active learning for object detection with rare slices.
Our method uses the submodular mutual information functions instantiated using features of the region of interest.
We evaluate our framework on the standard PASCAL VOC07+12 and BDD100K, a real-world self-driving dataset.
arXiv Detail & Related papers (2021-11-30T23:17:53Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Class-Balanced Active Learning for Image Classification [29.5211685759702]
We propose a general optimization framework that explicitly takes class-balancing into account.
Results on three datasets showed that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informativeness-based and representativeness-based active learning methods.
arXiv Detail & Related papers (2021-10-09T11:30:26Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.