Zero-Round Active Learning
- URL: http://arxiv.org/abs/2107.06703v2
- Date: Thu, 15 Jul 2021 00:49:48 GMT
- Title: Zero-Round Active Learning
- Authors: Si Chen, Tianhao Wang, Ruoxi Jia
- Abstract summary: Active learning (AL) aims at reducing labeling effort by identifying the most valuable unlabeled data points from a large pool.
Traditional AL frameworks have two limitations: first, they perform data selection in a multi-round manner, which is time-consuming and often impractical; second, they assume that a small amount of labeled data is available in the same domain as the unlabeled pool.
Recent work proposes a solution for one-round active learning based on data utility learning and optimization.
- Score: 13.25385227263705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning (AL) aims at reducing labeling effort by identifying the most
valuable unlabeled data points from a large pool. Traditional AL frameworks
have two limitations: First, they perform data selection in a multi-round
manner, which is time-consuming and impractical. Second, they usually assume
that a small amount of labeled data is available in the same
domain as the data in the unlabeled pool. Recent work proposes a solution for
one-round active learning based on data utility learning and optimization,
which fixes the first issue but still requires the initially labeled data
points in the same domain. In this paper, we propose $\mathrm{D^2ULO}$ as a
solution that solves both issues. Specifically, $\mathrm{D^2ULO}$ leverages the
idea of domain adaptation (DA) to train a data utility model which can
effectively predict the utility for any given unlabeled data in the target
domain once labeled. The trained data utility model can then be used to select
high-utility data and at the same time, provide an estimate for the utility of
the selected data. Our algorithm does not rely on any feedback from annotators
in the target domain and hence can be used to perform zero-round active
learning or warm-start existing multi-round active learning strategies. Our
experiments show that $\mathrm{D^2ULO}$ outperforms the existing
state-of-the-art AL strategies equipped with domain adaptation over various
domain shift settings (e.g., real-to-real data and synthetic-to-real data).
Particularly, $\mathrm{D^2ULO}$ is applicable to the scenario where source and
target labels have mismatches, which is not supported by the existing works.
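To make the selection step concrete, here is a minimal sketch of zero-round selection given an already-trained data utility model. The greedy marginal-gain loop and the toy diminishing-returns utility stand-in are illustrative assumptions, not the paper's exact model class or optimizer; they only show how high-utility points can be chosen, together with a utility estimate, without any annotator feedback:

```python
import numpy as np

def greedy_select(utility_of, n_pool, k):
    """Greedily pick k pool indices that maximize the predicted set
    utility. `utility_of` maps a list of indices to a scalar estimate,
    standing in for a trained data utility model."""
    selected = []
    for _ in range(k):
        base = utility_of(selected)
        best_i, best_gain = None, -np.inf
        for i in range(n_pool):
            if i in selected:
                continue
            gain = utility_of(selected + [i]) - base
            if gain > best_gain:
                best_i, best_gain = i, gain
        selected.append(best_i)
    # The model also yields a utility estimate for the chosen set --
    # no target-domain annotator feedback is needed ("zero-round").
    return selected, utility_of(selected)

# Toy stand-in with diminishing returns; a real utility model would be
# trained with domain adaptation on labeled source data.
rng = np.random.default_rng(0)
scores = rng.uniform(size=100)
utility_of = lambda idx: float(np.log1p(scores[idx].sum()))

chosen, est_utility = greedy_select(utility_of, n_pool=100, k=10)
```

In $\mathrm{D^2ULO}$, the utility model itself is trained with domain adaptation so that utilities learned from source-domain supervision transfer to the unlabeled target pool; the greedy loop above is only one possible way to optimize over it.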
Related papers
- Robust Target Training for Multi-Source Domain Adaptation [110.77704026569499]
We propose a novel Bi-level Optimization based Robust Target Training (BORT$^2$) method for MSDA.
Our proposed method achieves state-of-the-art performance on three MSDA benchmarks, including the large-scale DomainNet dataset.
arXiv Detail & Related papers (2022-10-04T15:20:01Z)
- Domain Adaptive Semantic Segmentation without Source Data [50.18389578589789]
We investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain.
We propose an effective framework for this challenging problem with two components: positive learning and negative learning.
Our framework can be easily implemented and incorporated with other methods to further enhance performance (see the sketch below).
arXiv Detail & Related papers (2021-10-13T04:12:27Z)
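As a rough illustration of the two components described above, the sketch below uses their common instantiation: cross-entropy on high-confidence pseudo-labels (positive learning) and suppression of classes the model already rules out (negative learning). The thresholds and loss forms are generic assumptions, not the paper's exact objectives:

```python
import torch
import torch.nn.functional as F

def positive_negative_losses(logits, pos_thresh=0.9, neg_thresh=0.05):
    """logits: batch x classes (apply per pixel for segmentation).
    Positive learning: cross-entropy on confident pseudo-labels.
    Negative learning: suppress classes judged very unlikely."""
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)

    pos_mask = conf > pos_thresh          # only confident predictions
    pos_loss = (F.cross_entropy(logits[pos_mask], pseudo[pos_mask])
                if pos_mask.any() else logits.new_zeros(()))

    neg_mask = probs < neg_thresh         # near-zero-probability classes
    neg_loss = (-torch.log1p(-probs[neg_mask]).mean()
                if neg_mask.any() else logits.new_zeros(()))

    return pos_loss, neg_loss
```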
- S$^3$VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation [49.01925978752677]
In real-world scenarios, it may be feasible to obtain labels for a small proportion of the target data.
We propose S$^3$VAADA, which (i) introduces a novel submodular criterion to select a maximally informative subset to label and (ii) enhances a cluster-based DA procedure.
Our approach consistently outperforms the competing state-of-the-art approaches on datasets with varying degrees of domain shifts.
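The paper's own criterion combines informativeness and diversity terms; as a generic stand-in for submodular subset selection, the sketch below greedily maximizes the classic facility-location objective, which admits the standard (1 - 1/e) greedy guarantee:

```python
import numpy as np

def facility_location_greedy(sim, k):
    """Greedy maximization of f(S) = sum_i max_{j in S} sim[i, j],
    a classic submodular objective over an n x n (nonnegative)
    similarity matrix for the unlabeled pool."""
    n = sim.shape[0]
    selected, best_cover = [], np.zeros(n)
    for _ in range(k):
        # Marginal gain of each candidate j: improvement in every
        # point's best similarity to the selected set.
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected

# Example: clipped cosine similarities over random pool features.
X = np.random.default_rng(1).normal(size=(200, 32))
X /= np.linalg.norm(X, axis=1, keepdims=True)
to_label = facility_location_greedy(np.clip(X @ X.T, 0.0, None), k=20)
```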
arXiv Detail & Related papers (2021-09-18T10:53:57Z)
- Active Covering [37.525977525895605]
We analyze the problem of active covering, where the learner is given an unlabeled dataset and can sequentially label query examples.
The objective is to query all of the positive examples using the fewest total label queries.
arXiv Detail & Related papers (2021-06-04T15:32:39Z)
- One-Round Active Learning [13.25385227263705]
One-round active learning aims to select a subset of unlabeled data points that achieve the highest utility after being labeled.
We propose DULO, a general framework for one-round active learning based on the notion of data utility functions.
Our results demonstrate that while existing active learning approaches could succeed with multiple rounds, DULO consistently performs better in the one-round setting.
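A rough sketch of data utility learning under simplifying assumptions: train a cheap proxy learner on many random labeled subsets, record each subset's validation accuracy, and regress accuracy on a permutation-invariant subset embedding. The mean-pooled embedding and logistic-regression proxy are illustrative choices, not DULO's actual parameterization:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor

def fit_utility_model(X, y, X_val, y_val, n_subsets=200, subset_size=50):
    """Learn a data utility function from (subset, accuracy) pairs:
    train a cheap proxy on random labeled subsets, record validation
    accuracy, and regress it on a permutation-invariant subset
    embedding (here: the mean feature vector)."""
    rng = np.random.default_rng(0)
    embeddings, utilities = [], []
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=subset_size, replace=False)
        proxy = LogisticRegression(max_iter=500).fit(X[idx], y[idx])
        embeddings.append(X[idx].mean(axis=0))
        utilities.append(proxy.score(X_val, y_val))
    reg = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
    return reg.fit(np.array(embeddings), np.array(utilities))

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
utility = fit_utility_model(X[:800], y[:800], X[800:], y[800:])
# `utility.predict` now scores candidate subsets by their mean embedding,
# so selection can happen in a single round, before labeling.
```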
arXiv Detail & Related papers (2021-04-23T23:59:50Z)
- OVANet: One-vs-All Network for Universal Domain Adaptation [78.86047802107025]
Existing methods manually set a threshold to reject unknown samples based on validation or a pre-defined ratio of unknown samples.
We propose a method to learn the threshold using source samples and to adapt it to the target domain.
Our idea is that a minimum inter-class distance in the source domain should be a good threshold for deciding between known and unknown classes in the target domain (see the sketch below).
arXiv Detail & Related papers (2021-04-07T18:36:31Z)
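The threshold intuition above can be illustrated with a simplified, distance-based version; OVANet itself learns this boundary with one-vs-all classifiers rather than computing it explicitly:

```python
import numpy as np

def min_interclass_threshold(feats, labels):
    """Use the minimum distance between source class centroids as the
    known-vs-unknown threshold (a simplified reading of the intuition;
    OVANet learns the boundary with one-vs-all heads instead)."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    return dists[np.triu_indices(len(classes), k=1)].min(), centroids

def is_unknown(x, centroids, thresh):
    # A target sample farther than `thresh` from every known-class
    # centroid is flagged as belonging to an unknown class.
    return np.linalg.norm(centroids - x, axis=1).min() > thresh
```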
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training [67.71228426496013]
We show that using target domain data during pre-training leads to large performance improvements across a variety of setups.
We find that pre-training on multiple domains improves performance generalization on domains not seen during training.
arXiv Detail & Related papers (2021-04-02T12:53:15Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.