In Defense of Core-set: A Density-aware Core-set Selection for Active Learning
- URL: http://arxiv.org/abs/2206.04838v2
- Date: Mon, 13 Jun 2022 01:09:19 GMT
- Title: In Defense of Core-set: A Density-aware Core-set Selection for Active Learning
- Authors: Yeachan Kim, Bonggun Shin
- Abstract summary: In a real-world active learning scenario, considering the diversity of the selected samples is crucial.
In this work, we analyze the feature space through the lens of density and propose a density-aware core-set (DACS).
The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions.
- Score: 3.6753274024067593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning enables the efficient construction of a labeled
dataset by labeling informative samples from an unlabeled dataset. In a
real-world active learning scenario, considering the diversity of the selected
samples is crucial because many redundant or highly similar samples exist. The
core-set approach is a promising diversity-based method that selects diverse
samples based on the distances between samples. However, it performs poorly
compared to uncertainty-based approaches, which select the most difficult
samples, i.e., those on which neural models show low confidence. In this work,
we analyze the feature space through the lens of density and, interestingly,
observe that locally sparse regions tend to contain more informative samples
than dense regions. Motivated by this analysis, we empower the core-set
approach with density-awareness and propose a density-aware core-set (DACS).
The strategy is to estimate the density of the unlabeled samples and select
diverse samples mainly from sparse regions. To reduce the computational
bottleneck of estimating the density, we also introduce a new density
approximation based on locality-sensitive hashing. Experimental results
clearly demonstrate the efficacy of DACS in both classification and regression
tasks, and specifically show that DACS achieves state-of-the-art performance
in a practical scenario. Since DACS is only weakly dependent on neural
architectures, we also present a simple yet effective combination method to
show that existing methods can be beneficially combined with DACS.
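To make the selection strategy concrete, here is a minimal Python sketch of
the two ingredients the abstract describes: an LSH-based density approximation
and a density-weighted greedy (k-center style) selection. This is an
illustration under assumed details, not the authors' implementation; the
random-hyperplane hashing, the inverse-density weight, and the `alpha`
exponent are all assumptions.

```python
# Minimal sketch of density-aware core-set selection over extracted features.
# NOT the authors' implementation: the LSH scheme (random hyperplanes) and the
# inverse-density weighting are illustrative assumptions.
import numpy as np

def lsh_density(features: np.ndarray, n_bits: int = 8, seed: int = 0) -> np.ndarray:
    """Approximate local density as the population of each LSH bucket.

    Points hashed to the same bucket by random hyperplanes are likely close,
    so a crowded bucket signals a dense region of the feature space.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    codes = (features @ planes > 0).astype(np.uint64)      # sign pattern per sample
    buckets = codes @ (1 << np.arange(n_bits, dtype=np.uint64))
    _, inverse, counts = np.unique(buckets, return_inverse=True, return_counts=True)
    return counts[inverse].astype(np.float64)              # higher = denser

def density_aware_coreset(features: np.ndarray, budget: int,
                          alpha: float = 1.0) -> list[int]:
    """Greedy k-center selection biased toward sparse regions.

    Each step picks the sample maximizing (distance to the selected set)
    scaled by an inverse-density weight; `alpha` is an assumed knob for how
    strongly density down-weights dense-region candidates.
    """
    weight = 1.0 / lsh_density(features) ** alpha
    selected = [int(np.argmax(weight))]                    # seed in a sparse bucket
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(min_dist * weight))            # density-weighted farthest point
        selected.append(nxt)
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(features - features[nxt], axis=1))
    return selected

# Example: pick 50 diverse, sparse-region samples from 1000 random features.
indices = density_aware_coreset(np.random.randn(1000, 128), budget=50)
```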
Related papers
- Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation [20.172605920901777]
This paper designs a framework that applies different strategies to comprehensively mine different kinds of target samples.
We propose a novel source-free framework (SOUF) to achieve semi-supervised fine-tuning of the source pre-trained model on the target domain.
arXiv Detail & Related papers (2024-11-11T02:09:32Z)
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We present a contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints.
Our method minimizes logit differences within the same sample by considering their numerical values (a toy sketch of such a sample-wise loss follows this entry).
We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z)
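The sample-wise alignment above can be pictured as an InfoNCE-style objective
in which a student's logits for a sample must be closer to the teacher's
logits for that same sample than to the teacher's logits for any other sample
in the batch. This is a hedged toy sketch, not the CKD paper's actual loss;
the cosine similarity and temperature are assumptions.

```python
# Toy sketch of sample-wise contrastive distillation (NOT the CKD paper's
# exact loss): each student logit vector should match the teacher logit
# vector of the SAME sample, relative to all other samples in the batch.
import torch
import torch.nn.functional as F

def samplewise_contrastive_kd(student_logits: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    s = F.normalize(student_logits, dim=1)   # cosine similarity is an assumption
    t = F.normalize(teacher_logits, dim=1)
    sim = s @ t.T / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(sim.size(0))      # positive pair = same sample index
    return F.cross_entropy(sim, targets)     # intra-sample match, inter-sample contrast

loss = samplewise_contrastive_kd(torch.randn(8, 100), torch.randn(8, 100))
```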
- Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP).
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore (a toy density-descent step is sketched after this entry).
The proposed DDFP outperforms other feature-level perturbation designs and achieves state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
arXiv Detail & Related papers (2024-03-11T06:59:05Z)
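One way to picture a density-descending perturbation: estimate feature
density with a Gaussian kernel density estimator and push each feature a
small step against the density gradient, toward the sparse regions where the
low-density separation assumption places decision boundaries. A rough sketch
under assumed details (Gaussian KDE, fixed bandwidth and step size), not the
DDFP implementation:

```python
# Rough sketch of a density-descending feature perturbation (NOT the DDFP
# paper's implementation): step each feature against the gradient of a
# Gaussian-KDE log-density, i.e. toward locally sparse regions.
import torch

def density_descending_step(features: torch.Tensor,
                            bandwidth: float = 1.0,
                            step: float = 0.1) -> torch.Tensor:
    x = features.detach().requires_grad_(True)
    # Log-density of each point under a Gaussian KDE over the batch
    # (the bandwidth is an assumed hyperparameter).
    diff = x.unsqueeze(1) - x.detach().unsqueeze(0)   # (n, n, d) pairwise differences
    sq_dists = (diff ** 2).sum(dim=-1)
    log_density = torch.logsumexp(-sq_dists / (2 * bandwidth ** 2), dim=1)
    log_density.sum().backward()
    # Move against the (normalized) density gradient: descend the density.
    direction = x.grad / (x.grad.norm(dim=1, keepdim=True) + 1e-8)
    return (x - step * direction).detach()

perturbed = density_descending_step(torch.randn(16, 32))
```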
- Density Matters: Improved Core-set for Active Domain Adaptive Segmentation [35.58476357071544]
Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation.
In this work, we revisit the theoretical bound of the classical Core-set method and identify that the performance is closely related to the local sample distribution around selected samples.
We introduce a local proxy estimator with Dynamic Masked Convolution and develop a Density-aware Greedy algorithm to optimize the bound.
arXiv Detail & Related papers (2023-12-15T08:22:36Z)
- Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning [22.410220040736235]
We present a theoretically optimal solution for addressing both coreset selection and active learning.
Our proposed method, COPS, is designed to minimize the expected loss of a model trained on subsampled data.
arXiv Detail & Related papers (2023-09-05T14:06:33Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- MuRAL: Multi-Scale Region-based Active Learning for Object Detection [20.478741635006116]
We propose a novel approach called Multi-scale Region-based Active Learning (MuRAL) for object detection.
MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects.
Our proposed method surpasses all existing coarse-grained and fine-grained baselines on the Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2023-03-29T12:52:27Z)
- ScatterSample: Diversified Label Sampling for Data Efficient Graph Neural Network Learning [22.278779277115234]
In some applications of graph neural networks (GNNs), labeling new training instances is expensive.
We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting.
Our experiments on five datasets show that ScatterSample significantly outperforms the other GNN active learning baselines.
arXiv Detail & Related papers (2022-06-09T04:05:02Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness (a toy contrast with the naive ball-indicator density follows this entry).
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
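To make the "naive density" in the summary above concrete, here is the
ball-indicator neighbor count next to a smooth Gaussian-kernel alternative.
A minimal sketch for contrast only; the radius and bandwidth are assumed,
and this is not the paper's kernel diffusion density:

```python
# Minimal contrast between the naive ball-indicator density and a smooth
# Gaussian-kernel density. Radius/bandwidth are assumed; this is NOT the
# paper's kernel diffusion density function.
import numpy as np

def ball_indicator_density(points: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Naive density: count neighbors inside a d-dimensional ball (hard cutoff)."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return (dists <= radius).sum(axis=1).astype(float)

def gaussian_kernel_density(points: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Smooth alternative: every neighbor contributes, weighted by distance."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return np.exp(-dists ** 2 / (2 * bandwidth ** 2)).sum(axis=1)
```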
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.