In Defense of Core-set: A Density-aware Core-set Selection for Active Learning
- URL: http://arxiv.org/abs/2206.04838v2
- Date: Mon, 13 Jun 2022 01:09:19 GMT
- Title: In Defense of Core-set: A Density-aware Core-set Selection for Active Learning
- Authors: Yeachan Kim, Bonggun Shin
- Abstract summary: In a real-world active learning scenario, considering the diversity of the selected samples is crucial.
In this work, we analyze the feature space through the lens of density and propose a density-aware core-set (DACS).
The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions.
- Score: 3.6753274024067593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distances between samples. However, it performs poorly compared to uncertainty-based approaches, which select the most difficult samples, i.e., those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to contain more informative samples than dense regions. Motivated by this analysis, we empower the core-set approach with density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and to select diverse samples mainly from sparse regions. To reduce the computational bottleneck of density estimation, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks, and specifically show that DACS achieves state-of-the-art performance in a practical scenario. Since DACS depends only weakly on the neural architecture, we also present a simple yet effective combination method, showing that existing methods can be beneficially combined with DACS.
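As a concrete picture of the recipe in the abstract, here is a minimal NumPy sketch: random-hyperplane (sign) hashing approximates density via hash-bucket sizes, and a farthest-first (k-center greedy) pass over the sparsest candidates provides diversity. The function names, the bucket-count density proxy, and the sparse_fraction knob are illustrative assumptions, not the authors' implementation.

import numpy as np

def lsh_density(features, n_planes=8, n_tables=4, seed=0):
    # Approximate local density as the average hash-bucket size over several
    # random-hyperplane LSH tables (a bigger bucket means a denser region).
    rng = np.random.default_rng(seed)
    density = np.zeros(len(features))
    for _ in range(n_tables):
        planes = rng.standard_normal((features.shape[1], n_planes))
        bits = features @ planes > 0                  # sign pattern per sample
        codes = bits @ (1 << np.arange(n_planes))     # integer hash code
        _, inv, counts = np.unique(codes, return_inverse=True, return_counts=True)
        density += counts[inv]                        # bucket size per sample
    return density / n_tables

def dacs_select(features, budget, sparse_fraction=0.5, seed=0):
    # Select `budget` diverse samples, drawn mainly from sparse regions,
    # via farthest-first (k-center greedy) traversal of the sparsest pool.
    density = lsh_density(features, seed=seed)
    order = np.argsort(density)                       # sparsest first
    pool = order[: max(budget, int(sparse_fraction * len(features)))]
    X = features[pool]
    chosen = [0]                                      # seed with the sparsest sample
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(dist.argmax())                      # farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return pool[chosen]

# Usage: indices = dacs_select(np.random.randn(1000, 32), budget=50)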
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We present a contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints.
Our method minimizes logit differences within the same sample by considering their numerical values (see the sketch after this list).
We conduct comprehensive experiments on three datasets including CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z)
- Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP).
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore.
The proposed DDFP outperforms other feature-level perturbation designs and shows state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
arXiv Detail & Related papers (2024-03-11T06:59:05Z)
- Density Matters: Improved Core-set for Active Domain Adaptive Segmentation [35.58476357071544]
Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation.
In this work, we revisit the theoretical bound of the classical Core-set method and identify that the performance is closely related to the local sample distribution around selected samples.
We introduce a local proxy estimator with Dynamic Masked Convolution and develop a Density-aware Greedy algorithm to optimize the bound.
arXiv Detail & Related papers (2023-12-15T08:22:36Z)
- Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning [22.410220040736235]
We present a theoretically optimal solution for addressing both coreset selection and active learning.
Our proposed method, COPS, is designed to minimize the expected loss of a model trained on subsampled data.
arXiv Detail & Related papers (2023-09-05T14:06:33Z)
- Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities [0.9287179270753105]
This study introduces a hybrid representation-enhanced sampling strategy that integrates both density and diversity criteria.
Experiments are performed on two lower extremity (LE) datasets of MRI and CT images.
arXiv Detail & Related papers (2023-07-26T06:52:29Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- MuRAL: Multi-Scale Region-based Active Learning for Object Detection [20.478741635006116]
We propose a novel approach called Multi-scale Region-based Active Learning (MuRAL) for object detection.
MuRAL identifies informative regions of various scales to reduce annotation costs for well-learned objects.
Our proposed method surpasses all existing coarse-grained and fine-grained baselines on the Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2023-03-29T12:52:27Z)
- ScatterSample: Diversified Label Sampling for Data Efficient Graph Neural Network Learning [22.278779277115234]
In some graph neural network (GNN) applications, labeling new instances is expensive.
We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting.
Our experiments on five datasets show that ScatterSample significantly outperforms the other GNN active learning baselines.
arXiv Detail & Related papers (2022-06-09T04:05:02Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness (see the density sketch after this list).
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
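For the CKD entry above, the following PyTorch sketch illustrates one way a sample-wise objective with intra- and inter-sample constraints could look: an MSE term aligns the student's logit values with the teacher's for the same sample, and an InfoNCE-style term contrasts matched pairs against the other pairings in the batch. The loss name, the choice of MSE, and the temperature are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def ckd_style_loss(student_logits, teacher_logits, tau=0.1):
    # Intra-sample constraint: penalize numerical differences between the
    # student's and teacher's logits for the same sample (hypothetical MSE form).
    intra = F.mse_loss(student_logits, teacher_logits)
    # Inter-sample constraint: InfoNCE over the batch, treating the matched
    # (student_i, teacher_i) pair as the positive for each row.
    s = F.normalize(student_logits, dim=1)
    t = F.normalize(teacher_logits, dim=1)
    sim = s @ t.T / tau                       # (B, B) pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)
    inter = F.cross_entropy(sim, targets)
    return intra + inter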
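For the kernel-diffusion entry above, this small NumPy sketch contrasts the naive unit-ball indicator density with an adaptive-bandwidth Gaussian estimate. The Gaussian version is a generic stand-in for "adaptive to local characteristics", not the paper's kernel diffusion density function; the k-NN bandwidth rule is an assumption.

import numpy as np

def ball_density(X, radius=1.0):
    # Naive density: number of points inside a fixed-radius Euclidean
    # ball around each sample (the indicator-function kernel).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d <= radius).sum(axis=1).astype(float)

def adaptive_gaussian_density(X, k=5):
    # Stand-in for an adaptive kernel: a per-point Gaussian bandwidth set by
    # the distance to the k-th nearest neighbour, so the estimate adapts
    # to how tightly packed each neighbourhood is.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    h = np.maximum(np.sort(d, axis=1)[:, k], 1e-12)  # k-NN distance per point
    return np.exp(-(d / h[:, None]) ** 2).sum(axis=1)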