Related papers: Object-Focused Data Selection for Dense Prediction Tasks

A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't) [14.070675074621043]
Instruction fine-tuning involves selecting a subset of instruction training data from a large candidate pool, using a small query set from the target task. Despite growing interest, the literature on targeted instruction selection remains fragmented and opaque. In this work, we aim to bring clarity to this landscape by disentangling and systematically analyzing the two core ingredients: data representation and selection algorithms.
arXiv Detail & Related papers (2026-02-16T12:33:05Z)
Box-Level Class-Balanced Sampling for Active Object Detection [34.79955979395035]
Active learning (AL) is a promising technique to alleviate the annotation burden. Performing AL at box-level for object detection has been shown to be more cost-effective than selecting and labelling the entire image. We propose a class-balanced sampling strategy to select more objects from minority classes for labelling.
arXiv Detail & Related papers (2025-08-25T09:57:22Z)
OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models [22.494367900953645]
We propose OpenPath, a novel open-set active learning approach for pathological image classification. OpenPath significantly enhances the model's performance due to its high purity of selected samples.
arXiv Detail & Related papers (2025-06-18T09:47:45Z)
Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm [50.492124556982674]
This paper introduces a novel choice-based sample selection framework. It shifts the focus from evaluating individual sample quality to comparing the contribution value of different samples. We validate our approach on a larger medical dataset, highlighting its practical applicability in real-world applications.
arXiv Detail & Related papers (2025-03-04T07:32:41Z)
Language Model-Driven Data Pruning Enables Efficient Active Learning [6.816044132563518]
We introduce a plug-and-play unlabeled data pruning strategy, ActivePrune, to prune the unlabeled pool. To enhance the diversity in the unlabeled pool, we propose a novel perplexity reweighting method. Experiments on translation, sentiment analysis, topic classification, and summarization tasks demonstrate that ActivePrune outperforms existing data pruning methods.
arXiv Detail & Related papers (2024-10-05T19:46:11Z)
Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection [26.486535389258965]
We experimentally find three gaps between general and oriented object detection in semi-supervised learning. We propose a Multi-clue Consistency Learning (MCL) framework to bridge these gaps. Our proposed MCL can achieve state-of-the-art performance in the semi-supervised oriented object detection task.
arXiv Detail & Related papers (2024-07-08T13:14:25Z)
Diverse Subset Selection via Norm-Based Sampling and Orthogonality [31.558151874765667]
Large annotated datasets are crucial for the success of deep neural networks, but labeling data can be prohibitively expensive in domains such as medical imaging. This work tackles the subset selection problem: selecting a small set of the most informative examples from a large unlabeled pool for annotation.
arXiv Detail & Related papers (2024-06-03T08:12:32Z)
DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality. We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets. Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly. FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling [20.982992381790034]
We propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets.
arXiv Detail & Related papers (2023-09-28T03:40:30Z)
DiffusAL: Coupling Active Learning with Graph Diffusion for Label-Efficient Node Classification [1.0602247913671219]
We introduce a novel active graph learning approach called DiffusAL, showing significant robustness in diverse settings. Most of our calculations for acquisition and training can be pre-processed, making DiffusAL more efficient compared to approaches combining diverse selection criteria. Our experiments on various benchmark datasets show that, unlike previous methods, our approach significantly outperforms random selection in 100% of all datasets and labeling budgets tested.
arXiv Detail & Related papers (2023-07-31T20:30:13Z)
ISLE: A Framework for Image Level Semantic Segmentation Ensemble [5.137284292672375]
Conventional semantic segmentation networks require massive pixel-wise annotated labels to reach state-of-the-art prediction quality. We propose ISLE, which employs an ensemble of the "pseudo-labels" for a given set of different semantic segmentation techniques on a class-wise level. We reach up to 2.4% improvement over ISLE's individual components.
arXiv Detail & Related papers (2023-03-14T13:36:36Z)
Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling. We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting. Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS). It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes. In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image. We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z)
Uncertainty-Aware Semi-Supervised Few Shot Segmentation [9.098329723771116]
Few shot segmentation (FSS) aims to learn pixel-level classification of a target object in a query image using only a few annotated support samples. This is challenging as it requires modeling appearance variations of target objects and the diverse visual cues between query and support images with limited information. We propose a semi-supervised FSS strategy that leverages additional prototypes from unlabeled images with uncertainty guided pseudo label refinement.
arXiv Detail & Related papers (2021-10-18T00:37:46Z)
S$^3$VAADA: Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation [49.01925978752677]
In real-world scenarios, it might be feasible to get labels for a small proportion of target data. We propose S$^3$VAADA which i) introduces a novel submodular criterion to select a maximally informative subset to label and ii) enhances a cluster-based DA procedure. Our approach consistently outperforms the competing state-of-the-art approaches on datasets with varying degrees of domain shifts.
arXiv Detail & Related papers (2021-09-18T10:53:57Z)
Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach. We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features. The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model. We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z)
Large-Scale Object Detection in the Wild from Imbalanced Multi-Labels [128.77822070156057]
In this work, we quantitatively analyze the label problem that objects may explicitly or implicitly have multiple labels. We propose a soft-sampling method with a hybrid training scheduler to deal with the label imbalance. Our method yields a dramatic improvement of 3.34 points, leading to the best single model with 60.90 mAP on the public object detection test set of Open Images.
arXiv Detail & Related papers (2020-05-18T04:36:36Z)
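
Several entries above describe selecting samples for labeling by combining an uncertainty criterion with a diversity criterion. As a rough illustration of that general idea only, the sketch below first keeps the most uncertain candidates (by prediction entropy) and then picks a spread-out subset with greedy farthest-point selection. It is not the method of any listed paper; the function names, the `pool_factor` parameter, and the choice of entropy and squared Euclidean distance are all assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_samples(probs, features, budget, pool_factor=3):
    """Hypothetical two-step selection: uncertainty filter, then diversity.

    probs:    per-sample class probabilities (list of lists)
    features: per-sample feature vectors (list of lists)
    budget:   number of samples to return
    pool_factor: assumed knob; the candidate pool holds budget * pool_factor
                 of the most uncertain samples
    """
    if budget <= 0 or not probs:
        return []

    # Step 1: rank by entropy and keep the most uncertain candidates.
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    candidates = order[: budget * pool_factor]

    # Step 2: greedy farthest-point selection inside the candidate pool,
    # seeded with the single most uncertain sample.
    selected = [candidates[0]]
    min_dist = {
        i: squared_distance(features[i], features[selected[0]])
        for i in candidates[1:]
    }
    while len(selected) < budget and min_dist:
        nxt = max(min_dist, key=min_dist.get)  # farthest from the selected set
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            d = squared_distance(features[i], features[nxt])
            min_dist[i] = min(min_dist[i], d)
    return selected
```

A call such as `select_samples(probs, features, budget=2, pool_factor=2)` returns indices into the input lists. Seeding with the most uncertain sample is one arbitrary design choice; a random or centroid-based seed would be equally plausible.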

This list is automatically generated from the titles and abstracts of the papers in this site.