Boundary Matters: A Bi-Level Active Finetuning Framework
- URL: http://arxiv.org/abs/2403.10069v1
- Date: Fri, 15 Mar 2024 07:19:15 GMT
- Title: Boundary Matters: A Bi-Level Active Finetuning Framework
- Authors: Han Lu, Yichen Xie, Xiaokang Yang, Junchi Yan
- Abstract summary: The concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget.
Traditional active learning methods often struggle in this setting due to their inherent bias in batch selection.
We propose a Bi-Level Active Finetuning framework to select the samples for annotation in one shot, which includes two stages: core sample selection for diversity, and boundary sample selection for uncertainty.
- Score: 100.45000039215495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pretraining-finetuning paradigm has gained widespread adoption in vision tasks and other fields, yet it faces the significant challenge of high sample annotation costs. To mitigate this, the concept of active finetuning has emerged, aiming to select the most appropriate samples for model finetuning within a limited budget. Traditional active learning methods often struggle in this setting due to their inherent bias in batch selection. Furthermore, the recent active finetuning approach has primarily concentrated on aligning the distribution of selected subsets with the overall data pool, focusing solely on diversity. In this paper, we propose a Bi-Level Active Finetuning framework to select the samples for annotation in one shot, which includes two stages: core sample selection for diversity, and boundary sample selection for uncertainty. The process begins with the identification of pseudo-class centers, followed by an innovative denoising method and an iterative strategy for boundary sample selection in the high-dimensional feature space, all without relying on ground-truth labels. Our comprehensive experiments provide both qualitative and quantitative evidence of our method's efficacy, outperforming all the existing baselines.
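The abstract names the two stages but not their mechanics. Below is a minimal sketch of the bi-level idea, assuming pseudo-class centers come from k-means over pretrained features and using a distance-margin heuristic for the boundary stage; the paper's denoising method and iterative strategy are not reproduced here, and the function name and parameters are hypothetical. A small margin (gap between the two nearest centers) serves as a label-free stand-in for uncertainty: such a sample sits between pseudo-classes.

```python
import numpy as np
from sklearn.cluster import KMeans

def bilevel_select(features, n_core, n_boundary, n_centers=10, seed=0):
    """Toy two-stage selection: core samples near pseudo-class centers
    (diversity), then low-margin boundary samples (uncertainty)."""
    km = KMeans(n_clusters=n_centers, n_init=10, random_state=seed).fit(features)
    dists = np.sort(km.transform(features), axis=1)  # (N, n_centers), ascending

    # Stage 1: core samples = those closest to their nearest pseudo-center.
    core = np.argsort(dists[:, 0])[:n_core]

    # Stage 2: boundary samples = smallest gap between the two nearest
    # centers, skipping samples already chosen as cores.
    margin = dists[:, 1] - dists[:, 0]
    taken = set(core.tolist())
    boundary = [i for i in np.argsort(margin) if i not in taken][:n_boundary]
    return core.tolist(), boundary
```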
Related papers
- Hit the Sweet Spot! Span-Level Ensemble for Large Language Models [8.34562564266839]
We propose SweetSpan, a span-level ensemble method that effectively balances the need for real-time adjustments and the information required for accurate ensemble decisions.
Our approach involves two key steps: First, we have each candidate model independently generate candidate spans based on the shared prefix.
Second, we calculate perplexity scores to facilitate mutual evaluation among the candidate models and achieve robust span selection by filtering out unfaithful scores.
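A minimal sketch of this mutual-evaluation step, assuming each model's perplexity for each candidate span is already computed; the robust-z filter below stands in for the paper's unspecified filtering rule, and `select_span` is a hypothetical helper.

```python
import numpy as np

def select_span(ppl, z_thresh=2.0):
    """ppl[i, j]: model i's perplexity for candidate span j. Scores far
    from each span's median (robust z-score > z_thresh) are treated as
    unfaithful and masked out; the span with the lowest mean surviving
    perplexity is selected."""
    med = np.median(ppl, axis=0, keepdims=True)
    mad = np.median(np.abs(ppl - med), axis=0, keepdims=True) + 1e-8
    faithful = np.abs(ppl - med) / mad <= z_thresh
    masked = np.where(faithful, ppl, np.nan)
    return int(np.nanargmin(np.nanmean(masked, axis=0)))
```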
arXiv Detail & Related papers (2024-09-27T09:41:29Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
However, vanilla reference-model-free methods score and select data independently, in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
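As one way to picture reference-model-free diverse selection, here is a hedged sketch using greedy Gram-Schmidt over feature vectors; this illustrates the diversity objective rather than DivBS's actual algorithm, and `diverse_batch` is a hypothetical name.

```python
import numpy as np

def diverse_batch(feats, k):
    """Greedy Gram-Schmidt selection: at each step pick the sample whose
    feature has the largest component orthogonal to the span of the
    samples chosen so far, then remove that direction from everyone."""
    residual = feats.astype(float).copy()
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        if chosen:
            norms[chosen] = -1.0                  # never reselect
        i = int(np.argmax(norms))
        chosen.append(i)
        u = residual[i] / max(norms[i], 1e-12)    # unit direction
        residual = residual - np.outer(residual @ u, u)
    return chosen
```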
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- ActiveDC: Distribution Calibration for Active Finetuning [36.64444238742072]
We propose a new method called ActiveDC for active finetuning tasks.
We calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool.
The results indicate that ActiveDC consistently outperforms the baselines in all image classification tasks.
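A toy illustration of distribution calibration under stated assumptions: k-means clusters in the unlabeled pool serve as the implicit category information, and selected features are shrunk toward their nearest cluster mean. The interpolation weight `alpha` and the use of k-means are illustrative choices, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def calibrate(selected_feats, pool_feats, alpha=0.5, n_clusters=10, seed=0):
    """Shrink each selected sample's feature toward the mean of its
    nearest cluster in the unlabeled pool, borrowing the pool's implicit
    category structure to calibrate the selected distribution."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(pool_feats)
    nearest = km.predict(selected_feats)
    return (1 - alpha) * selected_feats + alpha * km.cluster_centers_[nearest]
```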
arXiv Detail & Related papers (2023-11-13T14:35:18Z)
- Regularizing Second-Order Influences for Continual Learning [39.16131410356833]
Continual learning aims to learn on non-stationary data streams without catastrophically forgetting previous knowledge.
Prevalent replay-based methods address this challenge by rehearsing on a small buffer holding the seen data.
We dissect the interaction of sequential selection steps within a framework built on influence functions.
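For intuition, a first-order influence proxy is sketched below with an identity-Hessian shortcut; the second-order interaction across sequential selection steps that the paper analyzes is exactly what this simplification leaves out. All names here are hypothetical.

```python
import numpy as np

def influence_scores(train_grads, val_grad):
    """First-order influence proxy: a sample scores high when its loss
    gradient aligns with the validation-loss gradient (the Hessian is
    approximated by the identity)."""
    return train_grads @ val_grad        # (N, d) @ (d,) -> (N,) scores

def select_buffer(train_grads, val_grad, k):
    """Keep the k most influential samples in the replay buffer."""
    return np.argsort(-influence_scores(train_grads, val_grad))[:k]
```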
arXiv Detail & Related papers (2023-04-20T09:30:35Z)
- Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm [132.9949120482274]
This paper focuses on the selection of samples for annotation in the pretraining-finetuning paradigm.
We propose a novel method called ActiveFT for the active finetuning task, which selects a subset of data whose distribution closely matches that of the entire unlabeled pool.
Extensive experiments show ActiveFT's leading performance and high efficiency over baselines on both image classification and semantic segmentation.
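A crude stand-in for this distribution-matching selection, assuming pretrained features: fit k-means with k equal to the annotation budget and annotate the real sample nearest each centroid. ActiveFT itself optimizes continuous centroids with a similarity-based loss; this sketch only mimics the outcome.

```python
import numpy as np
from sklearn.cluster import KMeans

def activeft_like(pool_feats, budget, seed=0):
    """Fit k-means with k = budget and pick the real sample nearest each
    centroid, so the chosen subset roughly mirrors the pool's feature
    distribution (duplicate picks, if any, are dropped)."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(pool_feats)
    d = km.transform(pool_feats)            # (N, budget) sample-to-centroid distances
    return np.unique(np.argmin(d, axis=0))  # nearest real sample per centroid
```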
arXiv Detail & Related papers (2023-03-25T07:17:03Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
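To make the gradient-alignment idea concrete, here is a hedged loss-reweighting sketch: given a mined bias-conflicting mask (e.g., from ECS), the minority group is upweighted so both groups contribute equal total gradient signal. GA proper rescales gradients each step; for a scalar objective, reweighting losses has the same effect.

```python
import numpy as np

def balanced_losses(losses, is_conflicting):
    """Reweight per-sample losses so the mined bias-conflicting minority
    contributes as much total gradient signal as the bias-aligned
    majority (a loss-level stand-in for gradient alignment)."""
    is_conflicting = np.asarray(is_conflicting, dtype=bool)
    n_c = max(int(is_conflicting.sum()), 1)
    n_a = max(int((~is_conflicting).sum()), 1)
    weights = np.where(is_conflicting, 0.5 / n_c, 0.5 / n_a)
    return weights * losses   # .sum() gives the group-balanced objective
```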
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
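A toy version of this consistency-based split, assuming softmax predictions from two augmented views: divergence from the label distribution gauges cleanness, and divergence between the views gauges out-of-distribution-ness. The thresholds and function names are illustrative, not the paper's.

```python
import numpy as np

def js_div(p, q, eps=1e-12):
    """Jensen-Shannon divergence between rows of two probability arrays."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)), axis=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def split_samples(pred_v1, pred_v2, labels_onehot, t_clean=0.3, t_ood=0.6):
    """Flag each sample as clean (prediction close to its label) or
    out-of-distribution (the two views disagree strongly)."""
    clean = js_div(pred_v1, labels_onehot) < t_clean
    ood = (js_div(pred_v1, pred_v2) > t_ood) & ~clean
    return clean, ood
```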
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
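As a deliberately simple illustration of post-hoc adjustment, the sketch below centers scores within each protected group so neither group is systematically ranked higher; real methods trade such a correction off against ranking utility rather than equalizing group means outright. `posthoc_adjust` is a hypothetical helper.

```python
import numpy as np

def posthoc_adjust(scores, groups):
    """Center scores within each protected group so group membership no
    longer shifts the ranking on average."""
    adjusted = scores.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        adjusted[mask] -= adjusted[mask].mean()
    return adjusted
```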
arXiv Detail & Related papers (2020-06-15T10:08:39Z)