Multiple-criteria Based Active Learning with Fixed-size Determinantal
Point Processes
- URL: http://arxiv.org/abs/2107.01622v1
- Date: Sun, 4 Jul 2021 13:22:54 GMT
- Title: Multiple-criteria Based Active Learning with Fixed-size Determinantal
Point Processes
- Authors: Xueying Zhan and Qing Li and Antoni B. Chan
- Abstract summary: We introduce a multiple-criteria based active learning algorithm, which incorporates three complementary criteria, i.e., informativeness, representativeness and diversity.
We show that our method performs significantly better and is more stable than other multiple-criteria based AL algorithms.
- Score: 43.71112693633952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning aims to achieve greater accuracy with less training data by
selecting the most useful data samples from which it learns. Single-criterion
based methods (i.e., informativeness and representativeness based methods) are
simple and efficient; however, they lack adaptability to different real-world
scenarios. In this paper, we introduce a multiple-criteria based active
learning algorithm, which incorporates three complementary criteria, i.e.,
informativeness, representativeness and diversity, to make appropriate
selections in the active learning rounds under different data types. We
consider the selection process as a Determinantal Point Process, which achieves
a good balance among these criteria. We refine the query selection strategy by
both
selecting the hardest unlabeled data sample and biasing towards the classifiers
that are more suitable for the current data distribution. In addition, we
consider the dependencies and relationships between data points during data
selection by means of centroid-based clustering approaches. Through evaluations
on synthetic and real-world datasets, we show that our method performs
significantly better and is more stable than other multiple-criteria based AL
algorithms.
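To make the selection step concrete, here is a minimal sketch of fixed-size DPP selection via greedy MAP inference, in the spirit of the abstract. It is an illustrative assumption, not the authors' implementation: the entropy-based informativeness score, the RBF similarity kernel, the quality-modulated kernel L = diag(q) S diag(q), and all helper names are stand-ins for the paper's criteria rather than its exact formulation.

```python
import numpy as np

def informativeness(probs):
    """Predictive entropy of classifier outputs (n_samples x n_classes).
    A stand-in (assumption) for the paper's informativeness criterion."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def rbf_similarity(X, gamma=1.0):
    """RBF similarity kernel over the unlabeled pool; its determinant
    structure supplies the representativeness/diversity pressure."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def greedy_kdpp(quality, S, k):
    """Greedy MAP approximation of a fixed-size (k-)DPP with kernel
    L = diag(q) @ S @ diag(q): at each step, add the item that maximizes
    the log-determinant of the selected submatrix."""
    L = quality[:, None] * S * quality[None, :]
    selected = []
    for _ in range(k):
        best_i, best_logdet = -1, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_i, best_logdet = i, logdet
        selected.append(best_i)
    return selected

# Toy usage: pick a batch of 10 queries from 200 unlabeled 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
probs = rng.dirichlet(np.ones(3), size=200)  # placeholder classifier output
q = informativeness(probs) + 1e-3            # keep qualities strictly positive
batch = greedy_kdpp(q, rbf_similarity(X, gamma=0.5), k=10)
print(batch)
```

A centroid-based variant in the spirit of the clustering step described above could first restrict the pool to cluster representatives (e.g., the samples nearest each k-means centroid) and then run the same greedy selection on that reduced pool.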
Related papers
- Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement [8.509688686402438]
Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities.
This work addresses the question: How can we determine the optimal subset of data for effective training?
Our method employs k-means clustering to ensure the selected subset effectively represents the full dataset.
arXiv Detail & Related papers (2024-09-17T17:25:31Z)
- DsDm: Model-Aware Dataset Selection with Datamodels [81.01744199870043]
Standard practice is to filter for examples that match human notions of data quality.
We find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data.
Our framework avoids handpicked notions of data quality, and instead models explicitly how the learning process uses train datapoints to predict on the target tasks.
arXiv Detail & Related papers (2024-01-23T17:22:00Z)
- A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features exhibit between different classes.
We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning [57.436224561482966]
Active learning is a research area that addresses the issue of expensive labeling by selecting the most important samples to label.
We introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting.
Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings.
arXiv Detail & Related papers (2022-07-25T16:11:55Z)
- Cost-Effective Online Contextual Model Selection [14.094350329970537]
We formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context.
The goal is to output the best model for any given context without obtaining an excessive amount of labels.
We propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection.
arXiv Detail & Related papers (2022-07-13T08:22:22Z)
- Active metric learning and classification using similarity queries [21.589707834542338]
We show that a novel unified query framework can be applied to any problem in which a key component is learning a representation of the data that reflects similarity.
We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification.
arXiv Detail & Related papers (2022-02-04T03:34:29Z)
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and algorithm utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.