Model Uncertainty based Active Learning on Tabular Data using Boosted
Trees
- URL: http://arxiv.org/abs/2310.19573v1
- Date: Mon, 30 Oct 2023 14:29:53 GMT
- Authors: Sharath M Shankaranarayana
- Abstract summary: Supervised machine learning relies on the availability of good labelled data for model training.
Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently.
- Score: 0.4667030429896303
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Supervised machine learning relies on the availability of good labelled data
for model training. Labelled data is acquired by human annotation, which is a
cumbersome and costly process, often requiring subject matter experts. Active
learning is a sub-field of machine learning which helps in obtaining the
labelled data efficiently by selecting the most valuable data instances for
model training and querying the labels only for those instances from the human
annotator. Recently, a lot of research has been done in the field of active
learning, especially for deep neural network based models. Although deep
learning shines when dealing with image/textual/multimodal data, gradient
boosting methods still tend to achieve much better results on tabular data. In
this work, we explore active learning for tabular data using boosted trees.
Uncertainty based sampling in active learning is the most commonly used
querying strategy, wherein the labels of those instances are sequentially
queried for which the current model prediction is maximally uncertain. Entropy
is often the choice for measuring uncertainty. However, entropy is not exactly
a measure of model uncertainty. Although there has been a lot of work in deep
learning for measuring model uncertainty and employing it in active learning,
it is yet to be explored for non-neural network models. To this end, we explore
the effectiveness of boosted trees based model uncertainty methods in active
learning. Leveraging this model uncertainty, we propose an uncertainty based
sampling in active learning for regression tasks on tabular data. Additionally,
we also propose a novel cost-effective active learning method for regression
tasks along with an improved cost-effective active learning method for
classification tasks.
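The two querying ideas the abstract contrasts can be sketched as follows: predictive entropy as the common classification criterion, and disagreement across an ensemble of boosted trees as a proxy for model uncertainty in regression. The estimator choice, hyperparameters, and the seed-varied-ensemble variance below are illustrative assumptions; the paper's exact boosted-tree uncertainty method may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def entropy_scores(proba):
    # Predictive entropy per instance: H = -sum_c p_c * log(p_c).
    # High entropy = maximally uncertain prediction (the usual querying rule).
    p = np.clip(proba, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def ensemble_uncertainty_sampling(X_train, y_train, X_pool,
                                  n_models=5, n_query=10):
    # Variance across boosted-tree models trained with different random
    # seeds, used here as a generic stand-in for model uncertainty in
    # regression (illustrative only; not the paper's exact estimator).
    preds = []
    for seed in range(n_models):
        model = GradientBoostingRegressor(
            n_estimators=50, subsample=0.7, random_state=seed)
        model.fit(X_train, y_train)
        preds.append(model.predict(X_pool))
    variance = np.var(np.stack(preds), axis=0)
    return np.argsort(variance)[::-1][:n_query]  # most uncertain first

# Toy regression pool: query the 10 instances the ensemble disagrees on most.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
X_pool = rng.normal(size=(200, 3))
query_idx = ensemble_uncertainty_sampling(X_train, y_train, X_pool)
print(len(query_idx))  # 10
```

In an active learning loop, the queried instances would be labelled by the annotator, moved from the pool into the training set, and the model retrained before the next round.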
Related papers
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence demonstrating that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Frugal Reinforcement-based Active Learning [12.18340575383456]
We propose a novel active learning approach for label-efficient training.
The proposed method is iterative and aims at minimizing a constrained objective function that mixes diversity, representativity and uncertainty criteria.
We also introduce a novel weighting mechanism based on reinforcement learning, which adaptively balances these criteria at each training iteration.
arXiv Detail & Related papers (2022-12-09T14:17:45Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z) - Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertainty-driven sample selection, and are often restricted to samples lying close to the decision boundary.
Here we propose to take into consideration common domain-knowledge and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z) - Efficacy of Bayesian Neural Networks in Active Learning [11.609770399591516]
We show that Bayesian neural networks are more efficient than ensemble based techniques in capturing uncertainty.
Our findings also reveal some key drawbacks of the ensemble techniques, which were recently shown to be more effective than Monte Carlo dropout.
arXiv Detail & Related papers (2021-04-02T06:02:11Z) - Deep Bayesian Active Learning, A Brief Survey on Recent Advances [6.345523830122166]
Active learning starts by training the model with a small amount of labelled data.
Standard deep learning methods are not capable of representing or manipulating model uncertainty.
Deep Bayesian active learning frameworks address this by incorporating practical uncertainty estimation into the model.
arXiv Detail & Related papers (2020-12-15T02:06:07Z) - Active Learning: Problem Settings and Recent Developments [2.1574781022415364]
This paper explains the basic problem settings of active learning and recent research trends.
In particular, research on learning acquisition functions to select samples from the data for labeling, theoretical work on active learning algorithms, and stopping criteria for sequential data acquisition are highlighted.
arXiv Detail & Related papers (2020-12-08T05:24:06Z) - On the Robustness of Active Learning [0.7340017786387767]
Active Learning is concerned with how to identify the most useful samples for a Machine Learning algorithm to be trained with.
We find that it is often applied without sufficient care and domain knowledge.
We propose the new "Sum of Squared Logits" method based on the Simpson diversity index and investigate the effect of using the confusion matrix for balancing in sample selection.
arXiv Detail & Related papers (2020-06-18T09:07:23Z) - Bayesian active learning for production, a systematic study and a
reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
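The Simpson-index idea behind the "Sum of Squared Logits" method mentioned above ("On the Robustness of Active Learning") can be sketched as below. For illustration it is applied to predicted class probabilities; the paper's exact formulation over raw logits may differ.

```python
import numpy as np

def sum_of_squares(proba):
    # Simpson-index-style concentration score: sum_c p_c^2.
    # Peaked (confident) distributions score high; flat (uncertain) ones low.
    return (proba ** 2).sum(axis=1)

proba = np.array([[0.90, 0.05, 0.05],   # confident prediction
                  [0.34, 0.33, 0.33]])  # near-uniform, uncertain
scores = sum_of_squares(proba)
query = int(np.argmin(scores))  # query the least confident instance
print(query)  # 1
```

Unlike entropy, this score needs no logarithms and is bounded in (0, 1] for a probability vector, with 1/C for a uniform distribution over C classes.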
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.