Mining Drifting Data Streams on a Budget: Combining Active Learning with
Self-Labeling
- URL: http://arxiv.org/abs/2112.11019v1
- Date: Tue, 21 Dec 2021 07:19:35 GMT
- Title: Mining Drifting Data Streams on a Budget: Combining Active Learning with
Self-Labeling
- Authors: Łukasz Korycki, Bartosz Krawczyk
- Abstract summary: We propose a novel framework for mining drifting data streams on a budget, by combining information coming from active learning and self-labeling.
We introduce several strategies that can take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift.
- Score: 6.436899373275926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mining data streams poses a number of challenges, including the
continuous and non-stationary nature of data, the massive volume of
information to be processed, and the constraints imposed on computational
resources. While a number of supervised solutions have been proposed for
this problem in the literature, most of them assume that access to the
ground truth (in the form of class labels) is unlimited and that such
information can be instantly utilized when updating the learning system.
This is far from realistic, as one must consider the underlying cost of
acquiring labels. Therefore, solutions that reduce the need for ground
truth in streaming scenarios are called for. In this paper, we propose a
novel framework for mining drifting data streams on a budget by combining
information coming from active learning and self-labeling. We introduce
several strategies that can take advantage of both intelligent instance
selection and semi-supervised procedures, while taking into account the
potential presence of concept drift. Such a hybrid approach allows for
efficient exploration and exploitation of streaming data structures within
realistic labeling budgets. Since our framework works as a wrapper, it may
be applied with different learning algorithms. An experimental study,
carried out on a diverse set of real-world data streams with various types
of concept drift, demonstrates the usefulness of the proposed strategies
when dealing with highly limited access to class labels. The presented
hybrid approach is especially useful when one cannot increase the labeling
budget or replace an inefficient classifier. We deliver a set of
recommendations regarding the areas of applicability for our strategies.
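The abstract only outlines the hybrid strategy, so the sketch below
illustrates the general mechanism in Python: a budget-aware wrapper that
queries an oracle for a label when the underlying model is uncertain
(active learning) and trains on its own high-confidence predictions
otherwise (self-labeling). This is a minimal illustration of the idea, not
the authors' implementation; the class name `HybridALSelfLabeler`, the
thresholds, and the budget-accounting rule are all assumptions, and
explicit drift handling is omitted.

```python
# Minimal sketch (assumed, not the paper's code) of a budget-aware wrapper
# combining active learning with self-labeling around any incremental model.
import numpy as np
from sklearn.linear_model import SGDClassifier


class HybridALSelfLabeler:
    def __init__(self, base, classes, budget=0.1,
                 query_threshold=0.6, selflabel_threshold=0.9):
        self.base = base                      # any model with partial_fit
        self.classes = np.asarray(classes)
        self.budget = budget                  # max fraction of queried labels
        self.query_threshold = query_threshold
        self.selflabel_threshold = selflabel_threshold
        self.seen = 0
        self.queried = 0

    def process(self, x, oracle):
        """Handle one stream instance; `oracle()` returns its true label."""
        self.seen += 1
        if self.queried == 0:                 # bootstrap on the first label
            self._learn(x, oracle(), queried=True)
            return
        proba = self.base.predict_proba(x.reshape(1, -1))[0]
        confidence = proba.max()
        prediction = self.base.classes_[int(proba.argmax())]
        spent = self.queried / self.seen      # fraction of budget used so far
        if confidence < self.query_threshold and spent < self.budget:
            self._learn(x, oracle(), queried=True)     # active query
        elif confidence >= self.selflabel_threshold:
            self._learn(x, prediction, queried=False)  # self-labeling
        # instances in the ambiguous middle region are simply discarded

    def _learn(self, x, y, queried):
        self.base.partial_fit(x.reshape(1, -1), [y], classes=self.classes)
        self.queried += int(queried)


# Toy usage on a synthetic binary stream (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

wrapper = HybridALSelfLabeler(SGDClassifier(loss="log_loss"),
                              classes=[0, 1], budget=0.1)
for xi, yi in zip(X, y):
    wrapper.process(xi, oracle=lambda yi=yi: yi)
print(f"labels queried: {wrapper.queried} / {wrapper.seen}")
```

Drift-aware variants of such strategies would additionally randomize or
adapt the thresholds over time so that the model keeps querying labels
after a drift; the fixed thresholds above are a simplification.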
Related papers
- Active learning for data streams: a survey [0.48951183832371004]
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream.
Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data.
This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time.
arXiv Detail & Related papers (2023-02-17T14:24:13Z)
- Combining self-labeling and demand based active learning for non-stationary
data streams [7.951705533903104]
Learning from non-stationary data streams is a research direction that has gained increasing interest as more data becomes available in the form of streams.
Most approaches assume that the ground truth of the samples becomes available and perform supervised online learning in the test-then-train scheme.
In this work, we focus on scarcely labeled data streams and explore the potential of self-labeling in gradually drifting data streams.
arXiv Detail & Related papers (2023-02-08T15:38:51Z)
- Nonstationary data stream classification with online active learning and
siamese neural networks [11.501721946030779]
There is an emerging need for online learning methods that train predictive models on-the-fly.
A series of open challenges, however, hinders their deployment in practice.
We propose the ActiSiamese algorithm, which addresses these challenges by combining online active learning, siamese networks, and a multi-queue memory.
arXiv Detail & Related papers (2022-10-03T17:16:03Z)
- An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning
[58.59343434538218]
We propose a simple but quite effective approach that predicts accurate negative pseudo-labels for unlabeled data from an indirect learning perspective.
Our approach can be implemented in just a few lines of code using only off-the-shelf operations.
arXiv Detail & Related papers (2022-09-28T02:11:34Z)
- Deep Active Learning with Budget Annotation [0.0]
We propose a hybrid approach that computes both the uncertainty and the informativeness of an instance.
We employ state-of-the-art pre-trained models in order to avoid querying information already contained in those models.
arXiv Detail & Related papers (2022-07-31T20:20:44Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have been shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z)
- Budget-aware Few-shot Learning via Graph Convolutional Network
[56.41899553037247]
This paper tackles the problem of few-shot learning, which aims to learn new visual concepts from a few examples.
A common problem setting in few-shot classification assumes a random sampling strategy for acquiring data labels.
We introduce a new budget-aware few-shot learning problem that aims to learn novel object categories.
arXiv Detail & Related papers (2022-01-07T02:46:35Z) - Just Label What You Need: Fine-Grained Active Selection for Perception
and Prediction through Partially Labeled Scenes [78.23907801786827]
We introduce generalizations that ensure that our approach is cost-aware and allows for fine-grained selection of examples through partially labeled scenes.
Our experiments on a real-world, large-scale self-driving dataset suggest that fine-grained selection can improve the performance across perception, prediction, and downstream planning tasks.
arXiv Detail & Related papers (2021-04-08T17:57:41Z)
- Instance exploitation for learning temporary concepts from sparsely labeled
drifting data streams [15.49323098362628]
Continual learning from streaming data sources is becoming more and more popular.
Dealing with dynamic and everlasting problems, however, poses new challenges.
One of the most crucial limitations is that we cannot assume access to a finite and complete data set.
arXiv Detail & Related papers (2020-09-20T08:11:43Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data.
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.