Combining self-labeling and demand based active learning for non-stationary data streams
- URL: http://arxiv.org/abs/2302.04141v1
- Date: Wed, 8 Feb 2023 15:38:51 GMT
- Title: Combining self-labeling and demand based active learning for non-stationary data streams
- Authors: Valerie Vaquet, Fabian Hinder, Johannes Brinkrolf, and Barbara Hammer
- Abstract summary: Learning from non-stationary data streams is a research direction that is gaining increasing interest as more data in the form of streams becomes available.
Most approaches assume that the ground truth of the samples becomes available and perform supervised online learning in the test-then-train scheme.
In this work, we focus on scarcely labeled data streams and explore the potential of self-labeling in gradually drifting data streams.
- Score: 7.951705533903104
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Learning from non-stationary data streams is a research direction that is gaining increasing interest as more data in the form of streams becomes available, for example from social media, smartphones, or industrial process monitoring. Most approaches assume that the ground truth of the samples becomes available (possibly with some delay) and perform supervised online learning in the test-then-train scheme. While this assumption might be valid in some scenarios, it does not apply to all settings. In this work, we focus on scarcely labeled data streams and explore the potential of self-labeling in gradually drifting data streams. We formalize this setup and propose a novel online $k$-NN classifier that combines self-labeling and demand-based active learning.
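To make the combination concrete, here is a minimal Python sketch of an online $k$-NN learner that self-labels confident stream samples and requests ground truth only on demand. It illustrates the general idea under assumed interfaces and thresholds; it is not the classifier proposed in the paper.

```python
from collections import deque

import numpy as np


class SelfLabelingKNN:
    """Sliding-window k-NN that self-labels confident samples and
    queries the ground truth only on demand (illustrative sketch)."""

    def __init__(self, k=5, window=500, confidence=0.9):
        self.k = k
        self.confidence = confidence   # self-labeling threshold (assumed value)
        self.X = deque(maxlen=window)  # sliding window of feature vectors
        self.y = deque(maxlen=window)  # labels (true or self-assigned)

    def _vote(self, x):
        # Majority vote among the k nearest neighbors in the window.
        X = np.asarray(self.X)
        dists = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(dists)[: self.k]
        labels = np.asarray(self.y)[nearest]
        values, counts = np.unique(labels, return_counts=True)
        return values[np.argmax(counts)], counts.max() / len(nearest)

    def process(self, x, query_label):
        """Handle one stream sample; `query_label(x)` returns the true
        label and is only called when a label is actually demanded."""
        if len(self.X) < self.k:
            label = query_label(x)      # bootstrap: always query
        else:
            label, conf = self._vote(x)
            if conf < self.confidence:  # uncertain: demand a label
                label = query_label(x)
            # otherwise keep the self-assigned (predicted) label
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(label)
        return label
```

The sliding window bounds memory and lets the neighborhood track gradual drift, while the confidence threshold trades the labeling budget against the risk of reinforcing wrong self-labels.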
Related papers
- RPS: A Generic Reservoir Patterns Sampler [1.09784964592609]
We introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data (a sketch of the weighted-reservoir primitive follows this entry).
We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets.
arXiv Detail & Related papers (2024-10-31T16:25:21Z)
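As background for the entry above, here is a minimal Python sketch of generic weighted reservoir sampling (the Efraimidis-Spirakis A-Res scheme). It shows the underlying primitive only and is not the RPS pattern sampler itself.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k):
    """Keep a weighted random sample of k items from a stream of
    (item, weight) pairs: each item draws the key u**(1/w) and the
    k largest keys survive (Efraimidis-Spirakis A-Res)."""
    heap = []  # min-heap of (key, tiebreaker, item)
    for i, (item, weight) in enumerate(stream):
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]
```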
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Active learning for data streams: a survey [0.48951183832371004]
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream.
Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data.
This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time.
arXiv Detail & Related papers (2023-02-17T14:24:13Z)
- Reinforced Meta Active Learning [11.913086438671357]
We present an online stream-based meta active learning method that learns an informativeness measure on the fly, directly from the data.
The method is based on reinforcement learning and combines episodic policy search and a contextual bandits approach.
We demonstrate on several real datasets that this method learns to select training samples more efficiently than existing state-of-the-art methods.
arXiv Detail & Related papers (2022-03-09T08:36:54Z)
- Mining Drifting Data Streams on a Budget: Combining Active Learning with Self-Labeling [6.436899373275926]
We propose a novel framework for mining drifting data streams on a budget, by combining information coming from active learning and self-labeling.
We introduce several strategies that can take advantage of both intelligent instance selection and semi-supervised procedures, while taking into account the potential presence of concept drift.
arXiv Detail & Related papers (2021-12-21T07:19:35Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online (see the sketch after this entry).
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
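The test-then-train (prequential) protocol described in the entry above is compact enough to state directly. In this schematic sketch, `predict`, `learn`, and `update` are assumed interface names, not a specific library's API:

```python
def test_then_train(stream, model, metric):
    """Prequential evaluation: every incoming sample is first used to
    test the model and only afterwards used to update it."""
    for x, y in stream:
        y_pred = model.predict(x)  # test before the label is revealed
        metric.update(y, y_pred)   # score on genuinely unseen data
        model.learn(x, y)          # then train on the same sample
    return metric
```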
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Learning to Count in the Crowd from Limited Labeled Data [109.2954525909007]
We focus on reducing the annotation effort by learning to count in the crowd from a limited number of labeled samples.
Specifically, we propose a Gaussian Process-based iterative learning mechanism that involves estimation of pseudo-ground truth for the unlabeled data (a generic sketch of this idea follows this entry).
arXiv Detail & Related papers (2020-07-07T04:17:01Z)
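The pseudo-ground-truth estimation described in the entry above generalizes beyond crowd counting. Below is a minimal sketch of an iterative pseudo-labeling loop, with scikit-learn's GaussianProcessRegressor standing in for the paper's mechanism; the function name, number of rounds, and confidence heuristic are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def pseudo_label_rounds(X_lab, y_lab, X_unlab, rounds=3, keep=0.2):
    """Iteratively estimate pseudo-ground truth for unlabeled data and
    fold the most confident estimates back into the training set."""
    X_train, y_train = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        gp = GaussianProcessRegressor().fit(X_train, y_train)
        mean, std = gp.predict(X_unlab, return_std=True)
        # Trust the predictions with the smallest predictive uncertainty.
        confident = np.argsort(std)[: max(1, int(keep * len(X_unlab)))]
        X_train = np.vstack([X_train, X_unlab[confident]])
        y_train = np.concatenate([y_train, mean[confident]])
        X_unlab = np.delete(X_unlab, confident, axis=0)
        if len(X_unlab) == 0:
            break
    return X_train, y_train
```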
- A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [1.1802674324027231]
Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data.
We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains.
arXiv Detail & Related papers (2020-07-01T22:55:48Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)