Nonstationary data stream classification with online active learning and
siamese neural networks
- URL: http://arxiv.org/abs/2210.01090v1
- Date: Mon, 3 Oct 2022 17:16:03 GMT
- Title: Nonstationary data stream classification with online active learning and
siamese neural networks
- Authors: Kleanthis Malialis and Christos G. Panayiotou and Marios M. Polycarpou
- Abstract summary: There is an emerging need for online learning methods that train predictive models on-the-fly.
A series of open challenges, however, hinder their deployment in practice.
We propose the ActiSiamese algorithm, which addresses these challenges by combining online active learning, siamese networks, and a multi-queue memory.
- Score: 11.501721946030779
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We have witnessed in recent years an ever-growing volume of information
becoming available in a streaming manner in various application areas. As a
result, there is an emerging need for online learning methods that train
predictive models on-the-fly. A series of open challenges, however, hinders
their deployment in practice: learning as data arrive in real time, one by one;
learning from data with limited ground-truth information; learning from
nonstationary data; and learning from severely imbalanced data, all while
occupying a limited amount of memory for data storage. We propose the
ActiSiamese algorithm, which addresses these challenges by combining online
active learning, siamese networks, and a multi-queue memory. It develops a new
density-based active learning strategy which considers similarity in the latent
(rather than the input) space. We conduct an extensive study that compares the
role of different active learning budgets and strategies, the performance
with/without memory, the performance with/without ensembling, in both synthetic
and real-world datasets, under different data nonstationarity characteristics
and class imbalance levels. ActiSiamese outperforms baseline and
state-of-the-art algorithms, and is effective under severe imbalance, even when
only a fraction of the arriving instances' labels is available. We publicly
release our code to the community.
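To make the density-based querying idea concrete, here is a minimal Python sketch, not the authors' released code: the linear `embed` stand-in for a siamese encoder, the queue sizes, and the threshold are all illustrative assumptions. An instance is queried for its label when its latent representation falls in a sparse region of a per-class bounded-queue memory:

```python
# Illustrative sketch (not the authors' implementation): density-based active
# learning measuring similarity in a learned latent space rather than the
# input space, with one bounded FIFO queue per class as memory.
from collections import deque
import numpy as np

def embed(x, W):
    """Toy stand-in for a siamese encoder: a linear projection."""
    return W @ x

def query_label(x, W, memory, budget_threshold=0.5):
    """Decide whether to request the true label for incoming instance x.

    The instance is queried when its latent representation is far from the
    densest class region stored in memory, i.e. the model is uncertain.
    """
    z = embed(x, W)
    densities = []
    for cls, queue in memory.items():
        if queue:
            zs = np.stack([embed(m, W) for m in queue])
            # Mean negative distance to stored latents approximates density.
            densities.append(-np.linalg.norm(zs - z, axis=1).mean())
    if not densities:
        return True  # empty memory: always ask for a label
    # Low maximum density -> instance lies in a sparse latent region -> query.
    return max(densities) < -budget_threshold

# Multi-queue memory: one bounded FIFO queue per class.
memory = {0: deque(maxlen=100), 1: deque(maxlen=100)}
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4))
x = rng.standard_normal(4)
print(query_label(x, W, memory))  # → True: memory is empty, so we query
```

Using a per-class queue of fixed length bounds memory while retaining recent examples of every class, which is what keeps such a scheme usable under nonstationarity and imbalance.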
Related papers
- CTP: Towards Vision-Language Continual Pretraining via Compatible
Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Continual Pretraining (VLCP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of VLCP, we first contribute a comprehensive and unified benchmark dataset P9D.
Treating the data from each industry as an independent task supports continual learning and conforms to the real-world long-tail distribution, simulating pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z) - Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet-scale data, so high-quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z) - Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z) - Data augmentation on-the-fly and active learning in data stream
classification [9.367903089535684]
There is an emerging need for predictive models to be trained on-the-fly.
Through on-the-fly data augmentation, learning models gain access to more labelled data without increasing the active learning budget.
Augmented Queues significantly improves the performance in terms of learning quality and speed.
arXiv Detail & Related papers (2022-10-13T09:57:08Z) - Online Continual Learning on a Contaminated Data Stream with Blurry Task
Boundaries [17.43350151320054]
A large body of continual learning (CL) methods assumes data streams with clean labels, and online learning scenarios under noisy data streams are yet underexplored.
We consider a more practical CL task setup of online learning from a blurry data stream with corrupted labels, where existing CL methods struggle.
We propose a novel strategy to manage and use the memory by a unified approach of label noise aware diverse sampling and robust learning with semi-supervised learning.
arXiv Detail & Related papers (2022-03-29T08:52:45Z) - Learning from Heterogeneous Data Based on Social Interactions over
Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the proposed strategy enables the agents to learn consistently under this highly heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z) - Online Continual Learning with Natural Distribution Shifts: An Empirical
Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z) - What Matters in Learning from Offline Human Demonstrations for Robot
Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z) - Data-efficient Online Classification with Siamese Networks and Active
Learning [11.501721946030779]
We investigate learning from limited labelled, nonstationary and imbalanced data in online classification.
We propose a learning method that synergistically combines siamese neural networks and active learning.
Our study shows that the proposed method is robust to data nonstationarity and imbalance, and significantly outperforms baselines and state-of-the-art algorithms in terms of both learning speed and performance.
arXiv Detail & Related papers (2020-10-04T19:07:19Z) - Bayesian active learning for production, a systematic study and a
reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
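The least-confidence flavour of uncertainty sampling with a configurable query size, as studied in the entry above, can be sketched as follows; this is an illustrative toy, not that paper's library, and the `uncertainty_sample` name and probability values are assumptions:

```python
# Hypothetical sketch: pool-based least-confidence sampling with a
# configurable query size (larger query sizes amortize retraining cost).
import numpy as np

def uncertainty_sample(probs, query_size):
    """Pick the `query_size` pool items whose top-class probability is lowest.

    `probs` is an (n_pool, n_classes) array of predicted class probabilities.
    Least confidence: a low maximum probability means high model uncertainty.
    """
    confidence = probs.max(axis=1)
    return np.argsort(confidence)[:query_size]

probs = np.array([
    [0.9, 0.1],    # confident
    [0.55, 0.45],  # uncertain
    [0.6, 0.4],    # somewhat uncertain
])
print(uncertainty_sample(probs, query_size=2))  # → [1 2]
```

Increasing `query_size` trades some per-query informativeness for fewer, larger labelling rounds, which is the speed-up the entry above refers to.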
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.