Data augmentation on-the-fly and active learning in data stream classification
- URL: http://arxiv.org/abs/2210.06873v1
- Date: Thu, 13 Oct 2022 09:57:08 GMT
- Title: Data augmentation on-the-fly and active learning in data stream classification
- Authors: Kleanthis Malialis and Dimitris Papatheodoulou and Stylianos Filippou and Christos G. Panayiotou and Marios M. Polycarpou
- Abstract summary: There is an emerging need for predictive models to be trained on-the-fly.
With Augmented Queues, learning models have access to more labelled data without the need to increase the active learning budget.
Augmented Queues significantly improves performance in terms of both learning quality and speed.
- Score: 9.367903089535684
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: There is an emerging need for predictive models to be trained on-the-fly,
since in numerous machine learning applications data are arriving in an online
fashion. A critical challenge encountered is that of limited availability of
ground truth information (e.g., labels in classification tasks) as new data are
observed one-by-one online, while another significant challenge is that of
class imbalance. This work introduces the novel Augmented Queues method, which
addresses this dual problem by synergistically combining online active
learning, data augmentation, and a multi-queue memory that maintains separate and
balanced queues for each class. We perform an extensive experimental study
using image and time-series augmentations, in which we examine the roles of the
active learning budget, memory size, imbalance level, and neural network type.
We demonstrate two major advantages of Augmented Queues. First, it does not
reserve additional memory space, as the generation of synthetic data occurs only
at training time. Second, learning models have access to more labelled data
without the need to increase the active learning budget and/or the original
memory size. Learning on-the-fly poses major challenges which, typically,
hinder the deployment of learning models. Augmented Queues significantly
improves the performance in terms of learning quality and speed. Our code is
made publicly available.
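The abstract describes the mechanism precisely enough to sketch it: per-class balanced queues, a labelling budget for online active learning, and augmentation applied only when a training batch is drawn. The Python sketch below is illustrative only; the class and parameter names (`AugmentedQueues`, `augment_fn`, `query_strategy`, `oracle`) are ours, not the authors' released code.
```python
import random
from collections import deque

class AugmentedQueues:
    """Illustrative sketch (not the authors' code): per-class balanced queues,
    a labelling budget for online active learning, and augmentation applied
    only when a training batch is drawn, so no synthetic data is stored."""

    def __init__(self, num_classes, queue_size, budget, augment_fn):
        # One bounded FIFO queue per class keeps the memory balanced.
        self.queues = [deque(maxlen=queue_size) for _ in range(num_classes)]
        self.budget = budget          # fraction of stream instances we may label
        self.spent = 0                # labels requested so far
        self.seen = 0                 # instances observed so far
        self.augment_fn = augment_fn  # e.g. an image or time-series augmentation

    def observe(self, x, query_strategy, oracle):
        """Process one unlabelled instance arriving from the stream."""
        self.seen += 1
        within_budget = self.spent / self.seen < self.budget
        if within_budget and query_strategy(x):  # e.g. uncertainty-based query
            y = oracle(x)             # ask the annotator/oracle for the label
            self.spent += 1
            self.queues[y].append(x)  # store only the original example

    def training_batch(self, per_class, num_augmented):
        """Draw a balanced batch; synthetic data is generated on-the-fly here."""
        batch = []
        for y, q in enumerate(self.queues):
            for x in random.sample(list(q), min(per_class, len(q))):
                batch.append((x, y))
                # augmentations exist only inside this batch, never in memory
                batch.extend((self.augment_fn(x), y) for _ in range(num_augmented))
        random.shuffle(batch)
        return batch
```
Because `augment_fn` is called inside `training_batch`, synthetic examples live only for the duration of a batch, which corresponds to the first advantage claimed above; drawing more augmentations per stored example is what gives the model extra labelled data without a larger budget or memory.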
Related papers
- Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications, such as robotic control of reinforcement learning, the tasks are changing, and new tasks arise in a sequential order.
This situation poses the new challenge of plasticity-stability trade-off for training an agent who can adapt to task changes and retain acquired knowledge.
We propose a rehearsal-based continual diffusion model, called Continual diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability)
arXiv Detail & Related papers (2024-09-04T08:21:47Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using up to 80 million parameter networks.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z)
- A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing methods has been the use of a memory of exemplars, which overcomes the issue of catastrophic forgetting by saving a subset of past data into a memory bank and utilizing it to prevent forgetting when training future tasks.
arXiv Detail & Related papers (2022-10-10T08:27:28Z)
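The exemplar-memory mechanism this entry refers to is, at its core, a small rehearsal buffer. The generic sketch below (all names invented; not the Memory Transformer Network itself) shows the idea of keeping a few examples per past class and mixing them into new-task batches.
```python
import random

class ExemplarMemory:
    """Generic rehearsal-buffer sketch: keep a few past examples per class and
    mix them into current-task batches to mitigate catastrophic forgetting."""

    def __init__(self, exemplars_per_class):
        self.exemplars_per_class = exemplars_per_class
        self.store = {}  # class id -> list of stored exemplars

    def add_task(self, examples_by_class):
        # After finishing a task, keep only a small subset of its data.
        for cls, xs in examples_by_class.items():
            k = min(self.exemplars_per_class, len(xs))
            self.store[cls] = random.sample(xs, k)

    def rehearsal_batch(self, current_batch, n_old):
        # Mix stored exemplars of previously seen classes into the new batch.
        old = [(x, cls) for cls, xs in self.store.items() for x in xs]
        return current_batch + random.sample(old, min(n_old, len(old)))
```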
- Nonstationary data stream classification with online active learning and siamese neural networks [11.501721946030779]
There is an emerging need for online learning methods that train predictive models on-the-fly.
A series of open challenges, however, hinder their deployment in practice.
We propose the ActiSiamese algorithm, which addresses these challenges by combining online active learning, siamese networks, and a multi-queue memory.
arXiv Detail & Related papers (2022-10-03T17:16:03Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first approach for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
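The "closed-form updates" mentioned in that entry admit a simple first-order illustration: shift the parameters by the difference between the gradients of the corrected data and the gradients of the data to be unlearned. The sketch below is one common first-order variant written under that assumption, not necessarily the paper's exact (e.g. inverse-Hessian, influence-function-weighted) update; `grad_fn` is a placeholder we introduce for illustration.
```python
import numpy as np

def first_order_unlearn(theta, grad_fn, affected_old, affected_new, tau=1.0):
    """Hedged sketch of a closed-form, gradient-difference parameter update in
    the spirit of influence-function-based unlearning.
    grad_fn(theta, z) returns the loss gradient at a single example z."""
    # Step towards the corrected data and away from the data being unlearned,
    # as if the affected points had been replaced by their corrected versions.
    delta = (sum(grad_fn(theta, z) for z in affected_new)
             - sum(grad_fn(theta, z) for z in affected_old))
    return theta - tau * np.asarray(delta)
```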
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
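The test-then-train protocol described in that entry can be written as a short loop. The sketch below is a generic prequential-evaluation skeleton; the `stream`, `train_step`, and `evaluate` callables are placeholders, not the benchmark's API.
```python
def prequential_evaluation(stream, model, train_step, evaluate):
    """Test-then-train: every incoming batch is first used for evaluation,
    then added to the data the model learns from (generic sketch)."""
    history = []
    for x_batch, y_batch in stream:                        # batches arrive in order, once
        history.append(evaluate(model, x_batch, y_batch))  # test first
        train_step(model, x_batch, y_batch)                # then train on the same batch
    return history
```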
- Data-efficient Online Classification with Siamese Networks and Active Learning [11.501721946030779]
We investigate learning from limited labelled, nonstationary and imbalanced data in online classification.
We propose a learning method that synergistically combines siamese neural networks and active learning.
Our study shows that the proposed method is robust to data nonstationarity and imbalance, and significantly outperforms baselines and state-of-the-art algorithms in terms of both learning speed and performance.
arXiv Detail & Related papers (2020-10-04T19:07:19Z)
- Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams [42.525141660788]
We introduce a system to enable learning and prediction at any point in time.
In contrast to the major body of work in continual learning, data streams are processed in an online fashion.
We obtain state-of-the-art performance by a significant margin on eight benchmarks, including three highly imbalanced data streams.
arXiv Detail & Related papers (2020-09-02T09:39:26Z)
- A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks [1.1802674324027231]
Self-supervised pre-training for transfer learning is becoming an increasingly popular technique to improve state-of-the-art results using unlabeled data.
We provide an overview of the taxonomy for self-supervised learning and transfer learning, and highlight some prominent methods for designing pre-training tasks across different domains.
arXiv Detail & Related papers (2020-07-01T22:55:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.