On the challenges to learn from Natural Data Streams
- URL: http://arxiv.org/abs/2301.03495v1
- Date: Mon, 9 Jan 2023 16:32:02 GMT
- Title: On the challenges to learn from Natural Data Streams
- Authors: Guido Borghi, Gabriele Graffieti and Davide Maltoni
- Abstract summary: In real-world contexts, data are sometimes available in the form of Natural Data Streams.
This data organization represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms.
In this paper, we investigate the classification performance of a variety of algorithms that receive Natural Data Streams as training input.
- Score: 6.602973237811197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world contexts, data are sometimes available in the form of Natural
Data Streams, i.e. data characterized by a streaming nature, an unbalanced
distribution, data drift over a long time frame and strong correlation of
samples in short time ranges. Moreover, a clear separation between the
traditional training and deployment phases is usually lacking. This way of
organizing and consuming data represents an interesting and challenging scenario
for both traditional Machine and Deep Learning algorithms and incremental
learning agents, i.e. agents that are able to incrementally improve
their knowledge through past experience. In this paper, we investigate the
classification performance of a variety of algorithms from different
research fields, i.e. Continual, Streaming and Online Learning, that receive
Natural Data Streams as training input. The experimental validation is carried out
on three different datasets, expressly organized to replicate this challenging
setting.
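The properties listed in the abstract (streaming order, unbalanced classes, slow drift, short-range correlation) can be illustrated with a toy generator. This is only a sketch under stated assumptions, not the authors' data pipeline: the function name `natural_stream`, its parameters, the one-dimensional Gaussian features and the fixed run length are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def natural_stream(n_samples, class_weights, run_length=20,
                   drift_per_step=0.001, seed=0):
    """Yield (t, x, y) tuples from a toy stream (hypothetical illustration).

    - unbalanced distribution: the class of each run is drawn from skewed weights
    - short-range correlation: samples arrive in runs that share one label
    - long-range drift: each class mean moves slowly as t grows
    """
    rng = random.Random(seed)
    classes = list(range(len(class_weights)))
    t = 0
    while t < n_samples:
        # Pick the label of the next run according to the skewed class weights.
        y = rng.choices(classes, weights=class_weights, k=1)[0]
        # Emit consecutive samples with the same label (temporal correlation).
        for _ in range(min(run_length, n_samples - t)):
            # The class mean shifts a little at every time step (data drift).
            mean = float(y) + drift_per_step * t
            x = rng.gauss(mean, 0.1)
            yield t, x, y
            t += 1


stream = list(natural_stream(2000, class_weights=[0.7, 0.2, 0.1]))
label_counts = Counter(y for _, _, y in stream)
# Fraction of consecutive sample pairs that share a label.
same_as_previous = sum(
    a[2] == b[2] for a, b in zip(stream, stream[1:])
) / (len(stream) - 1)
print(label_counts)
print(round(same_as_previous, 3))
```

A learner consuming such a stream sees samples one at a time, in order, so shuffling into i.i.d. mini-batches is not available; that is the setting the abstract describes as challenging.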
Related papers
- RPS: A Generic Reservoir Patterns Sampler [1.09784964592609]
We introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data.
We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets.
arXiv Detail & Related papers (2024-10-31T16:25:21Z)
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset P9D.
Treating the data from each industry as an independent task supports continual learning and conforms to the real-world long-tail distribution, simulating pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Learning from Data Streams: An Overview and Update [1.5076964620370268]
We reformulate the fundamental definitions and settings of supervised data-stream learning.
We take a fresh look at what constitutes a supervised data-stream learning task.
Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach.
arXiv Detail & Related papers (2022-12-30T14:01:41Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- Towards Deep Industrial Transfer Learning for Anomaly Detection on Time Series Data [0.0]
Deep learning promises performant anomaly detection on time-variant datasets.
Deep transfer learning offers mitigation by letting algorithms build upon previous knowledge from different tasks or locations.
arXiv Detail & Related papers (2021-06-09T08:58:56Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Adaptive Deep Forest for Online Learning from Drifting Data Streams [15.49323098362628]
Learning from data streams is among the most vital fields of contemporary data mining.
We propose Adaptive Deep Forest (ADF) - a natural combination of the successful tree-based streaming classifiers with deep forest.
The conducted experiments show that the deep forest approach can be effectively transformed into an online algorithm.
arXiv Detail & Related papers (2020-10-14T18:24:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.