Learning from Data Streams: An Overview and Update
- URL: http://arxiv.org/abs/2212.14720v2
- Date: Thu, 3 Aug 2023 08:18:36 GMT
- Title: Learning from Data Streams: An Overview and Update
- Authors: Jesse Read and Indrė Žliobaitė
- Abstract summary: We reformulate the fundamental definitions and settings of supervised data-stream learning.
We take a fresh look at what constitutes a supervised data-stream learning task.
Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach.
- Score: 1.5076964620370268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The literature on machine learning in the context of data streams is vast and
growing. However, many of the defining assumptions regarding data-stream
learning tasks are too strong to hold in practice, or are even contradictory
such that they cannot be met in the contexts of supervised learning. Algorithms
are chosen and designed based on criteria which are often not clearly stated,
for problem settings not clearly defined, tested in unrealistic settings,
and/or in isolation from related approaches in the wider literature. This puts
into question the potential for real-world impact of many approaches conceived
in such contexts, and risks propagating a misguided research focus. We propose
to tackle these issues by reformulating the fundamental definitions and
settings of supervised data-stream learning with regard to contemporary
considerations of concept drift and temporal dependence; and we take a fresh
look at what constitutes a supervised data-stream learning task, and a
reconsideration of algorithms that may be applied to tackle such tasks. Through
and in reflection of this formulation and overview, helped by an informal
survey of industrial players dealing with real-world data streams, we provide
recommendations. Our main emphasis is that learning from data streams does not
impose a single-pass or online-learning approach, or any particular learning
regime; and any constraints on memory and time are not specific to streaming.
Meanwhile, there exist established techniques for dealing with temporal
dependence and concept drift, in other areas of the literature. For the data
streams community, we thus encourage a shift in research focus, from dealing
with often-artificial constraints and assumptions on the learning mode, to
issues such as robustness, privacy, and interpretability which are increasingly
relevant to learning in data streams in academic and industrial settings.
Related papers
- Where is the Truth? The Risk of Getting Confounded in a Continual World [21.862370510786004]
A dataset is confounded if it is most easily solved via a spurious correlation, which fails to generalize to new data.
In a continual learning setting where confounders may vary in time across tasks, the challenge of mitigating the effect of confounders far exceeds the standard forgetting problem.
arXiv Detail & Related papers (2024-02-09T14:24:18Z)
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset P9D.
The data from each industry as an independent task supports continual learning and conforms to the real-world long-tail nature to simulate pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z)
- To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review [30.87092042943743]
Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data.
Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels.
Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks.
arXiv Detail & Related papers (2023-04-19T00:33:59Z)
- On the challenges to learn from Natural Data Streams [6.602973237811197]
In real-world contexts, data are sometimes available in the form of Natural Data Streams.
This data organization represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms.
In this paper, we investigate the classification performance of a variety of algorithms that receive as training input Natural Data Streams.
arXiv Detail & Related papers (2023-01-09T16:32:02Z)
- Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z)
- Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the strategy enables the agents to learn consistently under this highly heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
- Non-IID data and Continual Learning processes in Federated Learning: A long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that are able to face it.
At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z)
- Knowledge-driven Active Learning [70.37119719069499]
Active learning strategies aim at minimizing the amount of labelled data required to train a Deep Learning model.
Most active learning strategies are based on uncertain sample selection, and are often restricted to samples lying close to the decision boundary.
Here we propose to take into consideration common domain-knowledge and enable non-expert users to train a model with fewer samples.
arXiv Detail & Related papers (2021-10-15T06:11:53Z)
- Adaptive Explainable Continual Learning Framework for Regression Problems with Focus on Power Forecasts [0.0]
Two continual learning scenarios will be proposed to describe the potential challenges in this context.
Deep neural networks have to learn new tasks and overcome forgetting the knowledge obtained from the old tasks as the amount of data keeps increasing in applications.
Research topics are related but not limited to developing continual deep learning algorithms, strategies for non-stationarity detection in data streams, explainable and visualizable artificial intelligence, etc.
arXiv Detail & Related papers (2021-08-24T14:59:10Z)
- Instance exploitation for learning temporary concepts from sparsely labeled drifting data streams [15.49323098362628]
Continual learning from streaming data sources is becoming increasingly popular.
Dealing with dynamic and everlasting problems poses new challenges.
One of the most crucial limitations is that we cannot assume having access to a finite and complete data set.
arXiv Detail & Related papers (2020-09-20T08:11:43Z)
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems [108.81683598693539]
Offline reinforcement learning algorithms hold tremendous promise for making it possible to turn large datasets into powerful decision making engines.
We will aim to provide the reader with an understanding of these challenges, particularly in the context of modern deep reinforcement learning methods.
arXiv Detail & Related papers (2020-05-04T17:00:15Z)