OEBench: Investigating Open Environment Challenges in Real-World
Relational Data Streams
- URL: http://arxiv.org/abs/2308.15059v3
- Date: Fri, 15 Dec 2023 09:04:01 GMT
- Title: OEBench: Investigating Open Environment Challenges in Real-World
Relational Data Streams
- Authors: Yiqun Diao, Yutong Yang, Qinbin Li, Bingsheng He, Mian Lu
- Abstract summary: We develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in real-world relational data streams.
We find that increased data quantity may not consistently enhance the model accuracy when applied in open environment scenarios.
- Score: 32.898349646434326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How to get insights from relational data streams in a timely manner is a hot
research topic. Data streams can present unique challenges, such as
distribution drifts, outliers, emerging classes, and changing features, which
have recently been described as open environment challenges for machine
learning. While existing studies have been done on incremental learning for
data streams, their evaluations are mostly conducted with synthetic datasets.
Thus, a natural question is what those open environment challenges look like and
how existing incremental learning algorithms perform on real-world relational
data streams. To fill this gap, we develop an Open Environment Benchmark named
OEBench to evaluate open environment challenges in real-world relational data
streams. Specifically, we investigate 55 real-world relational data streams and
establish that open environment scenarios are indeed widespread, which presents
significant challenges for stream learning algorithms. Through benchmarks with
existing incremental learning algorithms, we find that increased data quantity
may not consistently enhance the model accuracy when applied in open
environment scenarios, where machine learning models can be significantly
compromised by missing values, distribution drifts, or anomalies in real-world
data streams. The current techniques are insufficient in effectively mitigating
these challenges brought by open environments. More research is needed to
address real-world open environment challenges. All datasets and code are
open-sourced at https://github.com/sjtudyq/OEBench.
Related papers
- Object Detectors in the Open Environment: Challenges, Solutions, and Outlook [95.3317059617271]
The dynamic and intricate nature of the open environment poses novel and formidable challenges to object detectors.
This paper aims to conduct a comprehensive review and analysis of object detectors in open environments.
We propose a framework that includes four quadrants (i.e., out-of-domain, out-of-category, robust learning, and incremental learning) based on the dimensions of the data / target changes.
arXiv Detail & Related papers (2024-03-24T19:32:39Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- On the challenges to learn from Natural Data Streams [6.602973237811197]
In real-world contexts, data are sometimes available in the form of Natural Data Streams.
This data organization represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms.
In this paper, we investigate the classification performance of a variety of algorithms that receive as training input Natural Data Streams.
arXiv Detail & Related papers (2023-01-09T16:32:02Z)
- Learning from Data Streams: An Overview and Update [1.5076964620370268]
We reformulate the fundamental definitions and settings of supervised data-stream learning.
We take a fresh look at what constitutes a supervised data-stream learning task.
Our main emphasis is that learning from data streams does not impose a single-pass or online-learning approach.
arXiv Detail & Related papers (2022-12-30T14:01:41Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Open Environment Machine Learning [84.90891046882213]
Conventional machine learning studies assume closed-world scenarios where important factors of the learning process hold invariant.
This article briefly introduces some advances in this line of research, focusing on techniques concerning emerging new classes, decremental/incremental features, changing data distributions, and varied learning objectives, and discusses some theoretical issues.
arXiv Detail & Related papers (2022-06-01T11:57:56Z)
- ESTemd: A Distributed Processing Framework for Environmental Monitoring based on Apache Kafka Streaming Engine [0.0]
Distributed networks and real-time systems are becoming the most important components for the new computer age, the Internet of Things.
Data generated offers the ability to measure, infer and understand environmental indicators, from delicate ecologies to natural resources to urban environments.
We propose a distributed framework Event STream Processing Engine for Environmental Monitoring Domain (ESTemd) for the application of stream processing on heterogeneous environmental data.
arXiv Detail & Related papers (2021-04-02T15:04:15Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Challenges in Benchmarking Stream Learning Algorithms with Real-world Data [2.861782696432711]
Streaming data are increasingly present in real-world applications such as sensor measurements, satellite data feeds, stock markets, and financial data.
The data stream mining community still faces some primary challenges and difficulties related to the comparison and evaluation of new proposals.
We propose a new public data repository for benchmarking stream algorithms with real-world data.
arXiv Detail & Related papers (2020-04-30T21:31:34Z)
- LUNAR: Cellular Automata for Drifting Data Streams [19.98517714325424]
We propose LUNAR, a streamified version of cellular automata.
It is able to act as a real incremental learner while adapting to drifting conditions.
arXiv Detail & Related papers (2020-02-06T09:10:43Z)