Evaluation for Regression Analyses on Evolving Data Streams
- URL: http://arxiv.org/abs/2502.07213v2
- Date: Wed, 19 Feb 2025 01:03:33 GMT
- Title: Evaluation for Regression Analyses on Evolving Data Streams
- Authors: Yibin Sun, Heitor Murilo Gomes, Bernhard Pfahringer, Albert Bifet,
- Abstract summary: The paper explores the challenges of regression analysis in evolving data streams. We propose a standardized evaluation process for regression and prediction interval tasks in streaming contexts. We introduce an innovative drift simulation strategy capable of synthesizing various drift types.
- Score: 12.679233262168529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper explores the challenges of regression analysis in evolving data streams, an area that remains relatively underexplored compared to classification. We propose a standardized evaluation process for regression and prediction interval tasks in streaming contexts. Additionally, we introduce an innovative drift simulation strategy capable of synthesizing various drift types, including the less-studied incremental drift. Comprehensive experiments with state-of-the-art methods, conducted under the proposed process, validate the effectiveness and robustness of our approach.
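The full text specifies the proposed protocol in detail; as general background, streaming regression is typically evaluated prequentially (each example is used first for testing, then for training), and incremental drift can be simulated by gradually blending two data-generating concepts. The sketch below only illustrates that generic pattern under assumed choices (a synthetic two-feature stream, scikit-learn's SGDRegressor, a 500-example reporting window); it is not the paper's evaluation process or drift simulator.
```python
# Minimal prequential (test-then-train) evaluation of a streaming regressor
# on a synthetic stream with simulated incremental drift.
# Illustrative assumptions only; not the evaluation process or drift
# simulation strategy proposed in the paper.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
n_samples, drift_start, drift_end = 10_000, 4_000, 7_000

def concept_a(x):
    # Data-generating function before the drift.
    return 3.0 * x[0] - 2.0 * x[1]

def concept_b(x):
    # Data-generating function after the drift.
    return -1.0 * x[0] + 4.0 * x[1]

model = SGDRegressor(learning_rate="constant", eta0=0.01)
window, win_size, rmse_curve = [], 500, []

for t in range(n_samples):
    x = rng.normal(size=2)
    # Incremental drift: blend the two concepts with a weight that grows from 0 to 1.
    alpha = float(np.clip((t - drift_start) / (drift_end - drift_start), 0.0, 1.0))
    y = (1 - alpha) * concept_a(x) + alpha * concept_b(x) + rng.normal(scale=0.1)

    if t > 0:  # test first (interleaved test-then-train) ...
        y_pred = model.predict(x.reshape(1, -1))[0]
        window.append((y - y_pred) ** 2)
        if len(window) == win_size:  # report a windowed RMSE periodically
            rmse_curve.append(float(np.sqrt(np.mean(window))))
            window = []
    model.partial_fit(x.reshape(1, -1), [y])  # ... then train on the same example

print([round(r, 3) for r in rmse_curve])
```
Plotting the windowed RMSE values would show the error rising during the simulated drift and recovering as the model re-adapts, which is the kind of behaviour a standardized streaming evaluation aims to expose.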
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance. We introduce novel algorithms for dynamic, instance-level data reweighting. Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - Unsupervised Assessment of Landscape Shifts Based on Persistent Entropy and Topological Preservation [0.0]
A drift in the input data can have negative consequences for a learning predictor and the system's stability.
In this article, we introduce a novel framework for monitoring changes in multi-dimensional data streams.
The framework operates in both unsupervised and supervised environments.
arXiv Detail & Related papers (2024-10-05T14:57:52Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams [0.0]
This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression.
The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism.
To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE); a rough sketch of this ADWIN-on-errors idea appears after this list.
arXiv Detail & Related papers (2023-12-12T19:23:54Z) - Boosting Summarization with Normalizing Flows and Aggressive Training [6.6242828769801285]
FlowSUM is a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization.
Our approach tackles two primary challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training.
arXiv Detail & Related papers (2023-11-01T15:33:38Z) - TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - Errors-in-variables Fréchet Regression with Low-rank Covariate Approximation [2.1756081703276]
Fréchet regression has emerged as a promising approach for regression analysis involving non-Euclidean response variables.
Our proposed framework combines the concepts of global Fréchet regression and principal component regression, aiming to improve the efficiency and accuracy of the regression estimator.
arXiv Detail & Related papers (2023-05-16T08:37:54Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Vector-Valued Least-Squares Regression under Output Regularity Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite-dimensional output.
We derive learning bounds for our method and study the settings in which statistical performance improves over the full-rank method (a generic finite-dimensional sketch of reduced-rank regression appears after this list).
arXiv Detail & Related papers (2022-11-16T15:07:00Z) - Better Modelling Out-of-Distribution Regression on Distributed Acoustic Sensor Data Using Anchored Hidden State Mixup [0.7455546102930911]
Generalizing machine learning models to situations where the statistical distributions of training and test data differ has been a complex problem.
We introduce an anchored Out-of-Distribution (OOD) Regression Mixup algorithm, leveraging manifold hidden-state mixup and observation similarities to form a novel regularization penalty.
We demonstrate the generalization performance of the proposed method against existing approaches through an extensive evaluation and show that it achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-23T03:12:21Z) - SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
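As a rough illustration of the ADWIN-on-errors idea summarized in the "Conditioned Unsupervised Regression Framework" entry above (not that paper's implementation), the sketch below feeds the prequential absolute errors of an incrementally trained regressor into River's ADWIN detector. It assumes a recent River release in which drift detectors expose update() and a drift_detected flag; the model, synthetic stream, delta value, and the use of absolute error instead of an RMSE-based generalization are all illustrative choices.
```python
# Rough sketch: monitor a streaming regressor's prequential errors with ADWIN.
# Assumes a recent River release; not the cited paper's implementation.
import random
from river import drift, linear_model

random.seed(0)
model = linear_model.LinearRegression()
adwin = drift.ADWIN(delta=0.002)          # detector over the error stream

def sample(t):
    """Synthetic stream whose concept flips abruptly at t = 3000."""
    x = {"x1": random.gauss(0, 1), "x2": random.gauss(0, 1)}
    w = (3.0, -2.0) if t < 3000 else (-2.0, 3.0)
    y = w[0] * x["x1"] + w[1] * x["x2"] + random.gauss(0, 0.1)
    return x, y

for t in range(6000):
    x, y = sample(t)
    err = abs(y - model.predict_one(x))   # test first ...
    model.learn_one(x, y)                 # ... then train
    adwin.update(err)                     # feed the error to ADWIN
    if adwin.drift_detected:
        print(f"drift signalled at t={t}")
```
Because ADWIN is two-sided, it may also signal while the model's error is still falling during initial convergence, not only after the simulated concept flip.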
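Similarly, the "Vector-Valued Least-Squares Regression under Output Regularity Assumptions" entry concerns reduced-rank estimators for (possibly infinite-dimensional) outputs. The finite-dimensional sketch below shows the classical reduced-rank least-squares construction (project the ordinary least-squares coefficients onto the top singular directions of the fitted values); it is a generic textbook illustration with made-up dimensions, not that paper's operator-valued estimator or its learning bounds.
```python
# Classical finite-dimensional reduced-rank least squares (generic illustration).
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Fit Y on X with the coefficient matrix constrained to the given rank."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)          # full-rank solution
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]                             # projector onto top singular directions
    return B_ols @ P

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
B_true = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 50))   # rank-2 ground truth
Y = X @ B_true + 0.1 * rng.normal(size=(500, 50))

B_full, *_ = np.linalg.lstsq(X, Y, rcond=None)
B_rr = reduced_rank_regression(X, Y, rank=2)
print("full-rank error:", np.linalg.norm(B_full - B_true))
print("rank-2 error:   ", np.linalg.norm(B_rr - B_true))
```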
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of this information and is not responsible for any consequences arising from its use.