Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series
- URL: http://arxiv.org/abs/2409.07879v1
- Date: Thu, 12 Sep 2024 09:38:16 GMT
- Title: Randomized Spline Trees for Functional Data Classification: Theory and Application to Environmental Time Series
- Authors: Donato Riccio, Fabrizio Maturo, Elvira Romano
- Abstract summary: This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges functional data analysis and ensemble learning by incorporating randomized functional representations into the Random Forest framework.
RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations.
Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Functional data analysis (FDA) and ensemble learning can be powerful tools for analyzing complex environmental time series. Recent literature has highlighted the key role of diversity in enhancing accuracy and reducing variance in ensemble methods. This paper introduces Randomized Spline Trees (RST), a novel algorithm that bridges these two approaches by incorporating randomized functional representations into the Random Forest framework. RST generates diverse functional representations of input data using randomized B-spline parameters, creating an ensemble of decision trees trained on these varied representations. We provide a theoretical analysis of how this functional diversity contributes to reducing generalization error and present empirical evaluations on six environmental time series classification tasks from the UCR Time Series Archive. Results show that RST variants outperform standard Random Forests and Gradient Boosting on most datasets, improving classification accuracy by up to 14%. The success of RST demonstrates the potential of adaptive functional representations in capturing complex temporal patterns in environmental data. This work contributes to the growing field of machine learning techniques focused on functional data and opens new avenues for research in environmental time series analysis.
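The abstract describes RST only at a high level. The Python sketch below shows one way the idea could be realized, under assumptions that are not taken from the paper: each tree draws a spline degree from {1, 2, 3} and a number of basis functions from 6 to 15, uses a clamped knot vector with equally spaced interior knots, represents every curve by its least-squares B-spline coefficients, and is trained on a bootstrap sample as in a standard Random Forest (per-split feature subsampling is omitted for brevity). All helper names (`clamped_knots`, `bspline_basis`, `random_spline_projector`, `fit_tree`, `fit_rst`, `predict_rst`) are hypothetical, and the minimal Gini tree stands in for a library decision tree so the block depends only on NumPy.

```python
import numpy as np


def clamped_knots(n_basis, degree):
    # Equally spaced interior knots on [0, 1]; boundary knots repeated degree + 1 times.
    interior = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]
    return np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])


def bspline_basis(t, knots, degree):
    # Cox-de Boor recursion; returns a (len(t), n_basis) design matrix.
    B = np.array([(knots[i] <= t) & (t < knots[i + 1])
                  for i in range(len(knots) - 1)], dtype=float).T
    last = np.nonzero(knots[:-1] < knots[1:])[0].max()
    B[t == knots[-1], last] = 1.0  # close the last non-empty interval at t = 1
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            den_l = knots[i + d] - knots[i]
            den_r = knots[i + d + 1] - knots[i + 1]
            if den_l > 0:
                B_new[:, i] += (t - knots[i]) / den_l * B[:, i]
            if den_r > 0:
                B_new[:, i] += (knots[i + d + 1] - t) / den_r * B[:, i + 1]
        B = B_new
    return B


def random_spline_projector(n_points, rng, degrees=(1, 2, 3), n_basis_range=(6, 15)):
    # Assumed randomization: draw the spline degree and basis size per tree, then return
    # the pseudo-inverse mapping a sampled curve to its least-squares coefficients.
    degree = int(rng.choice(degrees))
    n_basis = int(rng.integers(n_basis_range[0], n_basis_range[1] + 1))
    t = np.linspace(0.0, 1.0, n_points)
    return np.linalg.pinv(bspline_basis(t, clamped_knots(n_basis, degree), degree))


def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def fit_tree(X, y, depth=0, max_depth=4, min_leaf=2):
    # Minimal CART-style tree: exhaustive Gini search, leaves store the majority label.
    values, counts = np.unique(y, return_counts=True)
    leaf = values[np.argmax(counts)]
    if depth >= max_depth or len(values) == 1 or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (weighted impurity, feature index, threshold)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= thr
            n_left = mask.sum()
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            imp = (n_left * gini(y[mask]) + (len(y) - n_left) * gini(y[~mask])) / len(y)
            if best is None or imp < best[0]:
                best = (imp, j, thr)
    if best is None:
        return leaf
    _, j, thr = best
    mask = X[:, j] <= thr
    return (j, thr, fit_tree(X[mask], y[mask], depth + 1, max_depth, min_leaf),
            fit_tree(X[~mask], y[~mask], depth + 1, max_depth, min_leaf))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are (feature, threshold, left, right)
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node


def fit_rst(X, y, n_trees=25, max_depth=4, seed=0):
    # Each tree sees its own random spline representation of a bootstrap sample.
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_trees):
        P = random_spline_projector(X.shape[1], rng)
        idx = rng.integers(0, len(y), len(y))
        ensemble.append((P, fit_tree(X[idx] @ P.T, y[idx], max_depth=max_depth)))
    return ensemble


def predict_rst(ensemble, X):
    # Majority vote; labels are assumed to be non-negative integers.
    votes = np.array([[predict_tree(tree, row) for row in X @ P.T] for P, tree in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Usage on curves sampled on a common grid (one curve per row of `X_train`) with integer labels `y_train`: `model = fit_rst(X_train, y_train)`, then `predict_rst(model, X_test)`. As intuition for why diversity helps, a standard bagging result (not necessarily the form of the paper's own analysis) gives the variance of an average of M identically distributed trees, each with variance σ² and pairwise correlation ρ, as ρσ² + (1 - ρ)σ²/M; giving each tree a different functional representation is one way to lower ρ.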
Related papers
- Enriched Functional Tree-Based Classifiers: A Novel Approach Leveraging Derivatives and Geometric Features
This study introduces an advanced methodology for supervised classification by integrating Functional Data Analysis (FDA) with tree-based ensemble techniques for classifying high-dimensional time series.
Date: 2024-09-26T12:57:47Z
- Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations
This paper introduces a novel supervised classification strategy that integrates functional data analysis with tree-based methods.
We propose augmented versions of functional classification trees and functional random forests, incorporating a new tool for assessing the importance of functional principal components.
Date: 2024-08-23T15:58:41Z
- Enhancing Variable Importance in Random Forests: A Novel Application of Global Sensitivity Analysis
The present work provides an application of Global Sensitivity Analysis to supervised machine learning methods such as Random Forests.
Global Sensitivity Analysis is primarily used in mathematical modelling to investigate the effect of the uncertainties of the input variables on the output.
A simulation study shows that our proposal can be used to explore what advances can be achieved either in terms of efficiency, explanatory ability, or simply by way of confirming existing results.
Date: 2024-07-19T10:45:36Z
- Interpetable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is to preserve the interpretability of the reduced targets and features through the aggregation with the mean, which is motivated by applications to Earth science.
Date: 2024-06-12T08:30:16Z
- Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimizes the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
Date: 2023-12-29T08:42:10Z
- RGM: A Robust Generalizable Matching Model
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
Date: 2023-10-18T07:30:08Z
- Heterogeneous Multi-Task Gaussian Cox Processes
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
Date: 2023-08-29T15:01:01Z
- Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks.
We investigate training data generation with diversely attributed prompts, which have the potential to yield diverse and attributed generated data.
We show that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance.
Date: 2023-06-28T03:31:31Z
- Revisiting the Evaluation of Image Synthesis with GANs
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
Date: 2023-04-04T17:54:32Z
- Interpretable Feature Construction for Time Series Extrinsic Regression
In some application domains the target variable is numerical; this problem is known as time series extrinsic regression (TSER).
We suggest an extension of a Bayesian method for robust and interpretable feature construction and selection in the context of TSER.
Our approach tackles TSER relationally: (i) we build various simple representations of the time series and store them in a relational data scheme; then (ii) a propositionalisation technique builds interpretable features from the secondary tables to "flatten" the data.
Date: 2021-03-15T08:12:19Z
- RENT -- Repeated Elastic Net Technique for Feature Selection
We present the Repeated Elastic Net Technique (RENT) for Feature Selection.
RENT uses an ensemble of generalized linear models with elastic net regularization, each trained on distinct subsets of the training data.
RENT provides valuable information for model interpretation concerning the identification of objects in the data that are difficult to predict during training.
Date: 2020-09-27T07:55:52Z