Benchmarking of a new data splitting method on volcanic eruption data
- URL: http://arxiv.org/abs/2410.06306v1
- Date: Tue, 8 Oct 2024 19:29:46 GMT
- Title: Benchmarking of a new data splitting method on volcanic eruption data
- Authors: Simona Reale, Pietro Di Stasio, Francesco Mauro, Alessandro Sebastianelli, Paolo Gamba, Silvia Liberata Ullo
- Abstract summary: An iterative procedure divides the input volcanic eruption dataset into two parts using a dissimilarity index calculated on the cumulative histograms of these two parts.
The proposed method achieves the best performance, at the cost of a slightly higher number of epochs.
Each model was trained with early stopping to guard against overfitting; the higher number of epochs reached by the proposed method indicates that early stopping never detected overfitting.
- Score: 38.85972012552084
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, a novel method for data splitting is presented: an iterative procedure divides the input volcanic eruption dataset, chosen as the use case, into two parts using a dissimilarity index calculated on the cumulative histograms of these two parts. The Cumulative Histogram Dissimilarity (CHD) index is introduced as part of the design. Based on the obtained results, the proposed model, compared to both random splitting and K-means splitting implemented over different configurations, achieves the best performance, at the cost of a slightly higher number of epochs. This suggests that the model can learn more deeply from the input dataset, which is attributable to the quality of the splitting. In fact, each model was trained with early stopping to guard against overfitting, and the higher number of epochs reached by the proposed method indicates that early stopping did not detect overfitting; consequently, the learning was optimal.
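The abstract does not spell out the CHD formula. As a minimal sketch of the idea, the snippet below assumes CHD is the L1 distance between the normalized cumulative histograms of a scalar feature, and uses a simple greedy swap loop as the iterative procedure; both are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def chd(a: np.ndarray, b: np.ndarray, bins: int = 64) -> float:
    """Cumulative Histogram Dissimilarity between two 1-D samples, assumed
    here to be the L1 distance between their normalized cumulative
    histograms over a shared binning."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi))
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    ca = np.cumsum(ha) / ha.sum()        # empirical CDF of part a
    cb = np.cumsum(hb) / hb.sum()        # empirical CDF of part b
    return float(np.abs(ca - cb).sum())

def iterative_split(x: np.ndarray, train_frac: float = 0.8,
                    n_iter: int = 1000, seed: int = 0):
    """Toy iterative splitter: start from a random split, then keep any
    train/test sample swap that lowers the CHD, so the two parts end up
    with similar value distributions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    cut = int(train_frac * len(x))
    train, test = idx[:cut].copy(), idx[cut:].copy()
    best = chd(x[train], x[test])
    for _ in range(n_iter):
        i, j = rng.integers(len(train)), rng.integers(len(test))
        train[i], test[j] = test[j], train[i]          # try a swap
        score = chd(x[train], x[test])
        if score < best:
            best = score                               # keep the swap
        else:
            train[i], test[j] = test[j], train[i]      # revert it
    return train, test, best
```

Which feature the histograms are built over, and how candidate splits are generated, are details fixed in the full paper; the sketch only shows the mechanics of comparing cumulative histograms.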
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
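As a rough, generic illustration of the distribution-matching idea in the entry above (not the paper's exact objective; the embedding, shapes, and optimizer below are assumptions), a synthetic set can be optimized so that its mean feature embedding matches that of real data:

```python
import torch

def dm_loss(real: torch.Tensor, syn: torch.Tensor,
            embed: torch.nn.Module) -> torch.Tensor:
    """Generic distribution-matching loss: squared distance between the
    mean feature embedding of a real batch and of the synthetic set."""
    with torch.no_grad():                    # real features act as fixed targets
        mu_real = embed(real).mean(dim=0)
    mu_syn = embed(syn).mean(dim=0)          # gradients flow back into syn
    return ((mu_real - mu_syn) ** 2).sum()

# Hypothetical usage on MNIST-shaped data: learn 10 synthetic images.
embed = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 128))
for p in embed.parameters():
    p.requires_grad_(False)                  # only the synthetic pixels are optimized
syn = torch.randn(10, 1, 28, 28, requires_grad=True)
opt = torch.optim.SGD([syn], lr=0.1)
real = torch.randn(64, 1, 28, 28)            # stand-in for a batch of real data
for _ in range(100):
    opt.zero_grad()
    dm_loss(real, syn, embed).backward()
    opt.step()
```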
- Dataset Distillation Meets Provable Subset Selection [14.158845925610438]
Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important and removing redundant points in the data.
We further merge the idea of data subset selection with dataset distillation by training the distilled set on sampled points during the training procedure instead of randomly sampling the next batch.
arXiv Detail & Related papers (2023-07-16T15:58:19Z)
- Impact of PolSAR pre-processing and balancing methods on complex-valued neural networks segmentation tasks [9.6556424340252]
We investigate the semantic segmentation of Polarimetric Synthetic Aperture Radar (PolSAR) data using Complex-Valued Neural Networks (CVNNs).
We exhaustively compare both methods for six model architectures, three complex-valued, and their respective real-equivalent models.
We propose two methods for reducing this gap and report results for all input representations, models, and dataset pre-processing methods.
arXiv Detail & Related papers (2022-10-28T12:49:43Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
- Task Affinity with Maximum Bipartite Matching in Few-Shot Learning [28.5184196829547]
We propose an asymmetric affinity score for representing the complexity of utilizing the knowledge of one task for learning another one.
In particular, using this score, we find relevant training data labels to the test data and leverage the discovered relevant data for episodically fine-tuning a few-shot model.
arXiv Detail & Related papers (2021-10-05T23:15:55Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
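For background on the entry above: exact statistical leverage scores of a tall, full-rank matrix follow directly from a thin QR factorization, which is what the paper's randomized methods approximate cheaply at scale. A minimal exact-computation sketch (illustrative; not the paper's estimator):

```python
import numpy as np

def leverage_scores(A: np.ndarray) -> np.ndarray:
    """Exact leverage scores of a tall matrix A: the i-th score is the
    squared norm of the i-th row of an orthonormal basis Q for the column
    space (equivalently, the i-th diagonal entry of the hat matrix
    A (A^T A)^{-1} A^T when A has full column rank)."""
    Q, _ = np.linalg.qr(A)              # thin QR; Q has orthonormal columns
    return (Q ** 2).sum(axis=1)

A = np.random.default_rng(0).normal(size=(1000, 10))
scores = leverage_scores(A)
assert np.isclose(scores.sum(), np.linalg.matrix_rank(A))  # scores sum to the rank
```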
- Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model.
With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away.
Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art.
arXiv Detail & Related papers (2021-01-23T13:22:44Z)
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)
- A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification [47.25472624305589]
Fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data.
An innovative architecture for the fuzzy c-regression model is presented, and a novel Student's t-distribution based membership function is designed for sparse data modelling.
arXiv Detail & Related papers (2020-09-02T05:10:13Z)
- Model Fusion with Kullback-Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
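For intuition on the fusion entry above: if each dataset's posterior over a shared parameter is approximated by a Gaussian, the KL-optimal fused mean-field Gaussian is the precision-weighted combination below. This toy sketch omits the paper's alignment of parameters across heterogeneous models:

```python
import numpy as np

def fuse_gaussians(mus, sigmas):
    """Fuse independent Gaussian posteriors N(mu_i, sigma_i^2) into a single
    Gaussian by precision weighting -- the closed form obtained when a
    mean-field Gaussian minimizes KL to the normalized product of the
    individual posteriors. Illustration only, not the cited algorithm."""
    prec = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    var = 1.0 / prec.sum()
    mu = var * (prec * np.asarray(mus, dtype=float)).sum()
    return mu, np.sqrt(var)

# Two dataset-specific posteriors over the same parameter:
mu, sigma = fuse_gaussians([0.8, 1.2], [0.2, 0.4])
print(f"fused: mean={mu:.3f}, std={sigma:.3f}")  # pulled toward the tighter posterior
```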
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.