Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification
- URL: http://arxiv.org/abs/2404.15836v1
- Date: Wed, 24 Apr 2024 12:14:54 GMT
- Title: Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification
- Authors: Paweł Zyblewski
- Abstract summary: This paper proposes Streaming Super Tabular Machine Learning (SSTML) for the difficult data stream classification task.
Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Rapid technological advances are inherently linked to the increased amount of data, a substantial portion of which can be interpreted as data streams, capable of exhibiting the phenomenon of concept drift and having a high imbalance ratio. Consequently, developing new approaches to classifying difficult data streams is a rapidly growing research area. At the same time, the proliferation of deep learning and transfer learning, as well as the success of convolutional neural networks in computer vision tasks, have contributed to the emergence of a new research trend, namely Multi-Dimensional Encoding (MDE), focusing on transforming tabular data into a homogeneous form of a discrete digital signal. This paper proposes Streaming Super Tabular Machine Learning (SSTML), thereby exploring for the first time the potential of MDE in the difficult data stream classification task. SSTML encodes consecutive data chunks into an image representation using the STML algorithm and then performs a single ResNet-18 training epoch. Experiments conducted on synthetic and real data streams have demonstrated the ability of SSTML to achieve classification quality statistically significantly superior to state-of-the-art algorithms while maintaining comparable processing time.
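The chunk-wise procedure described in the abstract (encode each incoming chunk as images, then run one training pass) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `encode_as_image` is a hypothetical zero-padding reshape and not the STML encoder, `CentroidModel` is a nearest-centroid stand-in for ResNet-18, and the test-then-train evaluation order is a common data stream convention that the abstract does not confirm.

```python
import math
import random
from collections import defaultdict


def encode_as_image(row, side):
    """Hypothetical stand-in for the STML encoder (NOT the real algorithm):
    zero-pad a feature vector and reshape it into a side x side grid."""
    padded = list(row) + [0.0] * (side * side - len(row))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


class CentroidModel:
    """Stand-in for ResNet-18: keeps a running mean image per class."""

    def __init__(self):
        self.sums = {}
        self.counts = defaultdict(int)

    @property
    def trained(self):
        return bool(self.sums)

    def predict(self, image):
        flat = [v for r in image for v in r]

        def squared_distance(label):
            n = self.counts[label]
            return sum((s / n - v) ** 2 for s, v in zip(self.sums[label], flat))

        # Return the class whose mean image is closest to the input.
        return min(self.sums, key=squared_distance)

    def train_one_epoch(self, images, labels):
        # A single pass over the chunk, mirroring "one training epoch per chunk".
        for image, label in zip(images, labels):
            flat = [v for r in image for v in r]
            if label not in self.sums:
                self.sums[label] = [0.0] * len(flat)
            self.sums[label] = [s + v for s, v in zip(self.sums[label], flat)]
            self.counts[label] += 1


def process_stream(chunks, n_features, model=None):
    """Assumed test-then-train loop: evaluate on each chunk, then train on it."""
    side = math.isqrt(n_features - 1) + 1  # smallest side with side * side >= n_features
    if model is None:
        model = CentroidModel()
    accuracies = []
    for X, y in chunks:
        images = [encode_as_image(row, side) for row in X]
        if model.trained:
            hits = sum(model.predict(img) == label for img, label in zip(images, y))
            accuracies.append(hits / len(y))
        model.train_one_epoch(images, y)
    return accuracies


def make_stream(n_chunks=5, chunk_size=50, n_features=5, seed=0):
    """Synthetic two-class stream with well-separated Gaussian classes."""
    rng = random.Random(seed)
    chunks = []
    for _ in range(n_chunks):
        y = [i % 2 for i in range(chunk_size)]
        X = [[rng.gauss(3.0 * label, 1.0) for _ in range(n_features)] for label in y]
        chunks.append((X, y))
    return chunks


accuracies = process_stream(make_stream(), n_features=5)
print(accuracies)
```

The first chunk is used for training only, so the sketch reports one accuracy value for each subsequent chunk. Replacing the stand-ins with a real encoder and a convolutional network would leave the loop structure unchanged.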
Related papers
- Sampling-guided Heterogeneous Graph Neural Network with Temporal Smoothing for Scalable Longitudinal Data Imputation [17.81217890585335]
We propose a novel framework, the Sampling-guided Heterogeneous Graph Neural Network (SHT-GNN), to tackle the challenge of missing data imputation.
By leveraging subject-wise mini-batch sampling and a multi-layer temporal smoothing mechanism, SHT-GNN efficiently scales to large datasets.
Experiments on both synthetic and real-world datasets, including the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, demonstrate that SHT-GNN significantly outperforms existing imputation methods.
arXiv Detail & Related papers (2024-11-07T17:41:07Z)
- Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z)
- The Devil in the Details: Simple and Effective Optical Flow Synthetic Data Generation [19.945859289278534]
We show that the required characteristics in an optical flow dataset are rather simple and present a simpler synthetic data generation method.
With 2D motion-based datasets, we systematically analyze the simplest yet critical factors for generating synthetic datasets.
arXiv Detail & Related papers (2023-08-14T18:01:45Z)
- CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation [128.00940554196976]
Vision-Language Pretraining (VLP) has shown impressive results on diverse downstream tasks by offline training on large-scale datasets.
To support the study of Vision-Language Continual Pretraining (VLCP), we first contribute a comprehensive and unified benchmark dataset P9D.
Treating the data from each industry as an independent task supports continual learning and conforms to the real-world long-tail nature, simulating pretraining on web data.
arXiv Detail & Related papers (2023-08-14T13:53:18Z)
- MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving Graphs [1.1756822700775666]
We introduce a new framework for interpreting time series data by extracting and clustering the input representative patterns.
We run experiments on eight datasets of the UCR/UEA archive, along with HAR and PAM datasets.
arXiv Detail & Related papers (2023-06-06T16:24:27Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It offers three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- On the challenges to learn from Natural Data Streams [6.602973237811197]
In real-world contexts, data are sometimes available in the form of Natural Data Streams.
This data organization represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms.
In this paper, we investigate the classification performance of a variety of algorithms that receive as training input Natural Data Streams.
arXiv Detail & Related papers (2023-01-09T16:32:02Z)
- Large Scale Time-Series Representation Learning via Simultaneous Low and High Frequency Feature Bootstrapping [7.0064929761691745]
We propose a non-contrastive self-supervised learning approach that efficiently captures low- and high-frequency time-varying features.
Our method takes raw time series data as input and creates two different augmented views for two branches of the model.
To demonstrate the robustness of our model we performed extensive experiments and ablation studies on five real-world time-series datasets.
arXiv Detail & Related papers (2022-04-24T14:39:47Z)
- Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- XCM: An Explainable Convolutional Neural Network for Multivariate Time Series Classification [64.41621835517189]
We present XCM, an eXplainable Convolutional neural network for MTS classification.
XCM is a new compact convolutional neural network which extracts information relative to the observed variables and time directly from the input data.
We first show that XCM outperforms the state-of-the-art MTS classifiers on both the large and small public UEA datasets.
arXiv Detail & Related papers (2020-09-10T11:55:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.