An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in
Structural Engineering
- URL: http://arxiv.org/abs/2109.08795v1
- Date: Sat, 18 Sep 2021 01:24:39 GMT
- Title: An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in
Structural Engineering
- Authors: Parisa Hajibabaee, Farhad Pourkamali-Anaraki, Mohammad Amin
Hariri-Ardebili
- Abstract summary: The t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to reduce the dimensions of an earthquake-related data set for visualization purposes.
The Synthetic Minority Oversampling Technique (SMOTE) is used to tackle the imbalanced nature of such a data set.
We show that, by applying t-SNE to the imbalanced data and SMOTE to the training data set, neural network classifiers achieve promising results without sacrificing accuracy.
- Score: 2.4493299476776773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental task in machine learning involves visualizing high-dimensional
data sets that arise in high-impact application domains. When considering the
context of large imbalanced data, this problem becomes much more challenging.
In this paper, the t-Distributed Stochastic Neighbor Embedding (t-SNE)
algorithm is used to reduce the dimensions of an earthquake engineering related
data set for visualization purposes. Since imbalanced data sets greatly affect
the accuracy of classifiers, we employ the Synthetic Minority Oversampling
Technique (SMOTE) to tackle the imbalanced nature of such a data set. We
present the results obtained with t-SNE and SMOTE and compare them to the
baseline approaches across several aspects. Considering four options and six
classification algorithms, we show that, by applying t-SNE to the imbalanced
data and SMOTE to the training data set, neural network classifiers achieve
promising results without sacrificing accuracy. Hence, we can transform the
studied scientific data into a two-dimensional (2D) space, enabling
visualization of the classifier and the resulting decision surface in a 2D
plot.
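As a rough illustration of the pipeline sketched in the abstract, the snippet
below is a minimal sketch (not the authors' implementation) that combines
t-SNE, SMOTE, and a neural network classifier using scikit-learn and
imbalanced-learn. The synthetic data set, class ratios, and hyperparameters
(e.g., perplexity, hidden layer sizes) are placeholder assumptions chosen only
for the example.

```python
# Minimal sketch of the t-SNE + SMOTE + neural-network pipeline described above.
# NOT the authors' implementation: data set and hyperparameters are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# 1) Synthetic stand-in for a high-dimensional, imbalanced engineering data set.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           n_classes=3, weights=[0.80, 0.15, 0.05],
                           random_state=0)

# 2) t-SNE embeds the (still imbalanced) data into two dimensions.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# 3) Split the 2D embedding, then oversample the training portion only with
#    SMOTE so the test set keeps its original class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, y, test_size=0.3, stratify=y, random_state=0)
X_train_bal, y_train_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

# 4) Train a neural-network classifier on the balanced 2D training data.
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
clf.fit(X_train_bal, y_train_bal)
print("balanced accuracy:", balanced_accuracy_score(y_test, clf.predict(X_test)))

# 5) Because the classifier operates in 2D, its decision surface can be drawn
#    by evaluating it on a grid over the embedding (e.g., matplotlib contourf).
xx, yy = np.meshgrid(np.linspace(X_2d[:, 0].min(), X_2d[:, 0].max(), 200),
                     np.linspace(X_2d[:, 1].min(), X_2d[:, 1].max(), 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
```

Note that t-SNE has no out-of-sample transform, so the embedding is computed on
the full data set before the train/test split; in practice the perplexity and
network size would need tuning for the actual earthquake engineering data.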
Related papers
- Two-Stage Hierarchical and Explainable Feature Selection Framework for Dimensionality Reduction in Sleep Staging [0.6216545676226375]
EEG signals play a significant role in sleep research.
Due to the high-dimensional nature of EEG signal data sequences, data visualization and clustering of different sleep stages have been challenging.
We propose a two-stage hierarchical and explainable feature selection framework by incorporating a feature selection algorithm to improve the performance of dimensionality reduction.
arXiv Detail & Related papers (2024-08-31T23:54:53Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained on generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- On the Convergence of Loss and Uncertainty-based Active Learning Algorithms [3.506897386829711]
We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm.
We present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions.
arXiv Detail & Related papers (2023-12-21T15:22:07Z)
- Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing [76.72662577101988]
This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model.
This allows the construction of a human-in-the-loop mechanism that reduces the amount of data required to train the model and to generate training data.
arXiv Detail & Related papers (2023-07-14T14:36:58Z)
- Machine Learning Based Missing Values Imputation in Categorical Datasets [2.5611256859404983]
This research looked into the use of machine learning algorithms to fill in the gaps in categorical datasets.
The emphasis was on ensemble models constructed using the Error Correction Output Codes framework.
Despite these encouraging results, deep learning for missing data imputation still faces obstacles, including the requirement for large amounts of labeled data.
arXiv Detail & Related papers (2023-06-10T03:29:48Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Data Scaling Laws in NMT: The Effect of Noise and Architecture [59.767899982937756]
We study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
We find that the data scaling exponents are minimally impacted, suggesting that marginally worse architectures or training data can be compensated for by adding more data.
arXiv Detail & Related papers (2022-02-04T06:53:49Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks [8.067201256886733]
The present work proposes to infer object geometry from scattering features by training convolutional neural networks.
The robustness of our approach in response to data degradation is evaluated by comparing the performance of networks trained using the datasets.
arXiv Detail & Related papers (2020-10-21T00:51:14Z)
- Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z)
- Performance Analysis of Semi-supervised Learning in the Small-data Regime using VAEs [0.261072980439312]
In this work, we applied an existing algorithm that pre-trains a latent space representation of the data to capture its features in a lower dimension for the small-data regime input.
The fine-tuned latent space provides constant weights that are useful for classification.
Here we present the performance analysis of the VAE algorithm with different latent space sizes in the semi-supervised learning setting.
arXiv Detail & Related papers (2020-02-26T16:19:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.