An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction
- URL: http://arxiv.org/abs/2412.02211v1
- Date: Tue, 03 Dec 2024 07:04:10 GMT
- Title: An Automated Data Mining Framework Using Autoencoders for Feature Extraction and Dimensionality Reduction
- Authors: Yaxin Liang, Xinshi Li, Xin Huang, Ziqi Zhang, Yue Yao,
- Abstract summary: This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction.
Through the encoding-decoding structure, the autoencoder can capture the data's potential characteristics and achieve noise reduction and anomaly detection.
In the future, with the advancement of deep learning and big data technology, the autoencoder method combined with a generative adversarial network (GAN) or graph neural network (GNN) is expected to be more widely used in the fields of complex data processing, real-time data analysis and intelligent decision-making.
- Score: 10.358417199718462
- License:
- Abstract: This study proposes an automated data mining framework based on autoencoders and experimentally verifies its effectiveness in feature extraction and data dimensionality reduction. Through the encoding-decoding structure, the autoencoder can capture the data's potential characteristics and achieve noise reduction and anomaly detection, providing an efficient and stable solution for the data mining process. The experiment compared the performance of the autoencoder with traditional dimensionality reduction methods (such as PCA, FA, T-SNE, and UMAP). The results showed that the autoencoder performed best in terms of reconstruction error and root mean square error and could better retain data structure and enhance the generalization ability of the model. The autoencoder-based framework not only reduces manual intervention but also significantly improves the automation of data processing. In the future, with the advancement of deep learning and big data technology, the autoencoder method combined with a generative adversarial network (GAN) or graph neural network (GNN) is expected to be more widely used in the fields of complex data processing, real-time data analysis and intelligent decision-making.
Related papers
- Leveraging Semi-Supervised Learning to Enhance Data Mining for Image Classification under Limited Labeled Data [35.431340001608476]
Traditional data mining methods are inadequate when faced with large-scale, high-dimensional and complex data.
This study introduces semi-supervised learning methods, aiming to improve the algorithm's ability to utilize unlabeled data.
Specifically, we adopt a self-training method and combine it with a convolutional neural network (CNN) for image feature extraction and classification.
arXiv Detail & Related papers (2024-11-27T18:59:50Z) - Remote sensing framework for geological mapping via stacked autoencoders and clustering [0.15833270109954137]
We present an unsupervised machine learning-based framework for processing remote sensing data.
We use Landsat 8, ASTER, and Sentinel-2 datasets to evaluate the framework for geological mapping of the Mutawintji region in Australia.
Our results reveal that the framework produces accurate and interpretable geological maps, efficiently discriminating rock units.
arXiv Detail & Related papers (2024-04-02T09:15:32Z) - Defect Classification in Additive Manufacturing Using CNN-Based Vision
Processing [76.72662577101988]
This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model.
This allows the construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data.
arXiv Detail & Related papers (2023-07-14T14:36:58Z) - AutoCure: Automated Tabular Data Curation Technique for ML Pipelines [0.0]
We present AutoCure, a novel and configuration-free data curation pipeline.
Unlike traditional data curation methods, AutoCure synthetically enhances the density of the clean data fraction.
In practice, AutoCure can be integrated with open source tools to promote the democratization of machine learning.
arXiv Detail & Related papers (2023-04-26T15:51:47Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - DAE : Discriminatory Auto-Encoder for multivariate time-series anomaly
detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE)
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z) - Including Sparse Production Knowledge into Variational Autoencoders to
Increase Anomaly Detection Reliability [3.867363075280544]
We study using rarely occurring information about labeled anomalies into Variational Autoencoder neural network structures.
This method outperforms all other models in terms of accuracy, precision, and recall.
arXiv Detail & Related papers (2021-03-24T05:54:12Z) - Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z) - Dual Adversarial Auto-Encoders for Clustering [152.84443014554745]
We propose Dual Adversarial Auto-encoder (Dual-AAE) for unsupervised clustering.
By performing variational inference on the objective function of Dual-AAE, we derive a new reconstruction loss which can be optimized by training a pair of Auto-encoders.
Experiments on four benchmarks show that Dual-AAE achieves superior performance over state-of-the-art clustering methods.
arXiv Detail & Related papers (2020-08-23T13:16:34Z) - Anomaly Detection with SDAE [2.9447568514391067]
A Simple, Deep, and Supervised Deep Autoencoder were trained and compared for anomaly detection over the ASHRAE building energy dataset.
The Deep Autoencoder perfoms the best, however the Supervised Deep Autoencoder outperforms the other models in total anomalies detected.
arXiv Detail & Related papers (2020-04-09T07:22:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.