A Missing Value Filling Model Based on Feature Fusion Enhanced
Autoencoder
- URL: http://arxiv.org/abs/2208.13495v2
- Date: Thu, 3 Aug 2023 05:18:47 GMT
- Title: A Missing Value Filling Model Based on Feature Fusion Enhanced
Autoencoder
- Authors: Xinyao Liu, Shengdong Du, Tianrui Li, Fei Teng and Yan Yang
- Abstract summary: We propose a missing-value-filling model based on a feature-fusion-enhanced autoencoder.
We develop a missing value filling strategy based on dynamic clustering.
The effectiveness of the proposed model is validated by extensive experiments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advent of the big data era, data quality has become an
increasingly critical problem. Among many factors, missing values are a primary
issue, and developing effective imputation models is therefore a key topic in the
research community. Recently, a major research direction has been to employ neural
network models such as self-organizing maps or autoencoders for filling missing
values. However, these classical methods can hardly discover interrelated features
and common features among data attributes simultaneously. In particular, classical
autoencoders often learn invalid constant mappings, which dramatically hurts
filling performance. To solve these problems, we propose a
missing-value-filling model based on a feature-fusion-enhanced autoencoder. We
first incorporate into the autoencoder a hidden layer consisting of
de-tracking neurons and radial basis function neurons, which enhances its
ability to learn interrelated features and common features. In addition, we
develop a missing-value filling strategy based on dynamic clustering that is
incorporated into an iterative optimization process. This design enhances
multi-dimensional feature fusion and thus improves dynamic collaborative
missing-value-filling performance. The effectiveness of the proposed model is
validated by extensive experiments against a variety of baseline methods on
thirteen data sets.
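The iterative fill-train-refill scheme the abstract describes can be sketched as follows. This is an illustrative simplification, not the paper's method: the feature-fusion-enhanced autoencoder (de-tracking and radial-basis-function neurons, dynamic clustering) is replaced here by a plain linear autoencoder realized with a truncated SVD, so only the iterative imputation scaffold is shown. The function name and parameters (`iterative_autoencoder_impute`, `rank`, `n_iters`) are hypothetical, not from the paper.

```python
import numpy as np

def iterative_autoencoder_impute(X, rank=2, n_iters=20):
    """Iteratively fill NaN entries of X (sketch only).

    The encode-decode step below is a rank-`rank` linear projection,
    standing in for the paper's feature-fusion-enhanced autoencoder.
    """
    X = X.astype(float).copy()
    mask = np.isnan(X)                               # True where values are missing
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # initialize with column means

    for _ in range(n_iters):
        # "Encode-decode": reconstruct from the top-`rank` singular directions.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[mask] = X_hat[mask]                        # refill only the missing cells
    return X
```

Observed entries are never overwritten; only the masked cells are updated each round, which is the collaborative-filling loop the abstract refers to.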
Related papers
- Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach that addresses this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Latent variable model for high-dimensional point process with structured missingness [4.451479907610764]
Longitudinal data are important in numerous fields, such as healthcare, sociology, and seismology.
Real-world datasets can be high-dimensional, contain structured missingness patterns, and have measurement time points governed by an unknown process.
We propose a flexible and efficient latent-variable model capable of addressing all these limitations.
arXiv Detail & Related papers (2024-02-08T15:41:48Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks, including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features from observational data.
We introduce a new causal feature selection approach that relies on forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Key-Exchange Convolutional Auto-Encoder for Data Augmentation in Early Knee OsteoArthritis Classification [9.400820679110147]
Knee OsteoArthritis (KOA) is a prevalent musculoskeletal condition that impairs the mobility of senior citizens.
We propose a learning model based on a convolutional auto-encoder and a hybrid loss strategy to generate new data for early KOA diagnosis.
arXiv Detail & Related papers (2023-02-26T15:45:19Z)
- Interpreting Black-box Machine Learning Models for High Dimensional Datasets [40.09157165704895]
We train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed.
We then approximate the behavior of the black-box model by means of an interpretable surrogate model on the top-k feature space.
Our approach outperforms state-of-the-art methods such as TabNet and XGBoost when tested on different datasets.
arXiv Detail & Related papers (2022-08-29T07:36:17Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Latent Vector Expansion using Autoencoder for Anomaly Detection [1.370633147306388]
We use the features of the autoencoder to train latent vectors from low to high dimensionality.
We propose a latent vector expansion autoencoder model that improves classification performance on imbalanced data.
arXiv Detail & Related papers (2022-01-05T02:28:38Z)
- Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions.
One solution for achieving strong generalization is to incorporate causal structures in the models.
We propose an online algorithm that continually detects and removes spurious features.
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.