IVGAE: Handling Incomplete Heterogeneous Data with a Variational Graph Autoencoder
- URL: http://arxiv.org/abs/2511.22116v1
- Date: Thu, 27 Nov 2025 05:14:50 GMT
- Title: IVGAE: Handling Incomplete Heterogeneous Data with a Variational Graph Autoencoder
- Authors: Youran Zhou, Mohamed Reda Bouadjenek, Sunil Aryal
- Abstract summary: We present IVGAE, a Variational Graph Autoencoder framework for robust imputation of incomplete heterogeneous data. IVGAE constructs a bipartite graph to represent sample-feature relationships and applies graph representation learning to model structural dependencies. Experiments on 16 real-world datasets show that IVGAE achieves consistent improvements in RMSE and downstream F1 across MCAR, MAR, and MNAR missing scenarios at a 30% missing rate.
- Score: 4.935498694293104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handling missing data remains a fundamental challenge in real-world tabular datasets, especially when data are heterogeneous, with both numerical and categorical features. Existing imputation methods often fail to capture complex structural dependencies or to handle heterogeneous data effectively. We present IVGAE, a Variational Graph Autoencoder framework for robust imputation of incomplete heterogeneous data. IVGAE constructs a bipartite graph to represent sample-feature relationships and applies graph representation learning to model structural dependencies. A key innovation is its dual-decoder architecture, in which one decoder reconstructs feature embeddings and the other models missingness patterns, providing structural priors that are aware of the missingness mechanism. To better encode categorical variables, we introduce a Transformer-based heterogeneous embedding module that avoids high-dimensional one-hot encoding. Extensive experiments on 16 real-world datasets show that IVGAE achieves consistent improvements in RMSE and downstream F1 across MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random) scenarios at a 30% missing rate. Code and data are available at: https://github.com/echoid/IVGAE.
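The abstract does not say exactly how the sample-feature bipartite graph is built. A common construction in graph-based imputation (GRAPE, for example) turns every row into a sample node and every column into a feature node. It then adds one edge per observed cell and stores the cell value as that edge's attribute. The sketch below assumes this construction, which may differ from IVGAE's own. It also simulates MCAR missingness at the 30% rate quoted above and scores a column-mean baseline by RMSE on the masked cells only. The helper names (`mcar_mask`, `to_bipartite`, `rmse_on_missing`) are hypothetical, the data are synthetic, and none of this is taken from the IVGAE repository. The boolean `observed` mask is the kind of target a missingness-pattern decoder could be trained to reconstruct, but that pairing is also an assumption.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """Boolean mask, True = observed. Each cell is dropped independently
    with probability `rate`, i.e. missing completely at random (MCAR)."""
    return rng.random(shape) >= rate

def to_bipartite(X, observed):
    """Hypothetical GRAPE-style bipartite graph (an assumption, not IVGAE's code).
    Sample nodes are 0..n-1 and feature nodes are n..n+d-1.
    One edge per observed cell carries that cell's value as its attribute."""
    n, _ = X.shape
    rows, cols = np.nonzero(observed)
    edge_index = np.stack([rows, n + cols])  # shape (2, num_observed)
    edge_attr = X[rows, cols]                # observed values on the edges
    return edge_index, edge_attr

def rmse_on_missing(X_true, X_imputed, observed):
    """RMSE computed only over the cells that were masked out."""
    missing = ~observed
    diff = X_true[missing] - X_imputed[missing]
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # synthetic numerical table
observed = mcar_mask(X.shape, 0.3, rng)    # roughly 30% of cells missing
edge_index, edge_attr = to_bipartite(X, observed)

# Column-mean baseline: fill each missing cell with its column's observed mean.
col_means = np.nanmean(np.where(observed, X, np.nan), axis=0)
X_mean = np.where(observed, X, col_means)

print(edge_index.shape, rmse_on_missing(X, X_mean, observed))
```

To evaluate a trained imputer with the same RMSE protocol, substitute its output for `X_mean`. Only numerical columns are handled here. The sketch does not cover the Transformer-based categorical embedding that the abstract describes.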
Related papers
- Generative Data Transformation: From Mixed to Unified Data [57.84692191369066]
Taesar is a data-centric framework for target-al regeneration.
It encodes cross-domain context into target sequences, enabling standard models to learn intricate dependencies without complex fusion architectures.
(2026-02-26T08:30:09Z)
- Heterogeneous Sheaf Neural Networks [17.664754528494132]
Heterogeneous graphs are commonly used to model relational structures in many real-world applications.
We propose using cellular sheaves to model the heterogeneity in the graph's underlying topology.
We introduce HetSheaf, a general framework for heterogeneous sheaf neural networks, and a series of heterogeneous sheaf predictors.
(2024-09-12T13:38:08Z)
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
(2024-04-12T12:31:06Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective, trained jointly with feature reconstruction, to capture holistic graph information.
(2024-02-12T19:39:26Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
(2023-12-10T18:49:59Z)
- GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts [75.51612253852002]
GraphMETRO is a Graph Neural Network architecture that models natural diversity and captures complex distributional shifts.
GraphMETRO achieves state-of-the-art results on four datasets from the GOOD benchmark.
(2023-12-07T20:56:07Z)
- GrannGAN: Graph annotation generative adversarial networks [72.66289932625742]
We consider the problem of modelling high-dimensional distributions and generating new examples of data with complex relational feature structure coherent with a graph skeleton.
The model we propose tackles the problem of generating the data features constrained by the specific graph structure of each data point by splitting the task into two phases.
In the first phase, it models the distribution of features associated with the nodes of the given graph; in the second, it complements the edge features conditionally on the node features.
(2022-12-01T11:49:07Z)
- Integrating Transformer and Autoencoder Techniques with Spectral Graph Algorithms for the Prediction of Scarcely Labeled Molecular Data [2.8360662552057323]
This work introduces three graph-based models incorporating Merriman-Bence-Osher (MBO) techniques to tackle the challenge of predicting scarcely labeled molecular data.
Specifically, graph-based modifications of the MBO scheme are integrated with state-of-the-art techniques, including a home-made transformer and an autoencoder.
The proposed models are validated using five benchmark data sets.
(2022-11-12T22:45:32Z)
- EGG-GAE: scalable graph neural networks for tabular data imputation [8.775728170359024]
We propose a novel EdGe Generation Graph AutoEncoder (EGG-GAE) for missing data imputation.
EGG-GAE works on randomly sampled mini-batches of the input data, and it automatically infers the best connectivity across the mini-batch for each architecture layer.
(2022-10-19T10:26:17Z)
- Variational Selective Autoencoder: Learning from Partially-Observed Heterogeneous Data [45.23338389559936]
We propose the variational selective autoencoder (VSAE) to learn representations from partially-observed heterogeneous data.
VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask.
It results in a unified model for various downstream tasks including data generation and imputation.
(2021-02-25T04:39:13Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome becomes predictable by an auxiliary network.
(2020-10-10T14:04:44Z)