DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction
- URL: http://arxiv.org/abs/2512.06356v2
- Date: Thu, 11 Dec 2025 09:53:17 GMT
- Title: DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction
- Authors: Yifan Song, Fenglin Yu, Yihong Luo, Xingjian Tao, Siya Qiu, Kai Han, Jing Tang,
- Abstract summary: DDFI is a Diverse and Distribution-aware Missing Feature Imputation method. It combines feature propagation with a graph-based Masked AutoEncoder. It outperforms state-of-the-art methods under both transductive and inductive settings.
- Score: 22.492502807174237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incomplete node features are ubiquitous in real-world scenarios, e.g., the attributes of web users may be partly private, which causes the performance of Graph Neural Networks (GNNs) to decline significantly. Feature propagation (FP) is a well-known method that performs well for imputation of missing node features on graphs, but it still has three issues: 1) it struggles with graphs that are not fully connected, 2) imputed features suffer from over-smoothing, and 3) FP is tailored for transductive tasks, overlooking the feature distribution shift in inductive tasks. To address these challenges, we introduce DDFI, a Diverse and Distribution-aware Missing Feature Imputation method that combines feature propagation with a graph-based Masked AutoEncoder (MAE) in a nontrivial manner. It first introduces a simple yet effective algorithm, Co-Label Linking (CLL), which randomly connects training-set nodes that share the same label to improve performance on graphs with many connected components. Then we develop a novel two-step representation generation process at the inference stage. Specifically, instead of directly using FP-imputed features as input during inference, DDFI further reconstructs the features through the whole MAE to reduce feature distribution shift in inductive tasks and enhance the diversity of node features. Meanwhile, since existing feature imputation methods for graphs are evaluated only on missing-feature scenarios simulated by manually masking features, we collect a new dataset, Sailing, from voyage records that contain naturally missing features, to better evaluate effectiveness. Extensive experiments conducted on six public datasets and Sailing show that DDFI outperforms the state-of-the-art methods under both transductive and inductive settings.
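The first stage described in the abstract, FP on a graph augmented by CLL, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `links_per_node` and `num_iters` parameters, and the use of mean (random-walk normalized) neighbor aggregation are choices made here; other FP formulations use a symmetrically normalized adjacency instead. The MAE reconstruction step is not shown.

```python
import random
from collections import defaultdict


def co_label_linking(edges, train_labels, links_per_node=1, seed=0):
    """Hypothetical CLL sketch: for each training node, add undirected edges to
    randomly chosen training nodes with the same label.
    `links_per_node` is an assumed knob, not a documented DDFI parameter."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for node, label in sorted(train_labels.items()):
        by_label[label].append(node)
    linked = {tuple(sorted(e)) for e in edges}  # canonical undirected edges
    for node, label in sorted(train_labels.items()):
        candidates = [m for m in by_label[label] if m != node]
        for partner in rng.sample(candidates, min(links_per_node, len(candidates))):
            linked.add(tuple(sorted((node, partner))))
    return sorted(linked)


def feature_propagation(num_nodes, edges, x, known, num_iters=40):
    """Assumed FP variant: repeatedly set each node's features to the mean of
    its neighbours' features, then reset observed entries to their values."""
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(x[0])
    # Missing entries start at zero; observed entries keep their values.
    h = [[x[i][c] if known[i][c] else 0.0 for c in range(dim)] for i in range(num_nodes)]
    for _ in range(num_iters):
        new_h = []
        for i in range(num_nodes):
            nbrs = neighbors[i]
            if nbrs:
                row = [sum(h[j][c] for j in nbrs) / len(nbrs) for c in range(dim)]
            else:
                row = list(h[i])  # isolated node: nothing to propagate
            # Boundary condition: observed features are never overwritten.
            new_h.append([x[i][c] if known[i][c] else row[c] for c in range(dim)])
        h = new_h
    return h


# Two components, 0-1 and 2-3; only node 0 has an observed feature.
edges = [(0, 1), (2, 3)]
x = [[1.0], [0.0], [0.0], [0.0]]
known = [[True], [False], [False], [False]]
print(feature_propagation(4, edges, x, known))  # [[1.0], [1.0], [0.0], [0.0]]
linked = co_label_linking(edges, {0: "a", 2: "a"})
print(feature_propagation(4, linked, x, known))  # nodes 2 and 3 approach 1.0
```

In the toy graph, the second component has no observed features, so plain FP leaves it at zero (issue 1 in the abstract); linking nodes 0 and 2 because they share a training label lets the observed value propagate into that component.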
Related papers
- Optimization-Free Graph Embedding via Distributional Kernel for Community Detection [7.023830532843621]
Neighborhood Aggregation Strategy (NAS) is a widely used approach in graph embedding, underpinning both Graph Neural Networks (GNNs) and Weisfeiler-Lehman (WL) methods.
This paper identifies two characteristics in a network, i.e., the distributions of nodes and node degrees, that are critical for expressive representation but have been overlooked in existing methods.
We propose a novel weighted distribution-aware kernel that embeds nodes while taking their distributional characteristics into consideration.
(2026-02-14T06:56:40Z)
- Efficient Identity and Position Graph Embedding via Spectral-Based Random Feature Aggregation [37.25217644507099]
Graph neural networks (GNNs) capture graph structures via a feature aggregation mechanism.
For most GNN-based methods, it is unclear which properties they can capture.
We propose random feature aggregation (RFA) for efficient identity and position embedding.
(2025-05-27T10:26:15Z)
- Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph [18.06658040186476]
We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN).
Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them.
In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features.
(2024-11-07T17:48:37Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
(2024-06-19T22:30:08Z)
- Leveraging Graph Diffusion Models for Network Refinement Tasks [72.54590628084178]
We propose a novel graph generative framework, SGDM, based on subgraph diffusion.
Our framework not only improves the scalability and fidelity of graph diffusion models, but also leverages the reverse process to perform novel, conditional generation tasks.
(2023-11-29T18:02:29Z)
- NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification [70.51126383984555]
We introduce a novel all-pair message passing scheme for efficiently propagating node signals between arbitrary nodes.
The efficient computation is enabled by a kernelized Gumbel-Softmax operator.
Experiments demonstrate the promising efficacy of the method in various tasks including node classification on graphs.
(2023-06-14T09:21:15Z)
- Confidence-Based Feature Imputation for Graphs with Partially Known Features [11.96118246448543]
We introduce a novel concept of channel-wise confidence in a node feature, which is assigned to each imputed channel feature of a node.
We then design pseudo-confidence using the channel-wise shortest path distance between a missing-feature node and its nearest known-feature node.
Based on the pseudo-confidence, we propose a novel feature imputation scheme that performs channel-wise inter-node diffusion and node-wise inter-channel propagation.
(2023-05-26T04:23:24Z)
- Distributed Learning over Networks with Graph-Attention-Based Personalization [49.90052709285814]
We propose a graph-based personalized algorithm (GATTA) for distributed deep learning.
In particular, the personalized model in each agent is composed of a global part and a node-specific part.
By treating each agent as a node in a graph and its node-specific parameters as its features, the benefits of the graph attention mechanism can be inherited.
(2023-05-22T13:48:30Z)
- Counterfactual Intervention Feature Transfer for Visible-Infrared Person Re-identification [69.45543438974963]
We find that graph-based methods in the visible-infrared person re-identification task (VI-ReID) suffer from poor generalization because of two issues.
The well-trained input features weaken the learning of graph topology, so the learned topology does not generalize well at inference time.
We propose a Counterfactual Intervention Feature Transfer (CIFT) method to tackle these problems.
(2022-08-01T16:15:31Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
(2022-02-13T10:54:59Z)
- FDGATII : Fast Dynamic Graph Attention with Initial Residual and Identity Mapping [0.39373541926236766]
We propose a novel graph neural network FDGATII, inspired by attention mechanism's ability to focus on selective information.
By using sparse dynamic attention, FDGATII is inherently parallelizable in design, whilst efficient in operation.
We show that FDGATII outperforms GAT- and GCN-based benchmarks in accuracy and performance on fully supervised tasks.
(2021-10-21T20:19:17Z)
- Unveiling Anomalous Edges and Nominal Connectivity of Attributed Networks [53.56901624204265]
The present work deals with uncovering anomalous edges in attributed graphs using two distinct formulations with complementary strengths.
The first relies on decomposing the graph data matrix into low-rank plus sparse components to markedly improve performance.
The second broadens the scope of the first by performing robust recovery of the unperturbed graph, which enhances the anomaly identification performance.
(2021-04-17T20:00:40Z)
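The channel-wise shortest path distance that the Confidence-Based Feature Imputation entry uses for pseudo-confidence can be computed with one multi-source breadth-first search per feature channel. A minimal sketch follows; the function names are hypothetical, and mapping a distance d to a confidence of `alpha ** d` (with `alpha` in (0, 1)) is an assumption about the decay form, not a detail given in the summary.

```python
from collections import deque


def channel_shortest_path_distance(num_nodes, edges, known):
    """Hop distance from each node to the nearest node whose channel c is
    observed, computed per channel with a multi-source BFS.
    Entries with no reachable observed node are None."""
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(known[0])
    dist = [[None] * dim for _ in range(num_nodes)]
    for c in range(dim):
        queue = deque()
        # Every node that observes channel c is a BFS source at distance 0.
        for i in range(num_nodes):
            if known[i][c]:
                dist[i][c] = 0
                queue.append(i)
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[v][c] is None:
                    dist[v][c] = dist[u][c] + 1
                    queue.append(v)
    return dist


def pseudo_confidence(dist, alpha=0.9):
    """Assumed decay: confidence alpha ** d; unreachable entries get 0.0."""
    return [[0.0 if d is None else alpha ** d for d in row] for row in dist]


# Path 0-1-2-3 plus an isolated node 4; channel 0 is observed at node 0,
# channel 1 at node 3.
known = [[True, False], [False, False], [False, False], [False, True], [False, False]]
dist = channel_shortest_path_distance(5, [(0, 1), (1, 2), (2, 3)], known)
print(dist)  # [[0, 3], [1, 2], [2, 1], [3, 0], [None, None]]
print(pseudo_confidence(dist, alpha=0.5))
```

Running the BFS once per channel, rather than once per node, keeps the cost proportional to the number of channels times the number of edges.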
This list is automatically generated from the titles and abstracts of the papers on this site.