GEDI: A Graph-based End-to-end Data Imputation Framework
- URL: http://arxiv.org/abs/2208.06573v2
- Date: Tue, 12 Sep 2023 06:09:54 GMT
- Title: GEDI: A Graph-based End-to-end Data Imputation Framework
- Authors: Katrina Chen, Xiuqin Liang, Zheng Ma, Zhibin Zhang
- Abstract summary: The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations.
It uses a meta-learning framework to select features that are influential to the downstream prediction task of interest.
We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance.
- Score: 3.5478302034537705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data imputation is an effective way to handle missing data, which is common
in practical applications. In this study, we propose and test a novel data
imputation process that achieves two important goals: (1) preserve the row-wise
similarities among observations and column-wise contextual relationships among
features in the feature matrix, and (2) tailor the imputation process to a
specific downstream label prediction task. The proposed imputation process uses
a Transformer network and graph structure learning to iteratively refine the
contextual relationships among features and similarities among observations.
Moreover, it uses a meta-learning framework to select features that are
influential to the downstream prediction task of interest. We conduct
experiments on real-world large data sets, and show that the proposed
imputation process consistently improves imputation and label prediction
performance over a variety of benchmark methods.
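The idea of imputing from row-wise similarities among observations can be illustrated with a much simpler stand-in. The sketch below is not the GEDI method: it uses no Transformer, no learned graph structure, and no meta-learning. It only shows the general loop of building a similarity graph over rows, imputing each missing entry from its nearest neighbors, and rebuilding the graph from the updated values. The function name `knn_graph_impute` and the parameters `k` and `n_iters` are hypothetical, and the code assumes every column has at least one observed value and that `k` is smaller than the number of rows.

```python
import numpy as np

def knn_graph_impute(X, k=3, n_iters=5):
    """Illustrative iterative imputation over a k-nearest-neighbor row graph.

    Hypothetical helper, not the GEDI algorithm. NaN marks a missing entry.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from column means computed over the observed values only.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iters):
        # Pairwise Euclidean distances between rows of the current estimate.
        diff = filled[:, None, :] - filled[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        np.fill_diagonal(dist, np.inf)  # a row is never its own neighbor

        # Graph edges: each row connects to its k closest rows.
        nbrs = np.argsort(dist, axis=1)[:, :k]

        # Edge weights decay with distance and sum to 1 per row.
        w = 1.0 / (1.0 + np.take_along_axis(dist, nbrs, axis=1))
        w /= w.sum(axis=1, keepdims=True)

        # Neighbor-weighted average of the current values, per row.
        est = (w[:, :, None] * filled[nbrs]).sum(axis=1)

        # Only missing entries are overwritten; observed values stay fixed.
        filled = np.where(missing, est, X)

    return filled
```

Because the graph is rebuilt from the imputed matrix on every pass, better imputations can change which rows count as neighbors. That is the only sense in which this sketch mirrors the iterative refinement described in the abstract.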
Related papers
- Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph [18.06658040186476]
We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN).
Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them.
In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features.
arXiv Detail & Related papers (2024-11-07T17:48:37Z)
- From Text to Treatment Effects: A Meta-Learning Approach to Handling Text-Based Confounding [7.5348062792]
This paper examines the performance of meta-learners when confounding variables are expressed in text.
We show that learners using pre-trained text representations of confounders achieve improved CATE estimates.
Due to the entangled nature of the text embeddings, these models do not fully match the performance of meta-learners with perfect confounder knowledge.
arXiv Detail & Related papers (2024-09-23T19:46:19Z)
- Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method [23.16359277296206]
We propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification.
Our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.
arXiv Detail & Related papers (2024-05-13T14:44:02Z)
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
- Deep Reinforcement Learning of Graph Matching [63.469961545293756]
Graph matching (GM) under node and pairwise constraints has been a building block in areas from optimization to computer vision.
We present a reinforcement learning solver for GM, called RGM, that seeks the node correspondence between pairwise graphs.
Our method differs from previous deep graph matching models, which focus on front-end feature extraction and affinity function learning.
arXiv Detail & Related papers (2020-12-16T13:48:48Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- Propositionalization and Embeddings: Two Sides of the Same Coin [0.0]
This paper outlines some of the modern data processing techniques used in relational learning.
It focuses on the propositionalization and embedding data transformation approaches.
We present two efficient implementations of the unifying methodology.
arXiv Detail & Related papers (2020-06-08T08:33:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.