Related papers: GEDI: A Graph-based End-to-end Data Imputation Framework

GEDI: A Graph-based End-to-end Data Imputation Framework

URL: http://arxiv.org/abs/2208.06573v2
Date: Tue, 12 Sep 2023 06:09:54 GMT
Title: GEDI: A Graph-based End-to-end Data Imputation Framework
Authors: Katrina Chen, Xiuqin Liang, Zheng Ma, Zhibin Zhang
Abstract summary: The proposed imputation process uses Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations. It uses a meta-learning framework to select features that are influential to the downstream prediction task of interest. We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance.
Score: 3.5478302034537705
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data imputation is an effective way to handle missing data, which is common in practical applications. In this study, we propose and test a novel data imputation process that achieve two important goals: (1) preserve the row-wise similarities among observations and column-wise contextual relationships among features in the feature matrix, and (2) tailor the imputation process to specific downstream label prediction task. The proposed imputation process uses Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations. Moreover, it uses a meta-learning framework to select features that are influential to the downstream prediction task of interest. We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance over a variety of benchmark methods.

Related papers

Learning Binarized Representations with Pseudo-positive Sample Enhancement for Efficient Graph Collaborative Filtering [35.82405808653398]
We study the problem of graph representation binarization for efficient collaborative filtering.<n>Our findings indicate that explicitly mitigating information loss at various stages of embedding binarization has a significant positive impact on performance.<n>Compared to its predecessor BiGeaR, BiGeaR++ introduces a fine-grained inference distillation mechanism and an effective embedding sample synthesis approach.
arXiv Detail & Related papers (2025-06-03T11:11:43Z)
Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training. We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO. As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph [18.06658040186476]
We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN) Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them. In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features.
arXiv Detail & Related papers (2024-11-07T17:48:37Z)
From Text to Treatment Effects: A Meta-Learning Approach to Handling Text-Based Confounding [7.5348062792]
This paper examines the performance of meta-learners when confounding variables are expressed in text. We show that learners using pre-trained text representations of confounders achieve improved CATE estimates. Due to the entangled nature of the text embeddings, these models do not fully match the performance of meta-learners with perfect confounder knowledge.
arXiv Detail & Related papers (2024-09-23T19:46:19Z)
Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method [23.16359277296206]
We propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification. Our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.
arXiv Detail & Related papers (2024-05-13T14:44:02Z)
Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks. We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes. Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously. We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning. The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
Deep Reinforcement Learning of Graph Matching [63.469961545293756]
Graph matching (GM) under node and pairwise constraints has been a building block in areas from optimization to computer vision. We present a reinforcement learning solver for GM i.e. RGM that seeks the node correspondence between pairwise graphs. Our method differs from the previous deep graph matching model in the sense that they are focused on the front-end feature extraction and affinity function learning.
arXiv Detail & Related papers (2020-12-16T13:48:48Z)
Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
Propositionalization and Embeddings: Two Sides of the Same Coin [0.0]
This paper outlines some of the modern data processing techniques used in relational learning. It focuses on the propositionalization and embedding data transformation approaches. We present two efficient implementations of the unifying methodology.
arXiv Detail & Related papers (2020-06-08T08:33:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.