Learning Enhanced Representations for Tabular Data via Neighborhood
Propagation
- URL: http://arxiv.org/abs/2206.06587v1
- Date: Tue, 14 Jun 2022 04:24:52 GMT
- Title: Learning Enhanced Representations for Tabular Data via Neighborhood
Propagation
- Authors: Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao,
Jiarui Jin, Quan Gan, Zheng Zhang, David Wipf
- Abstract summary: We construct a hypergraph to model the cross-row and cross-column patterns of data instances.
We then perform message propagation to enhance the target data instance representation.
Experiments on two important data prediction tasks validate the superiority of the proposed PET model.
- Score: 24.485479610138498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prediction over tabular data is an essential and fundamental problem in many
important downstream tasks. However, existing methods either take each data
instance of the table independently as input or do not fully utilize
multi-row features and labels to directly change and enhance the target data
representations. In this paper, we propose to 1) construct a hypergraph from
relevant data instance retrieval to model the cross-row and cross-column
patterns of those instances, and 2) perform message Propagation to Enhance the
target data instance representation for Tabular prediction tasks. Specifically,
our specially-designed message propagation step benefits from 1) fusion of
label and features during propagation, and 2) locality-aware high-order feature
interactions. Experiments on two important tabular data prediction tasks
validate the superiority of the proposed PET model against other baselines.
Additionally, we demonstrate the effectiveness of the model components and the
feature enhancement ability of PET via various ablation studies and
visualizations. The code is included in https://github.com/KounianhuaDu/PET.
Related papers
- TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
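As a rough illustration of what a joint multi-modal noising process involves, the toy step below applies Gaussian noise to numeric columns and random-replacement corruption to categorical ones, sharing one time variable t. TabDiff's actual continuous-time formulation is more sophisticated; everything here is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_step(x_num, x_cat, n_categories, t):
    """t in [0, 1]: 0 = clean data, 1 = pure noise, applied to both modalities."""
    x_num_t = np.sqrt(1 - t) * x_num + np.sqrt(t) * rng.normal(size=x_num.shape)
    corrupt = rng.random(x_cat.shape) < t              # which entries to resample
    random_cats = rng.integers(0, n_categories, size=x_cat.shape)
    x_cat_t = np.where(corrupt, random_cats, x_cat)
    return x_num_t, x_cat_t

x_num = np.array([1.2, -0.7])    # two numeric features
x_cat = np.array([3, 0])         # two categorical features
print(noise_step(x_num, x_cat, n_categories=5, t=0.3))
```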
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure.
This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
- Table Transformers for Imputing Textual Attributes [15.823533688884105]
We propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA).
Our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2.
We incorporate multi-task learning to impute heterogeneous columns simultaneously, boosting the performance for text imputation.
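A plausible reading of the multi-task setup, sketched with linear stand-ins for the transformer heads: one shared row encoding feeds a separate head per column to impute, and the per-column losses are simply summed. The head shapes and loss weighting here are assumptions, not TTITA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=16)                 # shared encoding of one row
W_num = rng.normal(size=16)             # numeric head: scalar regression
W_cat = rng.normal(size=(16, 4))        # categorical head: 4 classes

pred_num = h @ W_num
logits = h @ W_cat
probs = np.exp(logits - logits.max()); probs /= probs.sum()   # softmax

y_num, y_cat = 0.5, 2
# Joint multi-task loss: squared error + cross-entropy, summed over heads.
loss = (pred_num - y_num) ** 2 - np.log(probs[y_cat])
print(float(loss))
```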
arXiv Detail & Related papers (2024-08-04T19:54:12Z)
- Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data [35.61663559675556]
Cross-dataset pretraining has shown notable success in various fields.
In this study, we introduce a cross-table pretrained Transformer, XTFormer, for versatile downstream tabular prediction tasks.
Our methodology pretrains XTFormer to establish a "meta-function" space that encompasses all potential feature-target mappings.
arXiv Detail & Related papers (2024-06-01T03:24:31Z)
- Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM), which embeds each data instance into a fixed-size meta-representation.
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
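One plausible form of such a meta-representation, sketched below under assumptions (the exact construction is the paper's): describe an instance by its k smallest distances to training rows of each class. The output size depends only on the number of classes and k, not on the table's feature count, which is what lets one network transfer across heterogeneous datasets.

```python
import numpy as np

def meta_representation(x, X_train, y_train, n_classes, k=3):
    """Fixed-size description of x: k nearest distances per class
    (assumes at least k training rows in every class)."""
    reps = []
    for c in range(n_classes):
        d = np.linalg.norm(X_train[y_train == c] - x, axis=1)
        reps.append(np.sort(d)[:k])
    return np.concatenate(reps)            # shape: (n_classes * k,)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 10)), rng.integers(0, 3, size=60)
print(meta_representation(rng.normal(size=10), X, y, n_classes=3).shape)  # (9,)
```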
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
- Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias [92.41919689753051]
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks.
We investigate training data generation with diversely attributed prompts, which have the potential to yield diverse and attributed generated data.
We show that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance.
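The mechanism is easy to picture: instead of one class-conditional prompt per label, enumerate combinations of data attributes and instantiate a prompt for each. The attribute names and template below are made up for illustration.

```python
import itertools

# Hypothetical attributes; the paper's actual attribute sets differ.
attributes = {
    "length": ["short", "long"],
    "style":  ["formal", "casual"],
    "topic":  ["sports", "politics", "technology"],
}

def attributed_prompts(label):
    """Yield one generation prompt per attribute combination for a label."""
    for combo in itertools.product(*attributes.values()):
        settings = dict(zip(attributes.keys(), combo))
        yield (f"Write a {settings['length']}, {settings['style']} news "
               f"article about {settings['topic']} with sentiment '{label}'.")

prompts = list(attributed_prompts("positive"))
print(len(prompts))    # 2 * 2 * 3 = 12 distinct prompts for one label
print(prompts[0])
```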
arXiv Detail & Related papers (2023-06-28T03:31:31Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- GEDI: A Graph-based End-to-end Data Imputation Framework [3.5478302034537705]
The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations.
It uses a meta-learning framework to select features that are influential to the downstream prediction task of interest.
We conduct experiments on large real-world datasets and show that the proposed imputation process consistently improves imputation and label prediction performance.
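A simplified stand-in for this iterative refinement (not GEDI's Transformer/graph learner): alternate between measuring row similarities on the current imputations and re-imputing missing cells from the most similar rows, so similarities and imputations improve each other.

```python
import numpy as np

def iterative_impute(X, mask, iters=5, k=3):
    """mask[i, j] == True where X[i, j] is missing."""
    X = X.copy()
    # Initial fill: per-column means over observed entries.
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(iters):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        np.fill_diagonal(D, np.inf)
        nbrs = np.argsort(D, axis=1)[:, :k]        # k most similar rows
        X[mask] = X[nbrs].mean(axis=1)[mask]       # re-impute from them
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4)); mask = rng.random(X.shape) < 0.2
print(iterative_impute(X, mask).round(2))
```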
arXiv Detail & Related papers (2022-08-13T05:16:40Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
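A minimal sketch of the extrapolation step under a strong simplification (mean aggregation instead of the paper's learned GNN): a new feature node inherits the average embedding of the data instances it connects to in the feature-data graph, so the backbone need not be retrained when unseen features appear.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, d = 6, 8
row_emb = rng.normal(size=(n_rows, d))    # embeddings of observed data rows

# Binary incidence: which rows exhibit the previously unseen feature.
new_feat_edges = np.array([1, 0, 1, 1, 0, 0], dtype=bool)

# One hop of mean-aggregation message passing over the feature-data graph.
new_feat_emb = row_emb[new_feat_edges].mean(axis=0)
print(new_feat_emb.shape)   # (8,) -- usable without retraining the backbone
```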
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
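The objective is easy to state in a few lines; the sketch below uses linear encoder/decoder stand-ins rather than SubTab's actual networks. The model sees only a subset of each row's columns but is asked to reconstruct the full row.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 4
W_enc, W_dec = rng.normal(size=(d, hidden)), rng.normal(size=(hidden, d))

x = rng.normal(size=d)                          # one full row
subset = rng.permutation(d)[: d // 2]           # keep half the columns
x_sub = np.zeros(d); x_sub[subset] = x[subset]  # zero out the dropped columns

z = np.tanh(x_sub @ W_enc)                      # encode the feature subset
x_hat = z @ W_dec                               # decode toward the full row
loss = ((x_hat - x) ** 2).mean()                # reconstruct ALL features
print(round(float(loss), 3))
```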
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intent and slot representations separately and cast both prediction tasks into sequence labeling problems.
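The flattening idea can be shown on a made-up query: rather than generating a nested parse tree, the model emits one flat tag sequence for slots and another for coarse intents, both standard sequence-labeling problems.

```python
# Illustrative example only; the tag inventory is assumed, not X2Parser's.
tokens      = ["remind", "me", "to", "call", "mom", "at", "5", "pm"]
slot_tags   = ["O", "O", "O", "B-TODO", "I-TODO", "O", "B-TIME", "I-TIME"]
intent_tags = ["B-IN:CREATE_REMINDER"] + ["I-IN:CREATE_REMINDER"] * 7

# Two parallel labeling tasks over the same tokens.
for tok, s, i in zip(tokens, slot_tags, intent_tags):
    print(f"{tok:8s} {s:8s} {i}")
```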
arXiv Detail & Related papers (2021-06-07T16:40:05Z)