Related papers: Learning Enhanced Representations for Tabular Data via Neighborhood Propagation

URL: http://arxiv.org/abs/2206.06587v1
Date: Tue, 14 Jun 2022 04:24:52 GMT
Title: Learning Enhanced Representations for Tabular Data via Neighborhood Propagation
Authors: Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin, Quan Gan, Zheng Zhang, David Wipf
Abstract summary: We construct a hypergraph to model the cross-row and cross-column patterns of data instances. We then perform message propagation to enhance the target data instance representation. Experiments on two important data prediction tasks validate the superiority of the proposed PET model.
Score: 24.485479610138498
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prediction over tabular data is an essential and fundamental problem in many important downstream tasks. However, existing methods either take a data instance of the table independently as input or do not fully utilize the multi-rows features and labels to directly change and enhance the target data representations. In this paper, we propose to 1) construct a hypergraph from relevant data instance retrieval to model the cross-row and cross-column patterns of those instances, and 2) perform message Propagation to Enhance the target data instance representation for Tabular prediction tasks. Specifically, our specially-designed message propagation step benefits from 1) fusion of label and features during propagation, and 2) locality-aware high-order feature interactions. Experiments on two important tabular data prediction tasks validate the superiority of the proposed PET model against other baselines. Additionally, we demonstrate the effectiveness of the model components and the feature enhancement ability of PET via various ablation studies and visualizations. The code is included in https://github.com/KounianhuaDu/PET.
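
As a rough illustration of the retrieve-then-propagate idea described in the abstract, the sketch below retrieves the k nearest labeled rows for a target row and fuses their features and labels into an enlarged representation. This is not the PET model: the hypergraph construction and learned message propagation are replaced here by plain Euclidean nearest-neighbor retrieval and mean aggregation, and the function names `retrieve_neighbors` and `enhance` are hypothetical, chosen only for this example.

```python
import math

def retrieve_neighbors(target, rows, k):
    """Return indices of the k rows closest to target (Euclidean distance)."""
    dists = [(math.dist(target, row), i) for i, row in enumerate(rows)]
    return [i for _, i in sorted(dists)[:k]]

def enhance(target, rows, labels, k=2):
    """Append mean neighbor features and mean neighbor label to the target row.

    Assumption: simple averaging stands in for learned message propagation.
    """
    idx = retrieve_neighbors(target, rows, k)
    n_feat = len(target)
    mean_feat = [sum(rows[i][j] for i in idx) / len(idx) for j in range(n_feat)]
    mean_label = sum(labels[i] for i in idx) / len(idx)
    return list(target) + mean_feat + [mean_label]

# Toy labeled table: three rows with two numeric features each.
rows = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
labels = [0.0, 1.0, 1.0]

# Enhanced representation: original features, neighbor feature mean, neighbor label mean.
enhanced = enhance([0.0, 1.0], rows, labels, k=2)
print(enhanced)
```

The enlarged vector could then be passed to any downstream predictor; how PET actually combines labels and features during propagation is specified in the paper and the linked repository, not here.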

Related papers

TabRep: Training Tabular Diffusion Models with a Simple and Effective Continuous Representation [16.907006955584343]
Diffusion models have been the predominant generative model for data generation. We present TabRep, a training architecture trained with a unified continuous representation. Our results showcase that TabRep achieves superior performance across a broad suite of evaluations.
arXiv Detail & Related papers (2025-04-07T07:44:27Z)
A Closer Look at TabPFN v2: Strength, Limitation, and Extension [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning accuracy across multiple datasets. In this paper, we evaluate TabPFN v2 on over 300 datasets, confirming its exceptional generalization capabilities on small- to medium-scale tasks.
arXiv Detail & Related papers (2025-02-24T17:38:42Z)
TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems [30.597696775364447]
TabPFN has emerged as a promising in-context learning model. It is capable of directly predicting the labels of test samples given labeled training examples. It has demonstrated competitive performance, particularly on small-scale classification tasks.
arXiv Detail & Related papers (2025-02-04T17:49:44Z)
TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
Table Transformers for Imputing Textual Attributes [15.823533688884105]
We propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA). Our approach shows competitive performance, outperforming baseline models such as recurrent neural networks and Llama2. We incorporate multi-task learning to simultaneously impute heterogeneous columns, boosting the performance for text imputation.
arXiv Detail & Related papers (2024-08-04T19:54:12Z)
Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data [35.61663559675556]
Cross-dataset pretraining has shown notable success in various fields. In this study, we introduce a cross-table pretrained Transformer, XTFormer, for versatile downstream tabular prediction tasks. Our methodology is pretraining XTFormer to establish a "meta-function" space that encompasses all potential feature-target mappings.
arXiv Detail & Related papers (2024-06-01T03:24:31Z)
Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM). A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences. Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias [92.41919689753051]
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks. We investigate training data generation with diversely attributed prompts, which have the potential to yield diverse and attributed generated data. We show that attributed prompts outperform simple class-conditional prompts in terms of the resulting model's performance.
arXiv Detail & Related papers (2023-06-28T03:31:31Z)
Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets. We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models. Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
GEDI: A Graph-based End-to-end Data Imputation Framework [3.5478302034537705]
The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and similarities among observations. It uses a meta-learning framework to select features that are influential to the downstream prediction task of interest. We conduct experiments on real-world large data sets, and show that the proposed imputation process consistently improves imputation and label prediction performance.
arXiv Detail & Related papers (2022-08-13T05:16:40Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab). We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries. We present X2Parser, a transferable cross-lingual and cross-domain parser for TCSP. We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.