Between-Sample Relationship in Learning Tabular Data Using Graph and
Attention Networks
- URL: http://arxiv.org/abs/2306.06772v1
- Date: Sun, 11 Jun 2023 20:56:21 GMT
- Title: Between-Sample Relationship in Learning Tabular Data Using Graph and
Attention Networks
- Authors: Shourav B. Rabbani and Manar D. Samad
- Abstract summary: This paper relaxes the i.i.d. assumption to learn tabular data representations by incorporating between-sample relationships.
We investigate our hypothesis using several GNNs and state-of-the-art (SOTA) deep attention models.
Our results reveal that attention-based GNN methods outperform traditional machine learning on five data sets and SOTA deep tabular learning methods on three data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditional machine learning assumes samples in tabular data to be
independent and identically distributed (i.i.d.). This assumption may miss
useful information in within- and between-sample relationships during
representation learning. This paper relaxes the i.i.d. assumption to learn
tabular data representations by incorporating between-sample relationships,
for the first time using graph neural networks (GNNs). We investigate our
hypothesis by using several GNNs and state-of-the-art (SOTA) deep attention
models to learn the between-sample relationships on ten tabular data sets,
comparing them against traditional machine learning methods. GNN methods show
the best performance on tabular data with large feature-to-sample ratios. Our
results reveal that attention-based GNN methods outperform traditional machine
learning on five data sets and SOTA deep tabular learning methods on three
data sets. Between-sample learning via GNN and deep attention methods yields
the best classification accuracy on seven of the ten data sets. This suggests
that the i.i.d. assumption may not hold for many tabular data sets.
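The paper does not reproduce its exact graph construction here, but the core idea of between-sample learning can be illustrated with a minimal numpy sketch (an assumption for illustration, not the authors' implementation): connect samples that are close in feature space into a k-nearest-neighbour graph, then let each sample aggregate its neighbours' features in a GCN-style propagation step.

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a symmetric k-nearest-neighbour adjacency matrix over samples (rows of X)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                 # exclude self-matches
    A = np.zeros_like(d)
    for i, row in enumerate(d):
        A[i, np.argsort(row)[:k]] = 1.0                         # connect k nearest samples
    return np.maximum(A, A.T)                                   # symmetrize

def propagate(X, A):
    """One round of mean-neighbour aggregation (a GCN-style smoothing step)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ X) / deg                    # row-normalized propagation

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))    # 10 samples, 4 features
A = knn_graph(X, k=3)
H = propagate(X, A)             # sample representations informed by neighbours
```

In a full GNN, this propagation would be interleaved with learned weight matrices and nonlinearities; the sketch only shows how relaxing the i.i.d. assumption lets each sample's representation depend on related samples.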
Related papers
- TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features [17.277932238538302]
Tabular machine learning may benefit from graph machine learning methods.
Graph neural networks (GNNs) can indeed often bring gains in predictive performance.
However, simple feature preprocessing enables standard tabular models to compete with and even outperform GNNs.
arXiv Detail & Related papers (2024-09-22T15:53:19Z) - Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
Transfer learning with deep neural networks (DNNs) has made significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
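TP-BERTa's actual tokenizer is not reproduced here; a toy numpy sketch of the general idea (a hypothetical stand-in, assuming simple quantile binning) shows how a scalar feature value can be discretized into one of a fixed vocabulary of magnitude tokens, so that nearby magnitudes map to nearby token ids.

```python
import numpy as np

def magnitude_tokens(values, n_bins=256):
    """Map scalar feature values to discrete token ids by quantile binning.

    A toy stand-in for relative magnitude tokenization: each scalar is
    replaced by the id of the quantile bin it falls into, so values of
    similar relative magnitude share nearby token ids.
    """
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

vals = np.array([0.1, 0.5, 0.9, 100.0])
toks = magnitude_tokens(vals, n_bins=4)   # one token id per bin, in magnitude order
```

The real method maps these discrete tokens into a high-dimensional embedding space and combines them with feature-name embeddings via intra-feature attention; the binning step above only illustrates the scalar-to-token conversion.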
arXiv Detail & Related papers (2024-03-04T08:38:56Z) - Attention versus Contrastive Learning of Tabular Data -- A Data-centric
Benchmarking [0.0]
This article extensively evaluates state-of-the-art attention and contrastive learning methods on a wide selection of 28 data sets.
We find that a hybrid attention-contrastive learning strategy mostly wins on hard-to-classify data sets.
Traditional methods are frequently superior on easy-to-classify data sets with presumably simpler decision boundaries.
arXiv Detail & Related papers (2024-01-08T22:36:05Z) - Training-Free Generalization on Heterogeneous Tabular Data via
Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z) - Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z) - TabGSL: Graph Structure Learning for Tabular Data Prediction [10.66048003460524]
We present a novel solution, Tabular Graph Structure Learning (TabGSL), to enhance tabular data prediction.
Experiments conducted on 30 benchmark datasets demonstrate that TabGSL markedly outperforms both tree-based models and recent deep learning-based models.
arXiv Detail & Related papers (2023-05-25T08:33:48Z) - Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and neural networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z) - Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z) - A Robust Stacking Framework for Training Deep Graph Models with
Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
However, the best models for such data in standard supervised learning settings with i.i.d. (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for i.i.d. data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z) - TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction [43.35301059378836]
We propose a novel framework, TabGNN, based on recently popular graph neural networks (GNNs).
Specifically, we first construct a multiplex graph to model the multifaceted sample relations, and then design a multiplex graph neural network to learn an enhanced representation for each sample.
Experiments on eleven TDP datasets from various domains, spanning both classification and regression tasks, show that TabGNN can consistently improve performance.
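TabGNN's multiplex idea can be sketched in a few lines of numpy (an illustrative assumption, not the paper's architecture): build one adjacency matrix per relation type, propagate features along each relation separately, and fuse the per-relation results.

```python
import numpy as np

def propagate(X, A):
    """Mean aggregation over one relation's adjacency (with self-loops)."""
    A_hat = A + np.eye(len(A))
    return (A_hat @ X) / A_hat.sum(axis=1, keepdims=True)

def multiplex_propagate(X, relations):
    """Aggregate each relation separately, then fuse by averaging.

    'relations' is a list of adjacency matrices, one per edge type
    (two hypothetical relation types are used below).
    """
    return np.mean([propagate(X, A) for A in relations], axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                       # 6 samples, 3 features
A1 = (rng.random((6, 6)) < 0.3).astype(float)     # relation type 1
A1 = np.maximum(A1, A1.T)
A2 = (rng.random((6, 6)) < 0.3).astype(float)     # relation type 2
A2 = np.maximum(A2, A2.T)
H = multiplex_propagate(X, [A1, A2])              # fused multiplex representation
```

The real model replaces the plain averaging with learned, per-relation GNN layers and a trainable fusion step; the sketch only shows how multiple sample-relation graphs can jointly shape one representation.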
arXiv Detail & Related papers (2021-08-20T11:51:32Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspaces, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.