Between-Sample Relationship in Learning Tabular Data Using Graph and
Attention Networks
- URL: http://arxiv.org/abs/2306.06772v1
- Date: Sun, 11 Jun 2023 20:56:21 GMT
- Title: Between-Sample Relationship in Learning Tabular Data Using Graph and
Attention Networks
- Authors: Shourav B. Rabbani and Manar D. Samad
- Abstract summary: This paper relaxes the i.i.d. assumption to learn tabular data representations by incorporating between-sample relationships.
We investigate our hypothesis using several GNNs and state-of-the-art (SOTA) deep attention models.
Our results reveal that attention-based GNN methods outperform traditional machine learning on five data sets and SOTA deep tabular learning methods on three data sets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditional machine learning assumes samples in tabular data to be
independent and identically distributed (i.i.d.). This assumption may miss
useful information in within- and between-sample relationships during
representation learning. This paper relaxes the i.i.d. assumption to learn
tabular data representations by incorporating between-sample relationships,
for the first time using graph neural networks (GNNs). We investigate our
hypothesis by using several GNNs and state-of-the-art (SOTA) deep attention
models to learn the between-sample relationships on ten tabular data sets,
comparing them against traditional machine learning methods. GNN methods show
the best performance on tabular data with large feature-to-sample ratios. Our
results reveal that attention-based GNN methods outperform traditional machine
learning on five data sets and SOTA deep tabular learning methods on three
data sets. Between-sample learning via GNN and deep attention methods yields
the best classification accuracy on seven of the ten data sets. This suggests
that the i.i.d. assumption may not hold for many tabular data sets.
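The paper does not reproduce its exact graph construction here, but the core idea of between-sample learning can be illustrated with a minimal numpy sketch (an assumption for illustration, not the authors' implementation): connect samples that are close in feature space into a k-nearest-neighbour graph, then let each sample aggregate its neighbours' features in a GCN-style propagation step.

```python
import numpy as np

def knn_graph(X, k=3):
    """Build a symmetric k-nearest-neighbour adjacency matrix over samples (rows of X)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)                                 # exclude self-matches
    A = np.zeros_like(d)
    for i, row in enumerate(d):
        A[i, np.argsort(row)[:k]] = 1.0                         # connect k nearest samples
    return np.maximum(A, A.T)                                   # symmetrize

def propagate(X, A):
    """One round of mean-neighbour aggregation (a GCN-style smoothing step)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    return (A_hat @ X) / deg                    # row-normalized propagation

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))    # 10 samples, 4 features
A = knn_graph(X, k=3)
H = propagate(X, A)             # sample representations informed by neighbours
```

In a full GNN, this propagation would be interleaved with learned weight matrices and nonlinearities; the sketch only shows how relaxing the i.i.d. assumption lets each sample's representation depend on related samples.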
Related papers
- TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features [17.277932238538302]
Tabular machine learning may benefit from graph machine learning methods.
Graph neural networks (GNNs) can indeed often bring gains in predictive performance.
However, simple feature preprocessing enables standard tabular models to compete with and even outperform GNNs.
arXiv Detail & Related papers (2024-09-22T15:53:19Z) - Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
Transfer learning with deep neural networks (DNNs) has made significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
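TP-BERTa's actual tokenizer is not reproduced here; a toy numpy sketch of the general idea (a hypothetical stand-in, assuming simple quantile binning) shows how a scalar feature value can be discretized into one of a fixed vocabulary of magnitude tokens, so that nearby magnitudes map to nearby token ids.

```python
import numpy as np

def magnitude_tokens(values, n_bins=256):
    """Map scalar feature values to discrete token ids by quantile binning.

    A toy stand-in for relative magnitude tokenization: each scalar is
    replaced by the id of the quantile bin it falls into, so values of
    similar relative magnitude share nearby token ids.
    """
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

vals = np.array([0.1, 0.5, 0.9, 100.0])
toks = magnitude_tokens(vals, n_bins=4)   # one token id per bin, in magnitude order
```

The real method maps these discrete tokens into a high-dimensional embedding space and combines them with feature-name embeddings via intra-feature attention; the binning step above only illustrates the scalar-to-token conversion.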
arXiv Detail & Related papers (2024-03-04T08:38:56Z) - Attention versus Contrastive Learning of Tabular Data -- A Data-centric
Benchmarking [0.0]
This article extensively evaluates state-of-the-art attention and contrastive learning methods on a wide selection of 28 data sets.
We find that a hybrid attention-contrastive learning strategy mostly wins on hard-to-classify data sets.
Traditional methods are frequently superior on easy-to-classify data sets with presumably simpler decision boundaries.
arXiv Detail & Related papers (2024-01-08T22:36:05Z) - Training-Free Generalization on Heterogeneous Tabular Data via
Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z) - Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z) - TabGSL: Graph Structure Learning for Tabular Data Prediction [10.66048003460524]
We present a novel solution, Tabular Graph Structure Learning (TabGSL), to enhance tabular data prediction.
Experiments conducted on 30 benchmark datasets demonstrate that TabGSL markedly outperforms both tree-based models and recent deep learning-based models.
arXiv Detail & Related papers (2023-05-25T08:33:48Z) - Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and neural networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z) - Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z) - A Robust Stacking Framework for Training Deep Graph Models with
Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
However, the best models for such data in standard supervised learning settings with i.i.d. (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for i.i.d. data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z) - TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction [43.35301059378836]
We propose a novel framework, TabGNN, based on recently popular graph neural networks (GNNs).
Specifically, we first construct a multiplex graph to model the multifaceted sample relations, and then design a multiplex graph neural network to learn an enhanced representation for each sample.
Experiments on eleven TDP datasets from various domains, spanning both classification and regression tasks, show that TabGNN can consistently improve performance.
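TabGNN's multiplex idea can be sketched in a few lines of numpy (an illustrative assumption, not the paper's architecture): build one adjacency matrix per relation type, propagate features along each relation separately, and fuse the per-relation results.

```python
import numpy as np

def propagate(X, A):
    """Mean aggregation over one relation's adjacency (with self-loops)."""
    A_hat = A + np.eye(len(A))
    return (A_hat @ X) / A_hat.sum(axis=1, keepdims=True)

def multiplex_propagate(X, relations):
    """Aggregate each relation separately, then fuse by averaging.

    'relations' is a list of adjacency matrices, one per edge type
    (two hypothetical relation types are used below).
    """
    return np.mean([propagate(X, A) for A in relations], axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                       # 6 samples, 3 features
A1 = (rng.random((6, 6)) < 0.3).astype(float)     # relation type 1
A1 = np.maximum(A1, A1.T)
A2 = (rng.random((6, 6)) < 0.3).astype(float)     # relation type 2
A2 = np.maximum(A2, A2.T)
H = multiplex_propagate(X, [A1, A2])              # fused multiplex representation
```

The real model replaces the plain averaging with learned, per-relation GNN layers and a trainable fusion step; the sketch only shows how multiple sample-relation graphs can jointly shape one representation.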
arXiv Detail & Related papers (2021-08-20T11:51:32Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into subspaces, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.