Related papers: TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

URL: http://arxiv.org/abs/2410.13203v2
Date: Mon, 21 Oct 2024 15:21:56 GMT
Title: TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering
Authors: Al Zadid Sultan Bin Habib, Kesheng Wang, Mary-Anne Hartley, Gianfranco Doretto, Donald A. Adjeroh,
Abstract summary: This work introduces TabSeq, a novel framework for the sequential ordering of features. Finding the optimum sequence order for such features could improve the deep learning models' learning process.
Score: 5.946579489162407
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective analysis of tabular data still poses a significant problem in deep learning, mainly because features in tabular datasets are often heterogeneous and have different levels of relevance. This work introduces TabSeq, a novel framework for the sequential ordering of features, addressing the vital necessity to optimize the learning process. Features are not always equally informative, and for certain deep learning models, their random arrangement can hinder the model's learning capacity. Finding the optimum sequence order for such features could improve the deep learning models' learning process. The novel feature ordering technique we provide in this work is based on clustering and incorporates both local ordering and global ordering. It is designed to be used with a multi-head attention mechanism in a denoising autoencoder network. Our framework uses clustering to align comparable features and improve data organization. Multi-head attention focuses on essential characteristics, whereas the denoising autoencoder highlights important aspects by rebuilding from distorted inputs. This method improves the capability to learn from tabular data while lowering redundancy. Our research, demonstrating improved performance through appropriate feature sequence rearrangement using raw antibody microarray and two other real-world biomedical datasets, validates the impact of feature ordering. These results demonstrate that feature ordering can be a viable approach to improved deep learning of tabular data.

Related papers

Iterative Feature Exclusion Ranking for Deep Tabular Learning [0.0]
Tabular data is a common format for storing information in rows and columns to represent data entries and their features. Deep neural networks have become the main approach for modeling a wide range of domains including computer vision and NLP. Recent models featuring an internal feature selection mechanism with end-to-end gradient-based optimization fail to account for the contextual dependence of feature importance. The proposed module iteratively excludes each feature from the input data and computes the attention scores, which represent the impact of the features on the prediction.
arXiv Detail & Related papers (2024-12-21T02:12:14Z)
TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data [5.98480077860174]
We introduce TabDeco, a novel method that leverages attention-based encoding strategies across both rows and columns. With the innovative feature decoupling hierarchies, TabDeco consistently surpasses existing deep learning methods.
arXiv Detail & Related papers (2024-11-17T18:42:46Z)
Enriching Tabular Data with Contextual LLM Embeddings: A Comprehensive Ablation Study for Ensemble Classifiers [20.46918103875102]
This study presents a systematic approach to enrich datasets with features derived from large language model embeddings. We assess the impact of RoBERTa and GPT-2 embeddings on ensemble classifiers, including Random Forest, XGBoost, and CatBoost. Results indicate that integrating embeddings with traditional numerical and categorical features often enhances predictive performance.
arXiv Detail & Related papers (2024-11-03T17:45:00Z)
A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for Tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representation in presence of error slices is challenging, due to high cardinality features and the complexity of constructing error sets. Traditional robust representation learning methods are largely focused on improving worst group performance in supervised setting in computer vision. Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z)
Mambular: A Sequential Model for Tabular Deep Learning [0.7184556517162347]
This paper investigates the use of autoregressive state-space models for tabular data. We compare their performance against established benchmark models. Our findings indicate that interpreting features as a sequence and processing them can lead to significant performance improvement.
arXiv Detail & Related papers (2024-08-12T16:57:57Z)
Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains [0.565395466029518]
We propose a novel pretext task based on the classical binning method. The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values. Our empirical investigations ascertain several advantages of binning.
arXiv Detail & Related papers (2024-05-13T01:23:14Z)
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is to select appropriate augmentations imposing some priors to construct feasible positive samples. How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question. We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting. We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration. Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z)
Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations. Our framework well preserves the relations between samples. By seeking to embed samples into subspace, we show that our method can address the large-scale and out-of-sample problem.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.