State-Space Models for Tabular Prior-Data Fitted Networks

URL: http://arxiv.org/abs/2510.14573v1
Date: Thu, 16 Oct 2025 11:31:51 GMT
Title: State-Space Models for Tabular Prior-Data Fitted Networks
Authors: Felix Koch, Marcel Wever, Fabian Raisch, Benjamin Tischler,
Abstract summary: We investigate the potential of using Hydra, a bidirectional linear-time structured state space model, as an alternative to Transformers in TabPFN. Our experiments show that this approach reduces order dependence, achieving predictive performance competitive with the original TabPFN model.
Score: 1.9815629827604246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in foundation models for tabular data, such as TabPFN, have demonstrated that pretrained Transformer architectures can approximate Bayesian inference with high predictive performance. However, Transformers suffer from quadratic complexity with respect to sequence length, motivating the exploration of more efficient sequence models. In this work, we investigate the potential of using Hydra, a bidirectional linear-time structured state space model (SSM), as an alternative to Transformers in TabPFN. A key challenge lies in SSMs' inherent sensitivity to the order of input tokens, an undesirable property for tabular datasets where the row order is semantically meaningless. We investigate to what extent a bidirectional approach can preserve efficiency and enable symmetric context aggregation. Our experiments show that this approach reduces order dependence, achieving predictive performance competitive with the original TabPFN model.
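The order-sensitivity problem and the bidirectional remedy can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Hydra's quasiseparable matrix mixer: it uses a single fixed scalar decay `a` and the hypothetical helpers `causal_scan`, `bidirectional_scan`, and `mixing_matrix`. A causal scan lets token t draw only on tokens at or before t, so its implied mixing matrix is lower-triangular; adding a backward scan yields weights `a**|t - s|`, a symmetric matrix in which every token aggregates the whole context. Both scans make one linear pass over the sequence, whereas attention forms all n × n pairwise scores.

```python
from typing import Callable, List

Scan = Callable[[List[float], float], List[float]]


def causal_scan(xs: List[float], a: float) -> List[float]:
    """Toy forward recurrence h_t = a * h_{t-1} + x_t, one O(n) pass."""
    out: List[float] = []
    h = 0.0
    for x in xs:
        h = a * h + x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float) -> List[float]:
    """Forward scan plus backward scan; x_t appears in both, so subtract it once.

    Equals y_t = sum_s a**|t - s| * x_s, still O(n), with no n x n matrix built.
    """
    fwd = causal_scan(xs, a)
    bwd = causal_scan(xs[::-1], a)[::-1]  # backward scan: suffix aggregation
    return [f + b - x for f, b, x in zip(fwd, bwd, xs)]


def mixing_matrix(scan: Scan, n: int, a: float) -> List[List[float]]:
    """Recover M[t][s] (weight of input s on output t) by one-hot probing."""
    columns = []
    for s in range(n):
        one_hot = [0.0] * n
        one_hot[s] = 1.0
        columns.append(scan(one_hot, a))
    return [[columns[s][t] for s in range(n)] for t in range(n)]


if __name__ == "__main__":
    print(mixing_matrix(causal_scan, 3, 0.5))
    # [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.25, 0.5, 1.0]]  lower-triangular
    print(mixing_matrix(bidirectional_scan, 3, 0.5))
    # [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]  symmetric
```

With `a = 1.0` the bidirectional output at every position is the plain sum of the inputs, so permuting the rows leaves it unchanged; with `a < 1` the weights still depend on distance in the sequence, which fits the abstract's wording that the bidirectional approach reduces, rather than removes, order dependence.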

Related papers

PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form. We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
Date: 2026-02-11T12:39:41Z
Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025 [2.07098502859192]
This paper presents a novel application of the Tabular Prior-Data Fitted Network (TabPFN) to site characterization problems defined in the GEOAI benchmark BM/AirportSoilProperties/2/2025. We apply TabPFN in a zero-training, few-shot, in-context learning setting and provide it with additional context from the big indirect database (BID). The study demonstrates that TabPFN, as a general-purpose foundation model, achieved superior accuracy and well-calibrated predictive distributions.
Date: 2025-09-03T10:21:18Z
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling [0.0]
Gated Associative Memory (GAM) network is a novel, fully parallel architecture for sequence modeling. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines on training speed, and achieves a superior or competitive final validation perplexity across all datasets.
Date: 2025-08-30T20:59:46Z
A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets. We show that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs. We demonstrate that TabPFN v2's limitations can be addressed through a test-time divide-and-conquer strategy.
Date: 2025-02-24T17:38:42Z
TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
Date: 2024-10-27T22:58:47Z
A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
Date: 2024-10-15T20:08:08Z
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention. We propose a model that resolves both problems by endowing each attention head with a mixed-membership Stochastic Block Model. Our model outperforms previous efficient variants as well as the original Transformer with full attention.
Date: 2022-10-27T15:30:52Z
Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network. PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
Date: 2020-07-07T03:36:28Z
Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks. To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor. For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
Date: 2020-01-27T22:38:40Z
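The idea in the Canonical Polyadic Decomposition entry, representing a feature-interaction weight tensor implicitly in low-rank form, can be sketched generically. This is an assumed textbook construction, not that paper's exact model: the feature map `affine_map` (phi(x) = [1, x]), the nested-list layout of `factors`, and the helpers `cp_predict` and `dense_predict` are all hypothetical. With phi(x) = [1, x], the dense weight tensor has 2^D entries, one per product of a subset of the D features; storing it as a sum of R rank-one terms lets the score factorise into per-feature dot products.

```python
from itertools import product
from math import prod
from typing import Callable, List, Sequence

# factors[r][d] is the length-M factor vector of rank-one component r for feature d.
Factors = List[List[List[float]]]
FeatureMap = Callable[[float], List[float]]


def affine_map(x: float) -> List[float]:
    """Assumed feature map phi(x) = [1, x] (M = 2); any fixed-length map works."""
    return [1.0, x]


def cp_predict(x: Sequence[float], factors: Factors,
               phi: FeatureMap = affine_map) -> float:
    """Score <W, phi(x_1) (x) ... (x) phi(x_D)> with W kept in CP form.

    W = sum_r factors[r][0] (x) ... (x) factors[r][D-1] is never built; the inner
    product factorises per feature, so the cost is O(R * D * M) rather than
    touching all M ** D entries of the dense weight tensor.
    """
    feats = [phi(xd) for xd in x]
    total = 0.0
    for component in factors:
        # Each rank-one term contributes a product of D small dot products.
        total += prod(
            sum(w * f for w, f in zip(component[d], feats[d]))
            for d in range(len(feats))
        )
    return total


def dense_predict(x: Sequence[float], factors: Factors,
                  phi: FeatureMap = affine_map) -> float:
    """Brute-force reference: visit every entry of W (exponential in D)."""
    feats = [phi(xd) for xd in x]
    total = 0.0
    for idx in product(*(range(len(f)) for f in feats)):
        weight = sum(prod(c[d][i] for d, i in enumerate(idx)) for c in factors)
        total += weight * prod(feats[d][i] for d, i in enumerate(idx))
    return total
```

For example, `cp_predict([2.0, 3.0], [[[1.0, 1.0], [0.0, 2.0]]])` evaluates (1 + 2)(0 + 2·3) = 18.0, the same value `dense_predict` obtains by summing over all four tensor entries. Passing a `phi` that returns longer vectors corresponds to the higher-dimensional feature mappings mentioned in the summary, at a cost that stays linear in the map's length.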

This list is automatically generated from the titles and abstracts of the papers on this site.