Comparing Task-Agnostic Embedding Models for Tabular Data
- URL: http://arxiv.org/abs/2511.14276v1
- Date: Tue, 18 Nov 2025 09:10:40 GMT
- Title: Comparing Task-Agnostic Embedding Models for Tabular Data
- Authors: Frederik Hoppe, Lars Kleinemeier, Astrid Franz, Udo Göbel
- Abstract summary: This work specifically focuses on representation learning, i.e., on transferable, task-agnostic embeddings. TableVectorizer features achieve comparable or superior performance while being up to three orders of magnitude faster than recent foundation models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent foundation models for tabular data achieve strong task-specific performance via in-context learning. Nevertheless, they focus on direct prediction by encapsulating both representation learning and task-specific inference inside a single, resource-intensive network. This work specifically focuses on representation learning, i.e., on transferable, task-agnostic embeddings. We systematically evaluate task-agnostic representations from tabular foundation models (TabPFN and TabICL) alongside classical feature engineering (TableVectorizer) across a variety of application tasks such as outlier detection (ADBench) and supervised learning (TabArena Lite). We find that simple TableVectorizer features achieve comparable or superior performance while being up to three orders of magnitude faster than tabular foundation models. The code is available at https://github.com/ContactSoftwareAI/TabEmbedBench.
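To illustrate the classical-feature-engineering baseline mentioned in the abstract: the actual TableVectorizer ships with the skrub library, but the self-contained sketch below (all names are illustrative, not the paper's code) mimics its core behavior by standardizing numeric columns and one-hot encoding categorical ones, yielding a task-agnostic embedding matrix that any downstream detector or classifier can consume.

```python
import numpy as np
import pandas as pd

def vectorize_table(df: pd.DataFrame) -> np.ndarray:
    """Simplified stand-in for a TableVectorizer-style encoder:
    numeric columns are standardized, categorical columns one-hot
    encoded, and all parts concatenated into one embedding matrix."""
    parts = []
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            x = df[col].to_numpy(dtype=float)
            std = x.std() or 1.0  # guard against constant columns
            parts.append(((x - x.mean()) / std).reshape(-1, 1))
        else:
            parts.append(pd.get_dummies(df[col]).to_numpy(dtype=float))
    return np.hstack(parts)

df = pd.DataFrame({"age": [25, 32, 47], "city": ["NYC", "Paris", "NYC"]})
emb = vectorize_table(df)
print(emb.shape)  # (3, 3): one standardized numeric column + 2 one-hot columns
```

Because no neural network is involved, this kind of encoding runs in milliseconds, which is the source of the orders-of-magnitude speed gap the paper reports against foundation-model embeddings.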
Related papers
- RDBLearn: Simple In-Context Prediction Over Relational Databases [21.996337463952255]
We show that in-context prediction can be extended to relational databases with a simple recipe. We package this approach in RDBLearn, an easy-to-use toolkit with a scikit-learn-style estimator interface. Across a broad collection of RelBench and 4DBInfer datasets, RDBLearn is the best-performing foundation model approach we evaluate.
arXiv Detail & Related papers (2026-02-14T09:24:04Z)
- nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN [78.62756717376563]
nanoTabPFN is a lightweight implementation of the TabPFN v2 architecture and a corresponding training loop. It achieves performance comparable to traditional machine learning baselines within one minute of pre-training on a single GPU.
arXiv Detail & Related papers (2025-11-05T16:52:51Z)
- Multimodal Tabular Reasoning with Privileged Structured Information [67.40011423365712]
We introduce TabUlar Reasoning with Bridged infOrmation (Turbo). Turbo benefits from a structure-aware reasoning trace generator based on DeepSeek-R1. Turbo achieves state-of-the-art performance (+7.2% vs. previous SOTA) across multiple datasets.
arXiv Detail & Related papers (2025-06-04T15:46:30Z)
- TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields [12.860878027211522]
Tabular Foundation Models can leverage real-world knowledge and generalize across diverse datasets. We introduce TabSTAR: a Tabular Foundation Model with Semantically Target-Aware Representations.
arXiv Detail & Related papers (2025-05-23T17:34:28Z)
- Representation Learning for Tabular Data: A Comprehensive Survey [23.606506938919605]
Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Deep Neural Networks (DNNs) have recently demonstrated promising results through their capability of representation learning. We organize existing methods into three main categories according to their generalization capabilities.
arXiv Detail & Related papers (2025-04-17T17:58:23Z)
- A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets. We show that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs. We demonstrate that TabPFN v2's limitations can be addressed through a test-time divide-and-context strategy.
arXiv Detail & Related papers (2025-02-24T17:38:42Z)
- TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy [81.76462101465354]
We present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism.
This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering.
To better evaluate the VTU task in real-world scenarios, we establish a new and comprehensive table VQA benchmark, ComTQA.
arXiv Detail & Related papers (2024-06-03T13:54:05Z)
- Unlocking the Transferability of Tokens in Deep Models for Tabular Data [67.11727608815636]
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks.
In this paper, we propose TabToken, a method that aims to enhance the quality of feature tokens.
We introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features.
arXiv Detail & Related papers (2023-10-23T17:53:09Z)
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT).
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label.
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
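The STUNT idea above, treating a randomly chosen column as the target label to manufacture few-shot tasks, can be sketched as follows. This is an illustrative simplification (function and parameter names are our own, and binarizing the pseudo-target at its median is one assumed choice, not necessarily the paper's), but it captures how labeled tasks arise from unlabeled tables:

```python
import numpy as np

def self_generated_task(table, rng, n_shot=2):
    """STUNT-style pseudo-task (simplified sketch): pick a random column,
    bin it at its median into two pseudo-classes, and use the remaining
    columns as features for a few-shot classification task."""
    n_rows, n_cols = table.shape
    target_col = int(rng.integers(n_cols))
    features = np.delete(table, target_col, axis=1)
    # Pseudo-labels: above/below the median of the chosen column.
    labels = (table[:, target_col] > np.median(table[:, target_col])).astype(int)
    support_x, support_y = [], []
    for c in (0, 1):
        # Sample n_shot support examples per pseudo-class.
        idx = rng.choice(np.flatnonzero(labels == c), size=n_shot, replace=False)
        support_x.append(features[idx])
        support_y.append(labels[idx])
    return np.vstack(support_x), np.concatenate(support_y)

rng = np.random.default_rng(0)
table = rng.normal(size=(20, 5))
xs, ys = self_generated_task(table, rng, n_shot=2)
print(xs.shape, ys)  # (4, 4) support set with two examples per pseudo-class
```

Repeating this with fresh random columns yields the diverse task distribution on which a meta-learner can be trained.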
arXiv Detail & Related papers (2023-03-02T02:37:54Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
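The subsetting step behind the SubTab entry above can be sketched as follows. This is an illustrative reconstruction (function and parameter names are ours, not the authors' code): the feature dimension is split into overlapping windows, and in the full method each window feeds an encoder whose decoder must reconstruct the whole feature vector, not just the subset it saw.

```python
import numpy as np

def feature_subsets(X, n_subsets=4, overlap=0.75):
    """SubTab-style subsetting (illustrative sketch): split the feature
    dimension into overlapping windows; each window is one 'view' whose
    autoencoder target is the FULL feature vector."""
    n_feats = X.shape[1]
    base = int(np.ceil(n_feats / n_subsets))  # non-overlapping stride
    size = base + int(overlap * base)         # window size with overlap
    subsets = []
    for k in range(n_subsets):
        # Wrap around so every window has the same width.
        idx = np.arange(k * base, k * base + size) % n_feats
        subsets.append(X[:, idx])
    return subsets

X = np.random.default_rng(1).normal(size=(10, 8))
views = feature_subsets(X, n_subsets=4, overlap=0.75)
print(len(views), views[0].shape)  # 4 overlapping views of shape (10, 3)
```

Forcing each partial view to reconstruct the full row is what pushes the learned representation to capture cross-feature structure rather than merely copying its inputs.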
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.