Related papers: On Finetuning Tabular Foundation Models

On Finetuning Tabular Foundation Models

URL: http://arxiv.org/abs/2506.08982v2
Date: Wed, 11 Jun 2025 09:47:49 GMT
Title: On Finetuning Tabular Foundation Models
Authors: Ivan Rubachev, Akim Kotelnikov, Nikolay Kartashev, Artem Babenko,
Abstract summary: TabPFNv2 claims superior performance over traditional GBDT-based methods on small-scale datasets.<n>We evaluate various finetuning strategies for TabPFNv2 on diverse datasets.<n>We reveal that the success of finetuning stems from the fact that after gradient-based adaptation, the dot products of the query-representations of test objects more accurately reflect their target similarity.
Score: 29.76586200178702
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Foundation models are an emerging research direction in tabular deep learning. Notably, TabPFNv2 recently claimed superior performance over traditional GBDT-based methods on small-scale datasets using an in-context learning paradigm, which does not adapt model parameters to target datasets. However, the optimal finetuning approach for adapting tabular foundational models, and how this adaptation reshapes their internal mechanisms, remains underexplored. While prior works studied finetuning for earlier foundational models, inconsistent findings and TabPFNv2's unique architecture necessitate fresh investigation. To address these questions, we first systematically evaluate various finetuning strategies on diverse datasets. Our findings establish full finetuning as the most practical solution for TabPFNv2 in terms of time-efficiency and effectiveness. We then investigate how finetuning alters TabPFNv2's inner mechanisms, drawing an analogy to retrieval-augmented models. We reveal that the success of finetuning stems from the fact that after gradient-based adaptation, the dot products of the query-representations of test objects and the key-representations of in-context training objects more accurately reflect their target similarity. This improved similarity allows finetuned TabPFNv2 to better approximate target dependency by appropriately weighting relevant in-context samples, improving the retrieval-based prediction logic. From the practical perspective, we managed to finetune TabPFNv2 on datasets with up to 50K objects, observing performance improvements on almost all tasks. More precisely, on academic datasets with I.I.D. splits, finetuning allows TabPFNv2 to achieve state-of-the-art results, while on datasets with gradual temporal shifts and rich feature sets, TabPFNv2 is less stable and prior methods remain better.

Related papers

End-to-End Compression for Tabular Foundation Models [20.50130399990578]
We propose TACO, an end-to-end compression model that compresses the training dataset in a latent space.<n>We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97% less memory.
arXiv Detail & Related papers (2026-02-05T13:33:58Z)
Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training.<n>Fine-tuning remains critical for boosting performance in domains with limited public data.<n>We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z)
Beyond QA Pairs: Assessing Parameter-Efficient Fine-Tuning for Fact Embedding in LLMs [0.0]
This paper focuses on improving the fine-tuning process by categorizing question-answer pairs into Factual and Conceptual classes.<n>Two distinct Llama-2 models are fine-tuned based on these classifications and evaluated using larger models like GPT-3.5 Turbo and Gemini.<n>Our results indicate that models trained on conceptual datasets outperform those trained on factual datasets.
arXiv Detail & Related papers (2025-03-03T03:26:30Z)
A Closer Look at TabPFN v2: Strength, Limitation, and Extension [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning accuracy across multiple datasets.<n>In this paper, we evaluate TabPFN v2 on over 300 datasets, confirming its exceptional generalization capabilities on small- to medium-scale tasks.
arXiv Detail & Related papers (2025-02-24T17:38:42Z)
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later [76.66498833720411]
We introduce a differentiable version of $K$-nearest neighbors (KNN) originally designed to learn a linear projection to capture semantic similarities between instances.<n>Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data.<n>We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient [52.2669490431145]
PropEn is inspired by'matching', which enables implicit guidance without training a discriminator. We show that training with a matched dataset approximates the gradient of the property of interest while remaining within the data distribution.
arXiv Detail & Related papers (2024-05-28T11:30:19Z)
Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We try to push the understanding of different fine-tuning strategies for large language models (LLMs) We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z)
Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models. We introduce textitCLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models. Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies. The first is human annotation, which yields linguistically diverse data but is difficult to scale. The second category for creation is synthetic generation, which is scalable and cost effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive. We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models. We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.