nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN
- URL: http://arxiv.org/abs/2511.03634v1
- Date: Wed, 05 Nov 2025 16:52:51 GMT
- Title: nanoTabPFN: A Lightweight and Educational Reimplementation of TabPFN
- Authors: Alexander Pfefferle, Johannes Hog, Lennart Purucker, Frank Hutter
- Abstract summary: nanoTabPFN is a lightweight implementation of the TabPFN v2 architecture and a corresponding training loop. It achieves performance comparable to traditional machine learning baselines within one minute of pre-training on a single GPU.
- Score: 78.62756717376563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular foundation models such as TabPFN have revolutionized predictive machine learning for tabular data. At the same time, the driving factors of this revolution are hard to understand. Existing open-source tabular foundation models are implemented in complicated pipelines spanning over 10,000 lines of code, and lack architecture documentation and consistent code quality. In short, the implementations are hard to understand, not beginner-friendly, and complicated to adapt for new experiments. We introduce nanoTabPFN, a simplified and lightweight implementation of the TabPFN v2 architecture and a corresponding training loop that uses pre-generated training data. nanoTabPFN makes tabular foundation models more accessible to students and researchers alike. For example, restricted to a small-data setting, it achieves performance comparable to traditional machine learning baselines within one minute of pre-training on a single GPU (160,000x faster than TabPFN v2 pre-training). Removing the requirement for large computational resources makes pre-training tabular foundation models accessible for educational purposes. Our code is available at https://github.com/automl/nanoTabPFN.
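The abstract's central mechanism, shared across the papers below, is that a prior-data fitted network predicts labels for new rows in a single forward pass, attending from test rows to the labeled training rows given in-context. The following is a toy numpy sketch of that idea, not nanoTabPFN's actual code or API: the learned transformer attention is replaced here with a hypothetical softmax over negative squared distances purely for illustration.

```python
import numpy as np

def icl_predict(X_train, y_train, X_test, temperature=1.0):
    """Toy single-forward-pass 'in-context' classifier (PFN-style sketch).

    A real PFN attends from each test row to every labeled training row
    with learned attention; here a softmax over negative squared
    distances stands in for those attention weights.
    """
    # Pairwise squared Euclidean distances: shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    # Normalized weights over the in-context training examples
    w = np.exp(-d2 / temperature)
    w /= w.sum(axis=1, keepdims=True)
    classes = np.unique(y_train)
    # One-hot training labels: shape (n_train, n_classes)
    Y = (y_train[:, None] == classes[None, :]).astype(float)
    probs = w @ Y  # attention-weighted label vote, (n_test, n_classes)
    return classes[probs.argmax(axis=1)]
```

The point of the sketch is the interface: no gradient steps happen at prediction time, so "training" on a new dataset is just supplying it as context, which is why pre-trained tabular models can be evaluated in seconds.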
Related papers
- Relational In-Context Learning via Synthetic Pre-training with Structural Prior [60.404256960057545]
RDB-PFN is the first relational foundation model trained purely on $\textbf{synthetic}$ data. It is inspired by Prior-Data Fitted Networks (PFNs), where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables. Experiments verify that RDB-PFN achieves strong few-shot performance on 19 real-world prediction tasks.
arXiv Detail & Related papers (2026-03-04T07:30:54Z) - TabICLv2: A better, faster, scalable, and open tabular foundation model [18.594859017648346]
We introduce TabICLv2, a new state-of-the-art foundation model for regression and classification built on three pillars. TabICLv2 generalizes effectively to million-scale datasets under 50GB of GPU memory while being markedly faster than RealTabPFN-2.5.
arXiv Detail & Related papers (2026-02-11T18:51:02Z) - Chunked TabPFN: Exact Training-Free In-Context Learning for Long-Context Tabular Data [2.2682391370097794]
We introduce a tiled-block strategy to compute attention within the TabPFN framework. This design is compatible with standard GPU setups. We demonstrate the effectiveness of our approach on the standard TabArena benchmark.
arXiv Detail & Related papers (2025-08-30T02:57:01Z) - TabPFN: One Model to Rule Them All? [21.658323618943697]
We provide a tailored explanation of how TabPFN works for a statistics audience. We show that an out-of-the-box application of TabPFN vastly outperforms specialized state-of-the-art methods.
arXiv Detail & Related papers (2025-05-26T13:55:29Z) - A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets. We show that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs. We demonstrate that TabPFN v2's limitations can be addressed through a test-time divide-and-context strategy.
arXiv Detail & Related papers (2025-02-24T17:38:42Z) - Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification [13.481699494376809]
FT-TabPFN is an enhanced version of TabPFN that includes a novel Feature Tokenization layer to better handle classification features.
Our full source code is available for community use and development.
arXiv Detail & Related papers (2024-06-11T02:13:46Z) - TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks [90.00817095558094]
Prior-data fitted networks (PFNs) make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass.
We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context.
We show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.
arXiv Detail & Related papers (2024-02-17T00:02:23Z) - Generative Table Pre-training Empowers Models for Tabular Prediction [71.76829961276032]
We propose TapTap, the first attempt that leverages table pre-training to empower models for tabular prediction.
TapTap can generate high-quality synthetic tables to support various applications, including privacy protection, low resource regime, missing value imputation, and imbalanced classification.
It can be easily combined with various backbone models, including LightGBM, Multilayer Perceptron (MLP) and Transformer.
arXiv Detail & Related papers (2023-05-16T06:37:38Z) - TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second [48.87527918630822]
We present TabPFN, a trained Transformer that can do supervised classification for small datasets in less than a second.
TabPFN performs in-context learning (ICL): it learns to make predictions from sequences of labeled examples.
We show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems, with up to a $230\times$ speedup.
arXiv Detail & Related papers (2022-07-05T07:17:43Z)
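Several entries above describe the same input convention: the whole dataset is packed into one sequence, with labeled training rows followed by test rows whose label slot is masked, so the transformer must fill it in-context. TabPFN v2's actual encoding is more involved (per-cell tokens, learned embeddings); the sketch below is a simplified conceptual illustration with hypothetical names.

```python
import numpy as np

def build_icl_sequence(X_train, y_train, X_test, mask_value=-1.0):
    """Pack a dataset into one sequence for a PFN-style transformer.

    Simplified sketch, not TabPFN's real encoding: each row becomes one
    token of [features..., label]; test rows carry `mask_value` in the
    label slot, marking the positions the model must predict.
    """
    # Labeled training tokens: (n_train, n_features + 1)
    train_tok = np.hstack([X_train, y_train[:, None].astype(float)])
    # Test tokens with masked labels: (n_test, n_features + 1)
    test_tok = np.hstack([X_test, np.full((len(X_test), 1), mask_value)])
    # One sequence per dataset: (n_train + n_test, n_features + 1)
    return np.vstack([train_tok, test_tok])
```

Because the dataset itself is the model's input sequence, sequence length grows with the number of rows, which is exactly the scaling pressure that the chunked-attention and context-compression papers in this list address.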
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.