End-to-End Compression for Tabular Foundation Models
- URL: http://arxiv.org/abs/2602.05649v1
- Date: Thu, 05 Feb 2026 13:33:58 GMT
- Title: End-to-End Compression for Tabular Foundation Models
- Authors: Guri Zabërgja, Rafiq Kamel, Arlind Kadra, Christian M. M. Frey, Josif Grabocka
- Abstract summary: We propose TACO, an end-to-end compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97% less memory.
- Score: 20.50130399990578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The long-standing dominance of gradient-boosted decision trees for tabular data has recently been challenged by in-context learning tabular foundation models. In-context learning methods fit and predict in one forward pass without parameter updates by leveraging the training data as context for predicting on query test points. While recent tabular foundation models achieve state-of-the-art performance, their transformer architecture based on the attention mechanism has quadratic complexity regarding dataset size, which in turn increases the overhead on training and inference time, and limits the capacity of the models to handle large-scale datasets. In this work, we propose TACO, an end-to-end tabular compression model that compresses the training dataset in a latent space. We test our method on the TabArena benchmark, where our proposed method is up to 94x faster in inference time, while consuming up to 97% less memory compared to the state-of-the-art tabular transformer architecture, all while retaining performance without significant degradation. Lastly, our method not only scales better with increased dataset sizes, but it also achieves better performance compared to other baselines.
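The quadratic-scaling claim in the abstract can be made concrete with a back-of-the-envelope count of attention score entries. The sketch below is only an illustration under stated assumptions: it counts one score per (row, row) pair in a single layer, ignores heads, feature dimensions, and any attention across columns, and assumes a hypothetical scheme in which a fixed number of latent rows cross-attend to the training rows once. The function names and the sizes `n`, `m`, `q` are invented for this example and are not taken from the paper; this is not TACO's architecture, and the printed ratio is not comparable to the reported 94x figure.

```python
def self_attention_pairs(n_context: int, n_query: int) -> int:
    """Score entries when context rows attend to each other
    and query rows attend to the full context."""
    return n_context * n_context + n_query * n_context


def compressed_pairs(n_context: int, n_latent: int, n_query: int) -> int:
    """Score entries under a hypothetical latent-compression scheme.

    Assumption: n_latent latent rows cross-attend to the context once,
    then the latents self-attend and queries attend only to the latents.
    """
    compress = n_latent * n_context                      # linear in n_context
    predict = n_latent * n_latent + n_query * n_latent   # independent of n_context
    return compress + predict


if __name__ == "__main__":
    n, m, q = 50_000, 512, 1_000  # illustrative sizes only
    full = self_attention_pairs(n, q)
    small = compressed_pairs(n, m, q)
    print(f"full: {full:,}  compressed: {small:,}  ratio: {full / small:.1f}x")
```

Under these assumptions, doubling the number of training rows quadruples the context-to-context term of full attention but only doubles the one-off compression term, which is the kind of scaling gap the abstract alludes to.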
Related papers
- TabICLv2: A better, faster, scalable, and open tabular foundation model [18.594859017648346]
We introduce TabICLv2, a new state-of-the-art foundation model for regression and classification built on three pillars.
TabICLv2 generalizes effectively to million-scale datasets under 50GB GPU memory while being markedly faster than RealTabPFN-2.5.
arXiv Detail & Related papers (2026-02-11T18:51:02Z)
- iLTM: Integrated Large Tabular Model [41.81329403540607]
iLTM is an integrated Large Tabular Model that unifies tree-derived embeddings, dimensionality-agnostic representations, a meta-trained hypernetwork, multilayer perceptrons, and retrieval within a single architecture.
arXiv Detail & Related papers (2025-11-20T00:20:16Z)
- Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training.
Fine-tuning remains critical for boosting performance in domains with limited public data.
We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z)
- TabArena: A Living Benchmark for Machine Learning on Tabular Data [45.52876263971067]
We introduce TabArena, the first continuously maintained living benchmarking system.
We manually curate a representative collection of datasets and well-implemented models.
We show that deep learning methods have caught up under larger time budgets with ensembling.
We observe that some deep learning models are overrepresented in cross-model ensembles due to validation set overfitting.
arXiv Detail & Related papers (2025-06-20T07:14:48Z)
- On Finetuning Tabular Foundation Models [29.76586200178702]
TabPFNv2 claims superior performance over traditional GBDT-based methods on small-scale datasets.
We evaluate various finetuning strategies for TabPFNv2 on diverse datasets.
We reveal that the success of finetuning stems from the fact that after gradient-based adaptation, the dot products of the query-representations of test objects more accurately reflect their target similarity.
arXiv Detail & Related papers (2025-06-10T16:52:31Z)
- Byte Latent Transformer: Patches Scale Better Than Tokens [101.10994909832063]
Byte Latent Transformer (BLT) encodes bytes into dynamically sized patches, which serve as the primary units of computation.
For fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.
arXiv Detail & Related papers (2024-12-13T05:33:32Z)
- TabDPT: Scaling Tabular Foundation Models on Real Data [20.00390825519329]
We propose an approach to combine ICL-based retrieval with self-supervised learning to train foundation models.
We show that incorporating real data during the pre-training phase can lead to significantly faster training and better generalization to unseen data.
Our resulting model, TabDPT, achieves top performance on both regression (CTR23) and classification (CC18) benchmarks.
arXiv Detail & Related papers (2024-10-23T18:00:00Z)
- Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later [76.66498833720411]
We revisit Neighbourhood Components Analysis (NCA), a differentiable version of K-nearest neighbors (KNN) originally designed to learn a linear projection to capture semantic similarities between instances.
Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data.
We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
- A Closer Look at Deep Learning Methods on Tabular Datasets [78.61845513154502]
We present an extensive study on TALENT, a collection of 300+ datasets spanning broad ranges of size.
Our evaluation shows that ensembling benefits both tree-based and neural approaches.
arXiv Detail & Related papers (2024-07-01T04:24:07Z)
- Retrieval & Fine-Tuning for In-Context Tabular Models [16.668695961462827]
Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones.
We propose a combination of retrieval and fine-tuning: we can adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context.
We show a significant boost in performance compared to the base in-context model.
arXiv Detail & Related papers (2024-06-07T18:43:33Z)
- TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks [90.00817095558094]
Prior-data fitted networks (PFNs) make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass.
We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context.
We show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.
arXiv Detail & Related papers (2024-02-17T00:02:23Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.