Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025
- URL: http://arxiv.org/abs/2509.03191v1
- Date: Wed, 03 Sep 2025 10:21:18 GMT
- Title: Tabular foundation model for GEOAI benchmark problems BM/AirportSoilProperties/2/2025
- Authors: Taiga Saito, Yu Otake, Stephen Wu
- Abstract summary: This paper presents a novel application of the Tabular Prior-Data Fitted Network (TabPFN) to site characterization problems defined in the GEOAI benchmark BM/AirportSoilProperties/2/2025. We apply TabPFN in a zero-training, few-shot, in-context learning setting and provide it with additional context from the big indirect database (BID). The study demonstrates that TabPFN, as a general-purpose foundation model, achieved superior accuracy and well-calibrated predictive distributions.
- Score: 2.07098502859192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel application of the Tabular Prior-Data Fitted Network (TabPFN) - a transformer-based foundation model for tabular data - to geotechnical site characterization problems defined in the GEOAI benchmark BM/AirportSoilProperties/2/2025. Two tasks are addressed: (1) predicting the spatial variation of undrained shear strength (su) across borehole depth profiles, and (2) imputing missing mechanical parameters in a dense-site dataset. We apply TabPFN in a zero-training, few-shot, in-context learning setting - without hyper-parameter tuning - and provide it with additional context from the big indirect database (BID). The study demonstrates that TabPFN, as a general-purpose foundation model, achieved superior accuracy and well-calibrated predictive distributions compared to a conventional hierarchical Bayesian model (HBM) baseline, while also offering significant gains in inference efficiency. In Benchmark Problem #1 (spatial su prediction), TabPFN outperformed the HBM in prediction accuracy and delivered an order-of-magnitude faster runtime. In Benchmark Problem #2 (missing mechanical parameter imputation), TabPFN likewise achieved lower RMSE for all target parameters with well-quantified uncertainties, though its cumulative computation cost was higher than HBM's due to its one-variable-at-a-time inference. These results mark the first successful use of a tabular foundation model in geotechnical modeling, suggesting a potential paradigm shift in probabilistic site characterization.
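The abstract's key idea, zero-training in-context prediction, means the labeled context rows (here, borehole measurements) condition the model's output directly, with no gradient-based fitting step. The sketch below illustrates that data layout with a simple distance-weighted nearest-neighbour regressor standing in for TabPFN; the column layout, depth values, and s_u numbers are illustrative assumptions, not data from the paper.

```python
import numpy as np

def in_context_predict(X_context, y_context, X_query, k=3):
    """Stand-in for a tabular foundation model's fit-free inference:
    labeled context rows condition each query prediction directly
    (distance-weighted k-NN here, in place of TabPFN's transformer)."""
    preds = []
    for q in X_query:
        d = np.linalg.norm(X_context - q, axis=1)   # distance to each context row
        idx = np.argsort(d)[:k]                     # k nearest context rows
        w = 1.0 / (d[idx] + 1e-9)                   # inverse-distance weights
        preds.append(np.sum(w * y_context[idx]) / np.sum(w))
    return np.array(preds)

# Hypothetical borehole table: columns = (depth [m], easting, northing)
X_ctx = np.array([[2.0, 0.0, 0.0],
                  [4.0, 0.0, 0.0],
                  [6.0, 0.0, 0.0]])
y_ctx = np.array([20.0, 30.0, 40.0])   # undrained shear strength s_u [kPa]
X_new = np.array([[4.0, 0.0, 0.0]])    # query: predict s_u at 4 m depth

pred = in_context_predict(X_ctx, y_ctx, X_new, k=1)
```

The actual study passes such context tables (augmented with BID rows) to TabPFN, which additionally returns a predictive distribution rather than a point estimate.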
Related papers
- Causal Pre-training Under the Fairness Lens: An Empirical Study of TabPFN [3.059960033014892]
We evaluate the Tabular Prior-data Fitted Network (TabPFN) and its fine-tuned variants. Our results reveal that while TabPFN achieves stronger predictive accuracy compared to baselines, improvements in fairness are moderate and inconsistent. These findings suggest that the causal pre-training in TabPFN is helpful but insufficient for algorithmic fairness.
arXiv Detail & Related papers (2026-01-25T17:17:12Z) - Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training [11.179110411255708]
We propose a direct framework to model the scaling of benchmark performance from the training budget. Our results show that the direct approach extrapolates better than the previously proposed two-stage procedure. We release the complete set of pretraining losses and downstream evaluation results.
arXiv Detail & Related papers (2025-12-09T18:33:48Z) - Robust Tabular Foundation Models [0.7539295827164078]
A key finding is that TFMs can be pretrained entirely on synthetic datasets. We introduce an optimality gap measure, given by the difference between TFM performance and the best achievable performance. These results highlight a promising new dataset for targeted adversarial training and fine-tuning of TFMs using synthetic data alone.
arXiv Detail & Related papers (2025-12-02T23:40:39Z) - GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI [52.13138825802668]
GeoFMs are transforming Earth Observation, but evaluation lacks standardized protocols. GEO-Bench-2 addresses this with a comprehensive framework spanning classification, segmentation, regression, object detection, and instance segmentation. Code, data, and leaderboard for GEO-Bench-2 are publicly released under a permissive license.
arXiv Detail & Related papers (2025-11-19T17:45:02Z) - State-Space Models for Tabular Prior-Data Fitted Networks [1.9815629827604246]
We investigate the potential of using Hydra, a bidirectional linear-time structured state space model, as an alternative to Transformers in TabPFN. Our experiments show that this approach reduces the order-dependence, achieving predictive performance competitive to the original TabPFN model.
arXiv Detail & Related papers (2025-10-16T11:31:51Z) - Practical Bayes-Optimal Membership Inference Attacks [57.06788930775812]
We develop practical and theoretically grounded membership inference attacks (MIAs) against both independent and identically distributed (i.i.d.) data and graph-structured data. Building on the Bayesian decision-theoretic framework of Sablayrolles et al., we derive the Bayes-optimal membership inference rule for node-level MIAs against graph neural networks.
arXiv Detail & Related papers (2025-05-30T00:23:01Z) - Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models [5.952993835541411]
We show how TabPFN can be used as a pre-trained autoregressive conditional density estimator for simulation-based inference. NPE-PF eliminates the need for inference network selection, training, and hyperparameter tuning. It exhibits superior robustness to model misspecification and can be scaled to simulation budgets that exceed the context size limit of TabPFN.
arXiv Detail & Related papers (2025-04-24T15:29:39Z) - A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets. We show that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs. We demonstrate that TabPFN v2's limitations can be addressed through a test-time divide-and-context strategy.
arXiv Detail & Related papers (2025-02-24T17:38:42Z) - From Tables to Time: How TabPFN-v2 Outperforms Specialized Time Series Forecasting Models [40.19199376033612]
We introduce TabPFN-TS, a simple method that combines TabPFN-v2 with lightweight feature engineering to enable both point and probabilistic forecasting. Despite its simplicity and compact size (11M parameters), TabPFN-TS achieves top rank on the public GIFT-Eval leaderboard in both forecasting tasks.
arXiv Detail & Related papers (2025-01-06T11:38:19Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - Enhancing Microgrid Performance Prediction with Attention-based Deep Learning Models [0.0]
This research aims to address microgrid systems' operational challenges, characterized by power oscillations that contribute to grid instability.
An integrated strategy is proposed, leveraging the strengths of convolutional and Gated Recurrent Unit (GRU) layers.
The framework is anchored by a Multi-Layer Perceptron (MLP) model, which is tasked with comprehensive load forecasting.
arXiv Detail & Related papers (2024-07-20T21:24:11Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data; then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z) - Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.