Fugu-MT Paper Translation (Abstract): Tabular Foundation Model for Generative Modelling

Paper summary: Tabular Foundation Model for Generative Modelling

arxiv url: http://arxiv.org/abs/2605.09424v1
Date: Sun, 10 May 2026 08:52:28 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.243585
Title: Tabular Foundation Model for Generative Modelling
Title (reference translation): Tabular Foundation Model for Generative Modelling
Authors: Xiangjian Jiang, Mingxuan Liu, Nikola Simidjievski, Tassilo Klein, Mateja Jamnik
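Abstract summary: Generative modelling requires robust, holistic representation learning for a given data modality, rather than optimisation for a supervised prediction target alone. Existing tabular foundation generators have not yet consistently matched strong dataset-specific generators in synthetic data quality. The paper introduces TabFORGE, built on pretrained Tabular FOundational Representations for GEneration.
Reference score (independently computed attention score): 34.3599321018728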
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative modelling is a demanding test of foundation models, because it requires robust, holistic representation learning for a given data modality, rather than optimisation for a supervised prediction target alone. While recent work on tabular foundation models has achieved remarkable progress in predictive modelling, generative tabular foundation models remain underexplored. Existing tabular foundation generators, in particular, have not yet consistently matched strong dataset-specific generators in synthetic data quality. A key reason is their misalignment with the distinctive causal structural prior of heterogeneous tabular data. In this paper, we address this gap by introducing a novel tabular foundation model, TabFORGE, built on pretrained Tabular FOundational Representations for GEneration. TabFORGE is designed to utilise the implicitly learned causal information underlying diverse tabular datasets in a unified latent space induced by a pretrained causality-aware feature encoder. It further decouples latent modelling from decoding through a two-stage design: we first pretrain a score-based diffusion transformer, and then pretrain a denoising-aligned decoder using the denoised latent embeddings. This design elegantly mitigates the distribution shifts in latent embeddings that typically arise between training and inference. We evaluate TabFORGE comprehensively against 22 benchmark methods on 45 real-world datasets. Our results show that TabFORGE effectively learns and leverages generalisable tabular representations, enabling efficient generation of high-quality synthetic tabular data, particularly with strong structural fidelity.
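Abstract (reference translation): Generative modelling is a demanding test of foundation models, because it requires robust, holistic representation learning for a given data modality rather than optimisation for a supervised prediction target alone. Recent work on tabular foundation models has made remarkable progress in predictive modelling, but generative tabular foundation models remain underexplored. Existing tabular foundation generators have not yet consistently matched strong dataset-specific generators in synthetic data quality. A key reason is their misalignment with the distinctive causal structural prior of heterogeneous tabular data. In this paper, we address this gap by introducing TabFORGE, a novel tabular foundation model built on pretrained Tabular FOundational Representations for GEneration. TabFORGE is designed to utilise the causal information implicitly learned from diverse tabular datasets, in a unified latent space induced by a pretrained causality-aware feature encoder. Through a two-stage design, it further decouples latent modelling from decoding: a score-based diffusion transformer is pretrained first, and a denoising-aligned decoder is then pretrained on the denoised latent embeddings. This design elegantly mitigates the distribution shifts in latent embeddings that typically arise between training and inference. We evaluate TabFORGE comprehensively against 22 benchmark methods on 45 real-world datasets. The results show that TabFORGE effectively learns and leverages generalisable tabular representations, enabling efficient generation of high-quality synthetic tabular data.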
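
The train/inference mismatch that the abstract's two-stage design targets can be illustrated with a toy sketch. This is not TabFORGE's implementation: the linear "encoder", the shrinking `toy_denoiser`, and the least-squares "decoder" are all hypothetical stand-ins, chosen only to show why a decoder fitted on denoised latents may cope better with the latents it actually sees at generation time than a decoder fitted on clean encoder outputs.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a * x + b on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def toy_encoder(value):
    # Hypothetical encoder: a fixed affine map from data to latent space.
    return 2.0 * value + 1.0


def toy_denoiser(latent, rng):
    # Hypothetical imperfect denoiser: it shrinks and shifts the latent and
    # leaves some residual noise, so its outputs are distributed differently
    # from clean encoder outputs.
    return 0.7 * latent + 0.5 + rng.gauss(0.0, 0.1)


def run(seed=0, n=2000):
    rng = random.Random(seed)

    # Training data and the two kinds of latents a decoder could be fitted on.
    train_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    clean_z = [toy_encoder(v) for v in train_x]
    denoised_z = [toy_denoiser(z, rng) for z in clean_z]

    # Decoder fitted on clean latents vs. decoder fitted on denoised latents.
    clean_decoder = fit_line(clean_z, train_x)
    aligned_decoder = fit_line(denoised_z, train_x)

    # At inference the decoder only ever receives denoiser outputs.
    test_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    test_z = [toy_denoiser(toy_encoder(v), rng) for v in test_x]

    mismatched_error = mse(*clean_decoder, test_z, test_x)
    aligned_error = mse(*aligned_decoder, test_z, test_x)
    return mismatched_error, aligned_error


if __name__ == "__main__":
    mismatched, aligned = run()
    print("decoder fitted on clean latents:   ", mismatched)
    print("decoder fitted on denoised latents:", aligned)
```

In this toy setting the decoder fitted on denoised latents reconstructs the data with lower error, because its training inputs follow the same distribution as its inference inputs. How TabFORGE realises the alignment in practice is described only at the level of the abstract above.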
