Fugu-MT Paper Translation (Summary): Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion

Paper summary: Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion

arXiv URL: http://arxiv.org/abs/2605.06261v1
Date: Thu, 07 May 2026 13:37:03 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.85241
Title: Inference-Time Refinement Closes the Synthetic-Real Gap in Tabular Diffusion
Authors: Eugenio Lomurno, Filippo Balzarini, Francesco Benelle, Francesca Pia Panaccione, Matteo Matteucci
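Abstract summary: We propose an inference-time refinement framework that operates on a frozen pre-trained backbone. Inference-time refinement exceeds real-data utility in 1 to 80 minutes on a single consumer-grade GPU.
Reference score (in-house computed attention score): 8.745106905496282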
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion-based generators set the current state of the art for synthetic tabular data. These methods approach but rarely exceed real-data utility, and closing this synthetic-real gap has so far been pursued exclusively at training time, via architectural advances, scaling, and retraining of monolithic generators. The inference-time alternative, i.e., refining the outputs of a pre-trained backbone with parameters left untouched, has remained largely unexplored for tabular synthesis. We introduce TARDIS (Tabular generation through Refinement, Distillation, and Inference-time Sampling), an inference-time refinement framework that operates on a frozen pre-trained backbone, configured per dataset by a Tree-structured Parzen Estimator search over score-level guidance during reverse diffusion, with each trial's objective set by an inner grid search over post-hoc sample selectors and an optional soft-label distillation step. The search space encodes a single mathematical pattern we name Bidirectional Chamfer Refinement (BCR): the symmetric Chamfer functional between synthetic and real samples is minimized both continuously, via a score-level gradient, and discretely, via batch-ranking post-generation. The per-dataset search recovers BCR-aligned configurations on most datasets, evidence for BCR as the dominant refinement pattern. Across 15 binary, multiclass, and regression benchmarks TARDIS achieves a median +8.6% downstream-task improvement over models trained on real data (95% CI [+3.3, +16.4], Wilcoxon p=0.016, 11/15 strict wins) and improves over the TabDiff backbone on all 15 datasets (mean +12.9%, p<10^-4), matching the backbone on manifold fidelity, diversity, and sample-level privacy. Inference-time refinement of a pre-trained tabular diffusion backbone reaches and exceeds real-data utility in 1 to 80 minutes on a single consumer-grade GPU.
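The abstract describes Bidirectional Chamfer Refinement only at the level of its objective. As a minimal sketch, assuming the common symmetric Chamfer definition with squared Euclidean distance on already-encoded numeric features (the paper's exact distance, normalization, and handling of categorical columns are not stated here), the functional between a synthetic set S and a real set R is C(S, R) = (1/|S|) Σ_s min_r ‖s − r‖² + (1/|R|) Σ_r min_s ‖s − r‖². The NumPy code below illustrates the two halves of BCR: `refine_continuous` takes plain gradient steps on C as a stand-in for score-level guidance inside reverse diffusion (which would require the actual TabDiff backbone), and `select_batch` ranks candidate batches by C and keeps the best one. All function names, the step size, the iteration count, and the toy data are hypothetical and are not TARDIS's implementation.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between the rows of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.einsum("nmd,nmd->nm", diff, diff)


def chamfer(synth: np.ndarray, real: np.ndarray) -> float:
    """Symmetric Chamfer functional (assumed form): mean nearest-neighbour
    squared distance from synthetic to real plus from real to synthetic."""
    d = pairwise_sq_dists(synth, real)
    # synth -> real term (fidelity) + real -> synth term (coverage)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def chamfer_grad(synth: np.ndarray, real: np.ndarray) -> np.ndarray:
    """(Sub)gradient of chamfer() with respect to the synthetic rows."""
    n, m = len(synth), len(real)
    d = pairwise_sq_dists(synth, real)
    grad = np.zeros_like(synth)
    # Term 1: each synthetic row is pulled toward its nearest real row.
    nn_real = d.argmin(axis=1)
    grad += 2.0 / n * (synth - real[nn_real])
    # Term 2: each real row pulls the synthetic row nearest to it.
    nn_synth = d.argmin(axis=0)
    np.add.at(grad, nn_synth, 2.0 / m * (synth[nn_synth] - real))
    return grad


def refine_continuous(synth: np.ndarray, real: np.ndarray,
                      step: float = 0.5, iters: int = 20) -> np.ndarray:
    """Continuous half (hypothetical stand-in for score-level guidance):
    plain gradient descent on the Chamfer functional."""
    x = synth.astype(float).copy()
    for _ in range(iters):
        x -= step * chamfer_grad(x, real)
    return x


def select_batch(candidates: list[np.ndarray], real: np.ndarray):
    """Discrete half: rank candidate batches by Chamfer, return the best one."""
    scores = [chamfer(c, real) for c in candidates]
    best = int(np.argmin(scores))
    return candidates[best], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 4))
    # Hypothetical backbone output: shifted, over-dispersed candidate batches.
    raw = [1.3 * rng.normal(size=(200, 4)) + 0.5 for _ in range(8)]
    # Continuous half on every batch, then the discrete half across batches.
    refined = [refine_continuous(batch, real) for batch in raw]
    best, scores = select_batch(refined, real)
    print(f"raw best: {min(chamfer(b, real) for b in raw):.3f}  "
          f"refined best: {min(scores):.3f}")
```

Each gradient step pulls synthetic rows toward their nearest real rows, so running the continuous half to convergence would reproduce real records. The `step` and `iters` values here are arbitrary, whereas the abstract states that TARDIS tunes its guidance per dataset with a Tree-structured Parzen Estimator search and matches the backbone on sample-level privacy.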
