Fugu-MT paper translation (abstract): TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

Paper summary: TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks

arxiv url: http://arxiv.org/abs/2606.02384v1
Date: Mon, 01 Jun 2026 15:33:43 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:32.392489
Title: TabPrep: Closing the Feature Engineering Gap in Tabular Benchmarks
Authors: Andrej Tschalzev, Nick Erickson, Yuyang Wang, Huzefa Rangwala, Stefan Lüdtke, Heiner Stuckenschmidt, Christian Bartelt
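Abstract summary: TabPrep is a lightweight preprocessing pipeline composed of feature generators that target three specific structural data patterns. The paper shows that many widely used model classes exhibit predictable blind spots to these patterns and that systematic feature engineering alone can establish new peak performance. Across the TabArena benchmark, integrating TabPrep into model training and tuning consistently improves performance for tree-based, neural, linear, and foundation models.
Reference score (attention score computed independently by the site): 32.521087490144964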
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Progress in tabular machine learning has largely focused on increasingly sophisticated model architectures. At the same time, feature engineering remains a critical yet underexplored component of real-world modeling pipelines that is entirely absent from modern benchmarks, which creates an unquantified evaluation gap. In this work, we introduce TabPrep, a lightweight preprocessing pipeline composed of feature generators that are carefully designed to target three specific structural data patterns. We show that many widely used model classes exhibit predictable blind spots to these patterns and that systematic feature engineering alone can establish new peak performance. Across the TabArena benchmark, integrating TabPrep into model training and tuning consistently improves performance for tree-based, neural, linear, and foundation models, often surpassing gains achieved by model-centric innovations alone. TabPrep outperforms previous automated feature engineering approaches in performance, efficiency, and applicability across datasets, enabling integration into large-scale benchmarks. By releasing TabPrep (see https://github.com/atschalz/tabprep), we enable researchers to integrate feature engineering into their benchmarking setup, filling a longstanding gap in tabular evaluations.


List of related papers