Fugu-MT Paper Translation (Abstract): TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees

Paper overview: TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees

arxiv url: http://arxiv.org/abs/2509.19927v1
Date: Wed, 24 Sep 2025 09:35:52 GMT
Status: Translation complete
Last updated in system: 2025-09-25 20:53:19.75736
Title: TABFAIRGDT: A Fast Fair Tabular Data Generator using Autoregressive Decision Trees
Title (reference translation): TABFAIRGDT: A Fast, Fair Tabular Data Generator Using Autoregressive Decision Trees
Authors: Emmanouil Panagiotou, Benoît Ronval, Arjun Roy, Ludwig Bothmann, Bernd Bischl, Siegfried Nijssen, Eirini Ntoutsi
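Abstract summary: This paper introduces TABFAIRGDT, a method for generating fair synthetic data using autoregressive decision trees. The authors evaluate TABFAIRGDT on benchmark fairness datasets and show that it outperforms state-of-the-art (SOTA) deep generative models. Notably, TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes.
Reference score (independently computed attention measure): 11.0044761900691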
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensuring fairness in machine learning remains a significant challenge, as models often inherit biases from their training data. Generative models have recently emerged as a promising approach to mitigate bias at the data level while preserving utility. However, many rely on deep architectures, despite evidence that simpler models can be highly effective for tabular data. In this work, we introduce TABFAIRGDT, a novel method for generating fair synthetic tabular data using autoregressive decision trees. To enforce fairness, we propose a soft leaf resampling technique that adjusts decision tree outputs to reduce bias while preserving predictive performance. Our approach is non-parametric, effectively capturing complex relationships between mixed feature types, without relying on assumptions about the underlying data distributions. We evaluate TABFAIRGDT on benchmark fairness datasets and demonstrate that it outperforms state-of-the-art (SOTA) deep generative models, achieving better fairness-utility trade-off for downstream tasks, as well as higher synthetic data quality. Moreover, our method is lightweight, highly efficient, and CPU-compatible, requiring no data pre-processing. Remarkably, TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes, and can generate fair synthetic data for medium-sized datasets (10 features, 10K samples) in just one second on a standard CPU, making it an ideal solution for real-world fairness-sensitive applications.
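Abstract (reference translation): Ensuring fairness in machine learning remains a major challenge because models often inherit biases from their training data. Generative models have recently emerged as a promising way to mitigate bias at the data level while preserving utility. However, many of them rely on deep architectures, despite evidence that simpler models can be highly effective on tabular data. This work introduces TABFAIRGDT, a method for generating fair synthetic tabular data using autoregressive decision trees. To enforce fairness, the authors propose a soft leaf resampling technique that adjusts decision tree outputs to reduce bias while preserving predictive performance. The approach is non-parametric and captures complex relationships between mixed feature types without relying on assumptions about the underlying data distributions. Evaluated on benchmark fairness datasets, TABFAIRGDT outperforms state-of-the-art (SOTA) deep generative models, with a better fairness-utility trade-off on downstream tasks and higher synthetic data quality. The method is also lightweight, highly efficient, and CPU-compatible, and it requires no data pre-processing. Notably, TABFAIRGDT achieves a 72% average speedup over the fastest SOTA baseline across various dataset sizes and can generate fair synthetic data for a medium-sized dataset (10 features, 10K samples) in just one second on a standard CPU, making it an ideal solution for real-world fairness-sensitive applications.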
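
The abstract describes two ideas: generating columns one at a time, each conditioned on the columns already generated, and adjusting leaf outputs to reduce bias. The following is a minimal sketch of how those two ideas can fit together, not the paper's algorithm. Everything in it is an assumption made for illustration: the toy data `ROWS`, the helper names `fit_conditional`, `positive_rate`, `generate`, and `parity_gap`, and the blending strength `lam`. Plain count tables stand in for decision tree leaves, and the blending rule (mixing the group-specific leaf rate with the pooled leaf rate) is only one possible reading of "soft leaf resampling".

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy rows: (s, x, y) = (sensitive attribute, feature, binary label).
ROWS = (
    [(0, 0, 0)] * 30 + [(0, 0, 1)] * 10 + [(0, 1, 0)] * 5 + [(0, 1, 1)] * 5
    + [(1, 0, 0)] * 5 + [(1, 0, 1)] * 5 + [(1, 1, 0)] * 10 + [(1, 1, 1)] * 30
)


def fit_conditional(rows, target, parents):
    """Count target values per parent configuration (a stand-in for tree leaves)."""
    table = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in parents)
        table[key][row[target]] += 1
    return table


def positive_rate(counter):
    """Fraction of entries equal to 1 in a Counter over {0, 1}."""
    total = sum(counter.values())
    return counter[1] / total if total else 0.0


def generate(rows, n, lam, rng):
    """Sample n synthetic rows column by column: s, then x given s, then y.

    lam = 0 keeps the group-specific label rate of each leaf;
    lam = 1 replaces it with the leaf's pooled rate (assumed rule).
    """
    p_s = fit_conditional(rows, 0, [])
    p_x = fit_conditional(rows, 1, [0])
    leaf_by_group = fit_conditional(rows, 2, [1, 0])  # leaf = value of x, split by s
    leaf_pooled = fit_conditional(rows, 2, [1])       # same leaf, groups pooled
    out = []
    for _ in range(n):
        s = int(rng.random() < positive_rate(p_s[()]))
        x = int(rng.random() < positive_rate(p_x[(s,)]))
        p_group = positive_rate(leaf_by_group[(x, s)])
        p_pool = positive_rate(leaf_pooled[(x,)])
        p = (1 - lam) * p_group + lam * p_pool  # soft blend toward the pooled leaf
        y = int(rng.random() < p)
        out.append((s, x, y))
    return out


def parity_gap(rows):
    """Absolute difference in positive-label rate between the two groups."""
    rate = {g: positive_rate(Counter(y for s, _, y in rows if s == g)) for g in (0, 1)}
    return abs(rate[1] - rate[0])
```

Calling `generate(ROWS, 20000, 1.0, random.Random(0))` and comparing `parity_gap` against a run with `lam=0.0` should show a smaller gap for the blended version. The gap does not vanish in this toy setup, because the feature `x` is itself correlated with `s`.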
