Fugu-MT Paper Translation (Summary): Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices

Paper summary: Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices

arxiv url: http://arxiv.org/abs/2606.07068v1
Date: Fri, 05 Jun 2026 09:06:56 GMT
Status: Translation complete
Last updated in system: 2026-06-08 14:33:29.657091
Title: Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices
Title (reference translation): Bias in Filter Feature Selection Evaluation: A Meta-Analysis of Datasets, Baselines, and Experimental Design Choices
Authors: Malick Ebiele, Malika Bendechache, Rob Brennan
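Abstract summary: Recent developments in tabular Deep Learning (DL) and data valuation in Machine Learning (ML) suggest that the evaluation of new methods may be consciously or unconsciously biased. The aim of this study is to identify factors that influence the evaluation and that may constitute entry points for bias. The analysis offers reflections on how to examine FFS studies, highlights lessons learned throughout the process, and gives five evidence-based recommendations for future FFS evaluation.
Reference score (in-house attention score): 0.22940141855172028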
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Background: Since 1990, many feature selection methods have been proposed across heterogeneous applications. To validate the usefulness of a new method, it must be compared against at least one baseline method from the existing literature on a feature selection task using at least one dataset. Recent developments in tabular Deep Learning (DL) and data valuation in Machine Learning (ML) suggest that the evaluation of new methods, algorithms, and models may be consciously or unconsciously biased. We hypothesise that a similar trend exists in feature selection (FS), particularly in filter feature selection (FFS). The aim of this study is therefore to examine FFS studies to identify factors that influence the evaluation and that might constitute entry points for bias, in order to recommend stronger principles for FFS evaluation. Methods: We analyse a sample of 28 high-profile FFS studies published between 1994 and 2025. The analysis provides reflections on how to examine FFS studies, highlights lessons learned throughout the process, and gives five evidence-based recommendations for future FFS evaluation. Results: A multivariate linear regression analysis achieved $R^2 = 0.33$. That is, 33% of the variance in the performance of new methods against the chosen baselines (win rate) is explained by the number of datasets (#Datasets), the number of baselines (#Baselines), and the number of new methods (#NewMethods). Discussion: $R^2 = 0.33$ is considered a medium level of explanation, which is promising given that this is the first such study. The explanation is only medium because win rate is also influenced by factors outside the model, such as the maturity of the feature selection domain and the type of datasets and baselines, and because the regression model used to explain the relationship is simple.
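The Results paragraph reports $R^2 = 0.33$ for a linear model that explains win rate by three study-level counts. As a rough illustration of what that computation involves, the sketch below fits ordinary least squares with an intercept in pure Python and reports $R^2 = 1 - SS_{res}/SS_{tot}$. It rests on assumptions the abstract does not state: one row per study, win rate defined as the share of new-method-vs-baseline comparisons that a new method wins, and a full comparison grid (every new method against every baseline on every dataset). The `Study` record, its field names, and the six studies are hypothetical and made up; they are not the authors' data or their exact model specification.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical per-study record; field names are illustrative only."""
    n_datasets: int
    n_baselines: int
    n_new_methods: int
    wins: int  # comparisons in which a new method beat a baseline

    @property
    def comparisons(self) -> int:
        # Assumption: every new method is compared with every baseline on every dataset.
        return self.n_datasets * self.n_baselines * self.n_new_methods

    @property
    def win_rate(self) -> float:
        # Assumed definition: share of comparisons won by the new method(s).
        return self.wins / self.comparisons


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: list[list[float]], y: list[float]) -> tuple[list[float], float]:
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    x = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(x[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    y_hat = [sum(b * v for b, v in zip(beta, r)) for r in x]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # must be > 0
    return beta, 1.0 - ss_res / ss_tot


# Made-up studies for illustration only; these are NOT the paper's 28 studies.
studies = [
    Study(5, 3, 1, 12),
    Study(10, 4, 1, 35),
    Study(3, 6, 2, 20),
    Study(20, 2, 1, 38),
    Study(8, 5, 3, 90),
    Study(12, 8, 1, 70),
]
rows = [[s.n_datasets, s.n_baselines, s.n_new_methods] for s in studies]
win_rates = [s.win_rate for s in studies]
beta, r2 = fit_ols(rows, win_rates)
names = ["intercept", "#Datasets", "#Baselines", "#NewMethods"]
for name, coef in zip(names, beta):
    print(f"{name:>12}: {coef:+.4f}")
print(f"R^2 = {r2:.3f}")
```

Solving the normal equations by hand keeps the sketch dependency-free; on real study data a library routine such as NumPy's `numpy.linalg.lstsq` is numerically more robust.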
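Abstract (reference translation): Background: Since 1990, many feature selection methods have been proposed across heterogeneous applications. To validate the usefulness of a new method, it must be compared with at least one baseline method from the existing literature on a feature selection task using at least one dataset. Recent developments in tabular Deep Learning (DL) and data valuation in Machine Learning (ML) suggest that the evaluation of new methods, algorithms, and models may be consciously or unconsciously biased. We hypothesise that a similar trend exists in feature selection (FS), particularly in filter feature selection (FFS). The aim of this study is to identify factors that influence FFS evaluation and may constitute entry points for bias, and to propose stronger principles for FFS evaluation. Methods: We analysed a sample of 28 high-profile FFS studies published between 1994 and 2025. The analysis offers reflections on how to examine FFS studies, highlights lessons learned throughout the process, and provides five evidence-based recommendations for future FFS evaluation. Results: A multivariate linear regression analysis obtained $R^2 = 0.33$. That is, 33% of the variance in the performance of new methods against the chosen baselines (win rate) is explained by the number of datasets (#Datasets), the number of baselines (#Baselines), and the number of new methods (#NewMethods). Discussion: $R^2 = 0.33$ is regarded as a medium level of explanation. This medium result is attributed to additional factors such as the maturity of the feature selection domain, the type of datasets, the type of baselines, and the simplicity of the regression model used to explain the relationship.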

