Fugu-MT Paper Translation (Abstract): Utilizing Large Language Models for Machine Learning Explainability

Paper summary: Utilizing Large Language Models for Machine Learning Explainability

arxiv url: http://arxiv.org/abs/2510.06912v1
Date: Wed, 08 Oct 2025 11:46:23 GMT
Status: Translation complete
Last updated in system: 2025-10-09 16:41:20.464127
Title: Utilizing Large Language Models for Machine Learning Explainability
Title (reference translation): Using Large Language Models for Machine Learning Explainability
Authors: Alexandros Vassiliades, Nikolaos Polatidis, Stamatios Samaras, Sotiris Diplaris, Ignacio Cabrera Martin, Yannis Manolopoulos, Stefanos Vrochidis, Ioannis Kompatsiaris
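Abstract summary: This study explores the explainability of large language models (LLMs) when they autonomously generate machine learning (ML) solutions. Three state-of-the-art LLMs are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations).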
Reference score (independently computed attention score): 37.31918138232927
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study explores the explainability capabilities of large language models (LLMs) when they are employed to autonomously generate machine learning (ML) solutions. We examine two classification tasks: (i) a binary classification problem focused on predicting driver alertness states, and (ii) a multilabel classification problem based on the yeast dataset. Three state-of-the-art LLMs (OpenAI GPT, Anthropic Claude, and DeepSeek) are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations). Specifically, we measure Average SHAP Fidelity (the Mean Squared Error between SHAP approximations and model outputs) and Average SHAP Sparsity (the number of features deemed influential). The results show that LLMs can produce effective, interpretable pipelines with high fidelity and consistent sparsity, closely matching manually engineered baselines, which highlights their potential as automated tools for interpretable ML pipeline generation.
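Abstract (reference translation): This study examines the explainability of large language models (LLMs) when they autonomously generate machine learning (ML) solutions. Two classification tasks are considered: (i) a binary classification problem focused on predicting driver alertness states, and (ii) a multilabel classification problem based on the yeast dataset. Three state-of-the-art LLMs (OpenAI GPT, Anthropic Claude, and DeepSeek) are prompted to design training pipelines for four common classifiers: Random Forest, XGBoost, Multilayer Perceptron, and Long Short-Term Memory networks. The generated models are evaluated in terms of predictive performance (recall, precision, and F1-score) and explainability using SHAP (SHapley Additive exPlanations). Specifically, Average SHAP Fidelity (the Mean Squared Error between SHAP approximations and model outputs) and Average SHAP Sparsity (the number of features deemed influential) are measured. The results show that LLMs can generate effective, interpretable pipelines with high fidelity and consistent sparsity, closely matching manually engineered baselines, which underlines their potential as automated tools for interpretable ML pipeline generation.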
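The two explainability metrics named in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes SHAP values have already been computed for each sample, that each model output is a single scalar, and that one shared base (expected) value applies to all samples. The function names `shap_fidelity` and `shap_sparsity` and the influence threshold of 0.01 are hypothetical choices made for this example; the abstract does not state how "influential" is decided.

```python
from typing import Sequence


def shap_fidelity(
    shap_values: Sequence[Sequence[float]],
    base_value: float,
    model_outputs: Sequence[float],
) -> float:
    """Mean squared error between the additive SHAP approximation
    (base value plus the sum of per-feature attributions) and the
    model output, averaged over samples. Lower means higher fidelity."""
    if len(shap_values) != len(model_outputs):
        raise ValueError("need one row of SHAP values per model output")
    if not model_outputs:
        raise ValueError("at least one sample is required")
    squared_errors = [
        (base_value + sum(row) - output) ** 2
        for row, output in zip(shap_values, model_outputs)
    ]
    return sum(squared_errors) / len(squared_errors)


def shap_sparsity(
    shap_values: Sequence[Sequence[float]],
    threshold: float = 0.01,  # assumed cutoff, not taken from the paper
) -> float:
    """Average number of features per sample whose absolute SHAP value
    exceeds the threshold, i.e. features counted as influential."""
    if not shap_values:
        raise ValueError("at least one sample is required")
    counts = [sum(1 for v in row if abs(v) > threshold) for row in shap_values]
    return sum(counts) / len(counts)


# Toy data: two samples, three features each.
example_shap = [[0.25, -0.5, 0.0], [0.5, 0.0, 0.0]]
example_outputs = [0.25, 0.5]

fidelity = shap_fidelity(example_shap, base_value=0.5, model_outputs=example_outputs)
sparsity = shap_sparsity(example_shap)
print(fidelity)  # 0.125
print(sparsity)  # 1.5
```

In the toy data the first sample is reconstructed exactly (0.5 + 0.25 - 0.5 = 0.25) while the second is off by 0.5, giving a mean squared error of 0.125. Two features pass the threshold in the first sample and one in the second, so the average sparsity is 1.5.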


Related papers