Fugu-MT Paper Translation (Abstract): Error Understanding in Program Code With LLM-DL for Multi-label Classification

Paper overview: Error Understanding in Program Code With LLM-DL for Multi-label Classification

arxiv url: http://arxiv.org/abs/2603.25005v1
Date: Thu, 26 Mar 2026 04:05:42 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:48.089169
Title: Error Understanding in Program Code With LLM-DL for Multi-label Classification
Authors: Md Faizul Ibne Amin, Yutaka Watanobe, Md. Mostafizer Rahman, Daniel M. Muepu, Md. Shahajada Mia
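Abstract summary: Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation tasks. This study proposes a multi-label error classification framework for source code that leverages fine-tuned LLMs. The work lays the foundation for developing intelligent, scalable tools for automated code feedback.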
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Programming is a core skill in computer science and software engineering (SE), yet identifying and resolving code errors remains challenging for both novice and experienced developers. While Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation tasks, their potential in domain-specific, complex scenarios, such as multi-label classification (MLC) of programming errors, remains underexplored. To address this gap, this study proposes a multi-label error classification (MLEC) framework for source code that leverages fine-tuned LLMs, including CodeT5-base, GraphCodeBERT, CodeT5+, UniXcoder, RoBERTa, PLBART, and CoTexT. These LLMs are integrated with deep learning (DL) architectures such as GRU, LSTM, BiLSTM, and BiLSTM with an additive attention mechanism (BiLSTM-A) to capture both syntactic and semantic features from a real-world dataset of student-written Python code errors. Extensive experiments across 32 model variants, optimized using Optuna-based hyperparameter tuning, were evaluated using comprehensive multi-label metrics, including average accuracy, macro and weighted precision, recall, F1-score, exact match accuracy, One-error, Hamming loss, Jaccard similarity, and ROC-AUC (micro, macro, and weighted). Results show that the CodeT5+_GRU model achieved the strongest performance, with a weighted F1-score of 0.8243, average accuracy of 91.84%, exact match accuracy of 53.78%, Hamming loss of 0.0816, and One-error of 0.0708. These findings confirm the effectiveness of combining pretrained semantic encoders with efficient recurrent decoders. This work lays the foundation for developing intelligent, scalable tools for automated code feedback, with potential applications in programming education (PE) and broader SE domains.
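The abstract names several multi-label metrics without defining them. The sketch below shows one common way to compute four of them (Hamming loss, exact match accuracy, One-error, and sample-averaged Jaccard similarity) in plain Python. It is an illustration only, not the authors' evaluation code: the function names, the toy labels and scores, the 0.5 decision threshold, and the convention that two empty label sets have Jaccard similarity 1.0 are all assumptions made for this example. Note also that the reported average accuracy (91.84%) and Hamming loss (0.0816) sum to 1, which is consistent with average accuracy being computed as 1 minus Hamming loss, though the abstract does not state this.

```python
# Illustrative multi-label metrics; all names and data here are hypothetical.

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for t_row, p_row in zip(y_true, y_pred)
                for t, p in zip(t_row, p_row))
    return wrong / total

def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    hits = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def one_error(y_true, scores):
    """Fraction of samples whose top-scored label is not a true label."""
    misses = 0
    for t_row, s_row in zip(y_true, scores):
        top = max(range(len(s_row)), key=lambda j: s_row[j])
        misses += (t_row[top] == 0)
    return misses / len(y_true)

def jaccard_similarity(y_true, y_pred):
    """Per-sample |intersection| / |union| of label sets, averaged."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(t_row, p_row) if t and p)
        union = sum(1 for t, p in zip(t_row, p_row) if t or p)
        # Assumption: two empty label sets count as a perfect match.
        total += inter / union if union else 1.0
    return total / len(y_true)

# Toy data: 4 code samples, 3 hypothetical error labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
scores = [[0.9, 0.2, 0.6], [0.1, 0.4, 0.7], [0.8, 0.3, 0.1], [0.2, 0.1, 0.9]]
# Assumed decision threshold of 0.5 on the per-label scores.
y_pred = [[int(s >= 0.5) for s in row] for row in scores]

print("Hamming loss:      ", hamming_loss(y_true, y_pred))
print("Average accuracy:  ", 1 - hamming_loss(y_true, y_pred))
print("Exact match:       ", exact_match_accuracy(y_true, y_pred))
print("One-error:         ", one_error(y_true, scores))
print("Jaccard similarity:", jaccard_similarity(y_true, y_pred))
```

Exact match is the strictest of these because a single wrong label fails the whole sample, which is one plausible reason the reported exact match accuracy (53.78%) sits well below the per-label average accuracy.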


Related papers