Fugu-MT Paper Translation (Abstract): CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation

Paper overview: CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation

arxiv url: http://arxiv.org/abs/2604.03926v1
Date: Sun, 05 Apr 2026 01:37:53 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.838138
Title: CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation
Authors: Xiaojing Duan, Frederick Nwanganga, Chaoli Wang
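Abstract summary: We propose CODE-GEN, a RAG-based agentic AI system for generating context-aligned questions. CODE-GEN employs an agentic AI architecture in which a Generator agent produces multiple-choice coding comprehension questions aligned with course-specific learning objectives. To evaluate the effectiveness of CODE-GEN, we conducted an evaluation study in which six subject-matter experts (SMEs) judged 288 AI-generated questions.
Reference score (independently computed attention measure): 4.707083459088333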
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present CODE-GEN, a human-in-the-loop, retrieval-augmented generation (RAG)-based agentic AI system for generating context-aligned multiple-choice questions to develop student code reasoning and comprehension abilities. CODE-GEN employs an agentic AI architecture in which a Generator agent produces multiple-choice coding comprehension questions aligned with course-specific learning objectives, while a Validator agent independently assesses content quality across seven pedagogical dimensions. Both agents are augmented with specialized tools that enhance computational accuracy and verify code outputs. To evaluate the effectiveness of CODE-GEN, we conducted an evaluation study involving six human subject-matter experts (SMEs) who judged 288 AI-generated questions. The SMEs produced a total of 2,016 human-AI rating pairs, each indicating agreement or disagreement with the Validator's assessment, along with 131 instances of qualitative feedback. Analyses of SME judgments show strong system performance, with human-validated success rates ranging from 79.9% to 98.6% across the seven pedagogical dimensions. The analysis of qualitative feedback reveals that CODE-GEN achieves high reliability on dimensions well suited to computational verification and explicit criteria matching, including question clarity, code validity, concept alignment, and correct answer validity. In contrast, human expertise remains essential for dimensions requiring deeper instructional judgment, such as designing pedagogically meaningful distractors and providing high-quality feedback that reinforces understanding. These findings inform the strategic allocation of human and AI effort in AI-assisted educational content generation.
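The figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes each of the 288 questions was rated once on each of the seven dimensions (which is what makes 2,016 pairs), and it assumes a per-dimension success rate is the share of ratings marked successful. The function names and the sample count of 144 are hypothetical and do not come from the paper.

```python
# Hypothetical sketch; names and the sample input are not taken from the paper.
QUESTIONS = 288   # AI-generated questions judged by the SMEs (from the abstract)
DIMENSIONS = 7    # pedagogical dimensions assessed by the Validator (from the abstract)


def rating_pairs(questions: int, dimensions: int) -> int:
    """Rating pairs, assuming every question is rated once per dimension."""
    return questions * dimensions


def success_rate(successes: int, total: int) -> float:
    """Percentage of ratings counted as successful, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * successes / total, 1)


print(rating_pairs(QUESTIONS, DIMENSIONS))  # 2016, matching the abstract
print(success_rate(144, QUESTIONS))         # 50.0 for a made-up count of 144
```

Under these assumptions the reported total of 2,016 pairs is consistent with 288 questions times seven dimensions; the actual assignment of questions to SMEs is not stated in the abstract.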


Related papers