Fugu-MT Paper Translation (Abstract): ViScratch: Using Large Language Models and Gameplay Videos for Automated Feedback in Scratch

Paper abstract: ViScratch: Using Large Language Models and Gameplay Videos for Automated Feedback in Scratch

arxiv url: http://arxiv.org/abs/2509.11065v1
Date: Sun, 14 Sep 2025 03:12:44 GMT
Status: Translation complete
Last updated in system: 2025-09-16 17:26:22.892661
Title: ViScratch: Using Large Language Models and Gameplay Videos for Automated Feedback in Scratch
Title (reference translation): ViScratch: Using Large Language Models and Gameplay Videos to Automate Feedback in Scratch
Authors: Yuan Si, Daming Li, Hanyuan Shi, Jialu Zhang
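Abstract summary: This paper introduces ViScratch, a multimodal feedback generation system for Scratch. A vision-language model first aligns visual symptoms with code structure to identify a single critical issue. ViScratch is evaluated on real-world Scratch projects against state-of-the-art LLM-based tools and human testers.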
Reference score (independently computed attention score): 1.9532610005311957
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Block-based programming environments such as Scratch are increasingly popular in programming education, in particular for young learners. While the use of blocks helps prevent syntax errors, semantic bugs remain common and difficult to debug. Existing tools for Scratch debugging rely heavily on predefined rules or user manual inputs, and crucially, they ignore the platform's inherently visual nature. We introduce ViScratch, the first multimodal feedback generation system for Scratch that leverages both the project's block code and its generated gameplay video to diagnose and repair bugs. ViScratch uses a two-stage pipeline: a vision-language model first aligns visual symptoms with code structure to identify a single critical issue, then proposes minimal, abstract syntax tree level repairs that are verified via execution in the Scratch virtual machine. We evaluate ViScratch on a set of real-world Scratch projects against state-of-the-art LLM-based tools and human testers. Results show that gameplay video is a crucial debugging signal: ViScratch substantially outperforms prior tools in both bug identification and repair quality, even without access to project descriptions or goals. This work demonstrates that video can serve as a first-class specification in visual programming environments, opening new directions for LLM-based debugging beyond symbolic code alone.
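Abstract (reference translation): Block-based programming environments such as Scratch are increasingly popular in programming education, especially among young learners. Using blocks helps prevent syntax errors, but semantic bugs are common and hard to debug. Existing Scratch debugging tools rely heavily on predefined rules or manual user input, and they ignore the platform's inherently visual nature. We introduce ViScratch, the first multimodal feedback generation system for Scratch, which uses both a project's block code and its generated gameplay video to diagnose and repair bugs. ViScratch uses a two-stage pipeline: a vision-language model first aligns visual symptoms with code structure to identify a single critical issue, then proposes minimal, abstract-syntax-tree-level repairs that are verified by execution in the Scratch virtual machine. We evaluate ViScratch on real-world Scratch projects against state-of-the-art LLM-based tools and human testers. ViScratch substantially outperforms prior tools in both bug identification and repair quality, even without access to project descriptions or goals. This work demonstrates that video can serve as a first-class specification in visual programming environments, opening new directions for LLM-based debugging beyond symbolic code alone.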

Related papers