Fugu-MT Paper Translation (Abstract): Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

Paper summary: Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning

arxiv url: http://arxiv.org/abs/2509.04059v1
Date: Thu, 04 Sep 2025 09:42:17 GMT
Status: Translation complete
System update date: 2025-09-05 20:21:10.120774
Title: Synthesizing Sheet Music Problems for Evaluation and Reinforcement Learning
Title (reference translation): Synthesizing Sheet Music for Evaluation and Reinforcement Learning
Authors: Zhilin Wang, Zhe Yang, Yun Luo, Yafu Li, Haoran Zhang, Runzhe Zhan, Derek F. Wong, Jizhe Zhou, Yu Cheng
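Abstract summary: This paper proposes synthesizing sheet music problems grounded in music theory. It introduces a data synthesis framework that generates verifiable sheet music questions in both textual and visual modalities. By leveraging the synthetic data for RLVR, Qwen3-8B-Base and Qwen2.5-VL-Instruct achieve improvements on SSMR-Bench.
Reference score (independently computed attention score): 58.854546532387296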
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Enhancing the ability of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to interpret sheet music is a crucial step toward building AI musicians. However, current research lacks both evaluation benchmarks and training data for sheet music reasoning. To address this, we propose the idea of synthesizing sheet music problems grounded in music theory, which can serve both as evaluation benchmarks and as training data for reinforcement learning with verifiable rewards (RLVR). We introduce a data synthesis framework that generates verifiable sheet music questions in both textual and visual modalities, leading to the Synthetic Sheet Music Reasoning Benchmark (SSMR-Bench) and a complementary training set. Evaluation results on SSMR-Bench show the importance of models' reasoning abilities in interpreting sheet music. At the same time, the poor performance of Gemini 2.5-Pro highlights the challenges that MLLMs still face in interpreting sheet music in a visual format. By leveraging synthetic data for RLVR, Qwen3-8B-Base and Qwen2.5-VL-Instruct achieve improvements on the SSMR-Bench. Besides, the trained Qwen3-8B-Base surpasses GPT-4 in overall performance on MusicTheoryBench and achieves reasoning performance comparable to GPT-4 with the strategies of Role play and Chain-of-Thought. Notably, its performance on math problems also improves relative to the original Qwen3-8B-Base. Furthermore, our results show that the enhanced reasoning ability can also facilitate music composition. In conclusion, we are the first to propose the idea of synthesizing sheet music problems based on music theory rules, and demonstrate its effectiveness not only in advancing model reasoning for sheet music understanding but also in unlocking new possibilities for AI-assisted music creation.
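Abstract (reference translation): Improving the ability of large language models (LLMs) and multimodal large language models (MLLMs) to interpret sheet music is an important step toward building AI musicians. However, current research lacks both evaluation benchmarks and training data for sheet music reasoning. We therefore propose synthesizing sheet music problems grounded in music theory, which serve both as evaluation benchmarks and as training data for reinforcement learning with verifiable rewards (RLVR). We introduce a data synthesis framework that generates verifiable sheet music questions in both textual and visual form, yielding the Synthetic Sheet Music Reasoning Benchmark (SSMR-Bench) and a complementary training set. Evaluation results on SSMR-Bench show the importance of a model's reasoning ability in interpreting sheet music. At the same time, the poor performance of Gemini 2.5-Pro highlights the challenges MLLMs still face in interpreting sheet music presented visually. By leveraging the synthetic data for RLVR, Qwen3-8B-Base and Qwen2.5-VL-Instruct improve on SSMR-Bench. In addition, the trained Qwen3-8B-Base surpasses GPT-4 in overall performance on MusicTheoryBench and achieves reasoning performance comparable to GPT-4 with the Role play and Chain-of-Thought strategies. Notably, its performance on math problems also improves relative to the original Qwen3-8B-Base. Our results further suggest that the enhanced reasoning ability can also facilitate music composition. In conclusion, we are the first to propose synthesizing sheet music problems based on music theory rules, and we show that the approach is effective not only for advancing model reasoning in sheet music understanding but also for opening new possibilities for AI-assisted music creation.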
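
The abstract describes rule-based problems whose answers can be checked automatically, which is what makes them usable as RLVR training data. The following is a minimal illustrative sketch of that general idea, not the authors' framework: the question type (counting semitones between two natural notes), the function names `synthesize_interval_question` and `verifiable_reward`, and the exact-match reward are all assumptions made here for illustration.

```python
import random

# Semitone offsets of the natural notes within one octave (standard music theory).
NOTE_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def synthesize_interval_question(rng):
    """Hypothetical generator: one question whose answer follows from a fixed rule."""
    low, high = rng.sample(sorted(NOTE_TO_SEMITONE), 2)
    # Ascending distance from `low` to the nearest `high` above it, in semitones.
    answer = (NOTE_TO_SEMITONE[high] - NOTE_TO_SEMITONE[low]) % 12
    question = (
        f"How many semitones is it from {low} up to the nearest {high} above it?"
    )
    return {"question": question, "answer": str(answer)}


def verifiable_reward(model_output, reference):
    """Hypothetical reward: 1.0 if the final token of the output equals the reference."""
    tokens = model_output.strip().split()
    return 1.0 if tokens and tokens[-1].rstrip(".") == reference else 0.0


item = synthesize_interval_question(random.Random(0))
print(item["question"])
print(verifiable_reward("The answer is " + item["answer"], item["answer"]))
```

Because the answer is computed from the rule rather than labeled by hand, the same generator could in principle supply both held-out evaluation items and training items, which mirrors the dual use the abstract describes.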

Related papers