Fugu-MT Paper Translation (Summary): MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

Paper overview: MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling

arxiv url: http://arxiv.org/abs/2605.12154v1
Date: Tue, 12 May 2026 14:07:36 GMT
Status: Translation complete
Last updated in system: 2026-05-13 21:48:56.901531
Title: MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling
Authors: Zhong Li, Qi Huang, Yuxuan Zhu, Mohammad Mohammadi Amiri, Niki van Stein, Thomas Bäck, Matthijs van Leeuwen, Zaiwen Wen, Lincen Yang
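Summary: We introduce multimodal optimization modeling, a benchmark setting in which models must construct both a mathematical formulation and executable solver code from a text-and-visual problem specification. We instantiate the framework as MM-OptBench, a benchmark of 780 solver-verified instances spanning 6 optimization families, 26 subcategories, and 3 structural difficulty levels.
Reference score (site-computed attention score): 18.671643433145846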
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimization modeling translates real decision-making problems into mathematical optimization models and solver-executable implementations. Although language models are increasingly used to generate optimization formulations and solver code, existing benchmarks are almost entirely text-only. This omits many optimization-modeling tasks that arise in operational practice, where requirements are described in text but instance information is conveyed through visual artifacts such as tables, graphs, maps, schedules, and dashboards. We introduce multimodal optimization modeling, a benchmark setting in which models must construct both a mathematical formulation and executable solver code from a text-and-visual problem specification. To evaluate this setting, we develop a solver-grounded framework that generates structured optimization instances, verifies each with an exact solver, and builds both the model-facing inputs and hidden reference files from the same verified source. We instantiate the framework as MM-OptBench, a benchmark of 780 solver-verified instances spanning 6 optimization families, 26 subcategories, and 3 structural difficulty levels. We evaluate 9 multimodal large language models (MLLMs), including 6 frontier general-purpose models and 3 math-specialized models, with aggregate, family-level, difficulty-level, and failure-mode analyses. The results show that the task remains far from solved: the best two models reach 52.1% and 51.3% pass@1, while on average across the six general-purpose MLLMs, pass@1 is 43.4% on easy instances and 15.9% on hard instances. All three math-specialized MLLMs solve 0/780 instances. Failure attribution shows that errors arise both when extracting instance data from text and visuals and when turning extracted data into solver-correct formulations and code. MM-OptBench provides a testbed for solver-grounded, decision-oriented multimodal intelligence.
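The abstract describes the solver-grounded evaluation pipeline only at a high level. The sketch below illustrates the idea under stated assumptions: a toy 0/1 knapsack instance stands in for a structured optimization instance, brute-force enumeration stands in for the exact solver, and a submission counts as a pass when its reported objective matches the hidden reference value. All names (`KnapsackInstance`, `make_instance`, `exact_solve`, `build_files`, `grade`, `pass_at_1`) and the tolerance are hypothetical; the paper's actual generator, solver, file formats, and grading criteria are not given in this summary.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnapsackInstance:
    """Hypothetical structured instance: item values, item weights, capacity."""
    values: tuple
    weights: tuple
    capacity: int


def make_instance(n_items: int, seed: int) -> KnapsackInstance:
    """Generate a small random 0/1 knapsack instance (stand-in generator)."""
    rng = random.Random(seed)
    values = tuple(rng.randint(1, 20) for _ in range(n_items))
    weights = tuple(rng.randint(1, 10) for _ in range(n_items))
    return KnapsackInstance(values, weights, sum(weights) // 2)


def exact_solve(inst: KnapsackInstance) -> int:
    """Exact optimum by exhaustive enumeration (stand-in for an exact solver)."""
    best = 0
    for choice in itertools.product((0, 1), repeat=len(inst.values)):
        weight = sum(w * c for w, c in zip(inst.weights, choice))
        if weight <= inst.capacity:
            value = sum(v * c for v, c in zip(inst.values, choice))
            best = max(best, value)
    return best


def build_files(inst: KnapsackInstance) -> tuple:
    """Derive the model-facing input and the hidden reference from one verified source."""
    optimum = exact_solve(inst)  # verify the instance before releasing it
    model_input = {
        "text": "Choose items to maximize total value without exceeding the capacity.",
        # In the multimodal setting this data would be shown visually
        # (e.g. as a table image); here it is kept as a plain list.
        "table": list(zip(inst.values, inst.weights)),
        "capacity": inst.capacity,
    }
    hidden_reference = {"optimal_objective": optimum}
    return model_input, hidden_reference


def grade(reported_objective: Optional[float], reference: dict, tol: float = 1e-6) -> bool:
    """Assumed criterion: the code ran and its objective matches the reference."""
    if reported_objective is None:  # code crashed or returned no solution
        return False
    return abs(reported_objective - reference["optimal_objective"]) <= tol


def pass_at_1(outcomes: list) -> float:
    """Fraction of instances solved on the single (first) attempt."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

For example, `model_input, ref = build_files(make_instance(8, seed=0))` yields a model-facing input and a reference built from the same verified instance; `grade(ref["optimal_objective"], ref)` returns `True` and `grade(None, ref)` returns `False`. Because both files come from one solved source, the reference cannot drift from what the model is shown, which is the property the abstract emphasizes.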
