Fugu-MT Paper Translation (Abstract): Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

arxiv url: http://arxiv.org/abs/2606.04816v1
Date: Wed, 03 Jun 2026 12:39:04 GMT
Status: translation complete
System update date: 2026-06-04 20:44:18.754052
Title: Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems
Authors: Xizi Luo, Changhong He, Dongdong Geng, Chenggong Shi, Yu Mei,
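Abstract summary: This work proposes constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. The authors develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts. VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.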
Reference score (independently computed attention score): 3.024294453714127
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective-equivalence signals such as differential testing and answer agreement, which a program can pass while adding spurious constraints or silently omitting required ones, whenever those constraints are non-binding on the tested instance. We propose constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. Combined with differential testing, it forms a dual verifier. We instantiate and evaluate it on vehicle routing problems (VRPs), a representative constraint-dense combinatorial optimization testbed with coupled operational constraints. We develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts, together with an expert-verified VRP benchmark suite covering 21 variants. The verifier is reused as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO). Across four VRP benchmarks, VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.
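
The probing idea in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not say how probes are injected into Gurobi scripts, so this sketch assumes a generated model can be reduced to a feasibility predicate over candidate routes. The toy instance (`DEMAND`, `CAPACITY`), the two constraints, and every function name are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Routes = List[List[int]]          # one list of customer ids per vehicle
Model = Callable[[Routes], bool]  # True if the model accepts the routes as feasible

# Hypothetical toy instance: three customers, vehicle capacity 8.
DEMAND: Dict[int, int] = {1: 4, 2: 3, 3: 5}
CAPACITY = 8


def visits_each_once(routes: Routes) -> bool:
    """Required constraint: every customer is served exactly once."""
    visited = [c for route in routes for c in route]
    return sorted(visited) == sorted(DEMAND)


def within_capacity(routes: Routes) -> bool:
    """Required constraint: no route exceeds the vehicle capacity."""
    return all(sum(DEMAND[c] for c in route) <= CAPACITY for route in routes)


def reference_model(routes: Routes) -> bool:
    return visits_each_once(routes) and within_capacity(routes)


def omits_capacity(routes: Routes) -> bool:
    """Faulty model that silently drops the capacity constraint."""
    return visits_each_once(routes)


def over_constrained(routes: Routes) -> bool:
    """Faulty model that adds a spurious one-customer-per-route rule."""
    return reference_model(routes) and all(len(r) <= 1 for r in routes)


# Feasible probe: satisfies every required constraint (loads 7 and 5).
FEASIBLE_PROBES: List[Routes] = [[[1, 2], [3]]]

# Violating probes: each breaks exactly one required constraint.
VIOLATING_PROBES: List[Routes] = [
    [[1, 3], [2]],  # load 9 exceeds capacity; visits are fine
    [[1, 2], []],   # customer 3 is never served; loads are fine
]


def constraint_injection_check(model: Model) -> str:
    """Classify a model by how it responds to the injected probes."""
    if not all(model(p) for p in FEASIBLE_PROBES):
        return "spurious over-constraint"    # rejected a valid solution
    if any(model(p) for p in VIOLATING_PROBES):
        return "silent constraint omission"  # accepted an invalid solution
    return "pass"


if __name__ == "__main__":
    for name, m in [("reference", reference_model),
                    ("omits_capacity", omits_capacity),
                    ("over_constrained", over_constrained)]:
        print(name, "->", constraint_injection_check(m))
```

Either faulty model could match the reference objective on an instance where the dropped or added constraint happens to be non-binding, which is the gap in objective-equivalence checks that the abstract describes. The probes catch both faults without comparing objectives at all.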
