Fugu-MT paper translation (abstract): From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

Paper overview: From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation

arxiv url: http://arxiv.org/abs/2508.10118v2
Date: Mon, 18 Aug 2025 09:54:00 GMT
Status: Translation complete
Last updated in system: 2025-08-19 12:43:44.903324
Title: From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
Title (reference translation): Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation
Authors: Ke Niu, Haiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, Xiangyang Xue
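Abstract summary: We propose CAD-RL, a multimodal Chain-of-Thought guided reinforcement learning framework for CAD modeling code generation. The method combines a Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards. CAD-RL is shown to significantly improve reasoning quality, output precision, and code executability.
Reference score (independently calculated attention score): 47.67703214044401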
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computer-Aided Design (CAD) plays a vital role in engineering and manufacturing, yet current CAD workflows require extensive domain expertise and manual modeling effort. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities for automating parametric 3D modeling. However, directly translating human design intent into executable CAD code remains highly challenging, due to the need for logical reasoning, syntactic correctness, and numerical precision. In this work, we propose CAD-RL, a multimodal Chain-of-Thought (CoT) guided reinforcement learning post-training framework for CAD modeling code generation. Our method combines CoT-based Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards: executability reward, geometric accuracy reward, and external evaluation reward. To ensure stable policy learning under sparse and high-variance reward conditions, we introduce three targeted optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for more accurate dimension parameters, and Overlong Filtering to reduce noisy supervision. To support training and benchmarking, we release ExeCAD, a novel dataset comprising 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. Experiments demonstrate that CAD-RL achieves significant improvements in reasoning quality, output precision, and code executability over existing VLMs.
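Abstract (reference translation): Computer-Aided Design (CAD) plays an important role in engineering and manufacturing, but current CAD workflows require extensive domain expertise and manual modeling work. Recent advances in large language models (LLMs) have made it possible to generate code from natural language, opening new opportunities to automate parametric 3D modeling. However, because of the need for logical reasoning, syntactic correctness, and numerical precision, directly translating human design intent into executable CAD code remains very difficult. In this work, we propose CAD-RL, a reinforcement learning post-training framework for CAD modeling code generation. The method combines a CoT-based Cold Start with goal-driven reinforcement learning post-training using three task-specific rewards: an executability reward, a geometric accuracy reward, and an external evaluation reward. To ensure stable policy learning under sparse, high-variance reward conditions, we introduce three optimization strategies: Trust Region Stretch for improved exploration, Precision Token Loss for improved parameter accuracy, and Overlong Filtering to reduce noise. To support training and benchmarking, we release ExeCAD, a novel dataset of 16,540 real-world CAD examples with paired natural language and structured design language descriptions, executable CADQuery scripts, and rendered 3D models. CAD-RL is shown to significantly improve quality, output precision, and code executability over existing VLMs.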
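
The three task-specific rewards named in the abstract could be combined roughly as in the minimal Python sketch below. Everything in it is an assumption made for illustration: the abstract does not say how the rewards are computed or weighted, so the function names, the weights, the bounding-box comparison, and the `external_score` input are hypothetical and should not be read as CAD-RL's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardWeights:
    # Hypothetical weights; the abstract does not specify any weighting.
    executability: float = 0.3
    geometry: float = 0.5
    external: float = 0.2


def executability_reward(script: str) -> float:
    """Return 1.0 if the generated script compiles and runs, else 0.0.

    Assumption: a real system would run untrusted code in a sandbox;
    plain exec() is used here only to keep the sketch self-contained.
    """
    try:
        exec(compile(script, "<generated>", "exec"), {})
    except Exception:
        return 0.0
    return 1.0


def geometric_reward(pred_dims: Sequence[float], ref_dims: Sequence[float]) -> float:
    """Score agreement between predicted and reference dimensions in [0, 1].

    Assumption: geometry is summarized by positive bounding-box dimensions,
    and each one is scored as 1 minus its relative error, floored at 0.
    """
    if not ref_dims or len(pred_dims) != len(ref_dims):
        return 0.0
    if any(r <= 0 for r in ref_dims):
        raise ValueError("reference dimensions must be positive")
    scores = [max(0.0, 1.0 - abs(p - r) / r) for p, r in zip(pred_dims, ref_dims)]
    return sum(scores) / len(scores)


def total_reward(
    script: str,
    pred_dims: Sequence[float],
    ref_dims: Sequence[float],
    external_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the three rewards (the weighted sum is an assumption)."""
    external = min(1.0, max(0.0, external_score))  # clip judge score to [0, 1]
    return (
        weights.executability * executability_reward(script)
        + weights.geometry * geometric_reward(pred_dims, ref_dims)
        + weights.external * external
    )
```

For example, `total_reward("x = 1 + 1", (10.0, 20.0, 5.0), (10.0, 20.0, 5.0), 1.0)` scores a script that runs, matches the reference dimensions exactly, and receives a full external score; a script with a syntax error loses the executability term.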
