Fugu-MT Paper Translation (Summary): 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

Paper summary: 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

arxiv url: http://arxiv.org/abs/2604.08042v1
Date: Thu, 09 Apr 2026 09:47:00 GMT
Status: Translation complete
Last updated in system: 2026-04-10 18:34:05.850117
Title: 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience
Title (reference translation): 3DrawAgent: Teaching an LLM to draw in 3D through early contrastive experience
Authors: Hongcan Xiao, Xinyue Xiao, Yilin Wang, Yue Zhang, Yonggang Qi
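Abstract summary: We introduce 3DrawAgent, a training-free, language-driven framework for 3D sketch generation. Unlike prior 2D sketch agents, our method introduces a relative experience optimization strategy. We show that 3DrawAgent can generate complex and coherent 3D Bezier sketches from diverse text prompts.
Reference score (independently computed attention score): 17.17661155254756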
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sketching in 3D space enables expressive reasoning about shape, structure, and spatial relationships, yet generating 3D sketches through natural language remains a major challenge. In this work, we introduce 3DrawAgent, a training-free, language-driven framework for 3D sketch generation that leverages large language models (LLMs) to sequentially draw 3D Bezier curves under geometric feedback. Unlike prior 2D sketch agents, our method introduces a relative experience optimization strategy that adapts the recently proposed Group Reward Policy Optimization (GRPO) paradigm. Instead of relying on explicit ground-truth supervision, we construct pairwise comparisons among generated sketches, with each pair consisting of a relatively better and a worse result based on CLIP-based perceptual rewards and LLM-based fine-grained qualitative assessment. These experiences are then used to iteratively refine the prior knowledge of 3D drawing, enabling black-box reinforcement of the model's 3D awareness. This design allows our model to self-improve its spatial understanding and drawing quality without parameter updates. Experiments show that 3DrawAgent can generate complex and coherent 3D Bezier sketches from diverse textual prompts, exhibit emergent geometric reasoning, and generalize to novel shapes, establishing a new paradigm for advancing the field of training-free 3D sketch intelligence.
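Abstract (reference translation): Sketching in 3D space enables expressive reasoning about shape, structure, and spatial relationships, but generating 3D sketches from natural language remains a major challenge. In this work, we propose 3DrawAgent, which uses large language models (LLMs) to sequentially draw 3D Bezier curves under geometric feedback. Unlike prior 2D sketch agents, our method introduces a relative experience optimization strategy adapted from the recently proposed Group Reward Policy Optimization (GRPO) paradigm. We construct pairwise comparisons among generated sketches, each pair consisting of a relatively better and a worse result according to CLIP-based perceptual rewards and LLM-based fine-grained qualitative assessment. These experiences are used to iteratively refine prior knowledge of 3D drawing, enabling black-box reinforcement of the model's 3D awareness. This design allows the model to self-improve its spatial understanding and drawing quality without parameter updates. Experiments show that 3DrawAgent can generate complex and coherent 3D Bezier sketches from diverse text prompts, exhibits emergent geometric reasoning, generalizes to novel shapes, and establishes a new paradigm for advancing training-free 3D sketch intelligence.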
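
Illustrative sketch (not from the paper): the abstract mentions two building blocks, 3D Bezier curves and pairwise better/worse comparisons among generated sketches. The Python below is a minimal sketch of both under stated assumptions. The function names `cubic_bezier_3d` and `build_preference_pairs`, the choice of cubic curves, the single scalar score per candidate (a stand-in for the CLIP-based reward and LLM-based assessment), and the `margin` parameter are all hypothetical and are not taken from the authors' implementation.

```python
from itertools import combinations


def cubic_bezier_3d(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at t in [0, 1].

    Each control point is an (x, y, z) tuple. The curve degree is an
    assumption; the abstract only says "3D Bezier curves".
    """
    u = 1.0 - t
    # Bernstein basis weights for a cubic curve.
    w = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    return tuple(
        w[0] * a + w[1] * b + w[2] * c + w[3] * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def build_preference_pairs(scores, margin=0.0):
    """Return (better_index, worse_index) pairs from per-candidate scores.

    `scores` is a hypothetical list of scalar rewards, one per generated
    sketch. Pairs whose score gap does not exceed `margin` are skipped,
    so ties produce no comparison.
    """
    pairs = []
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] - scores[j] > margin:
            pairs.append((i, j))
        elif scores[j] - scores[i] > margin:
            pairs.append((j, i))
    return pairs
```

For example, `build_preference_pairs([0.2, 0.9, 0.5])` ranks candidate 1 above candidates 0 and 2, and candidate 2 above candidate 0. How such pairs are turned into textual "experience" for the LLM is described only at a high level in the abstract and is not modeled here.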

Related papers