Fugu-MT Paper Translation (Summary): GenClaw: Code-Driven Agentic Image Generation

Paper overview: GenClaw: Code-Driven Agentic Image Generation

arxiv url: http://arxiv.org/abs/2605.30248v1
Date: Thu, 28 May 2026 17:13:18 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.576772
Title: GenClaw: Code-Driven Agentic Image Generation
Title (reference translation): GenClaw: Code-Driven Agentic Image Generation
Authors: Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang, Xuan Yang, Rui Chen, Weijia Li,
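Abstract summary: We propose GenClaw, a code-driven image generation paradigm that lets an agent create like a human artist. Specifically, the agent first builds conceptual knowledge and context through search and reasoning. It then uses code (e.g., SVG, HTML, Three.js) to render executable visual sketches.
Reference score (independently computed attention score): 40.94073553092702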
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for generation refinement, leaving them with no mechanism to directly manipulate the canvas. In essence, the potential of LLMs to serve as a genuine "brush" for precise visual construction remains largely untapped. In this paper, we propose GenClaw, a code-driven agentic image generation paradigm that empowers the agent to create like a human artist: first conceptualizing, then sketching, and finally coloring. Specifically, the agent first constructs the conceptual knowledge and context through search and reasoning. It then utilizes code (e.g., SVG, HTML, Three.js) to render executable visual sketches. Finally, it employs an image generation model to supplement textures, materials, and photorealism. In this workflow, code serves as a controllable intermediate canvas bridging linguistic reasoning and pixel synthesis, seamlessly integrating programmatic logic with the visual expressiveness of generative models. By transforming image generation from a black-box paradigm into a staged process akin to authentic human creation, GenClaw offers a step toward highly controllable and interpretable visual generation systems.
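Abstract (reference translation): Image generation models have evolved from text-conditioned pixel synthesis into multimodal agents equipped with visual understanding and tool-invocation capabilities. However, existing agents still depend on black-box image models. Their workflow is stuck in a repeating cycle of prompt rewriting to refine generations, and they have no mechanism for manipulating the canvas directly. In essence, the potential of LLMs to act as a true "brush" for precise visual construction remains largely untapped. This paper proposes GenClaw, a code-driven agentic image generation paradigm. Specifically, the agent first builds conceptual knowledge and context through search and reasoning. It then uses code (e.g., SVG, HTML, Three.js) to render executable visual sketches. Finally, it uses an image generation model to supply textures, materials, and photorealism. In this workflow, code serves as a controllable intermediate canvas that bridges linguistic reasoning and pixel synthesis, seamlessly integrating programmatic logic with the visual expressiveness of generative models. By turning image generation from a black-box paradigm into a staged process resembling genuine human creation, GenClaw offers a step toward highly controllable and interpretable visual generation systems.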
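
The three stages described in the abstract (conceptualize, sketch in code, color with an image model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Shape`, `conceptualize`, `sketch_svg`, `colorize`, `generate`) is hypothetical, stage 1 is a stub that returns a fixed layout instead of performing search and reasoning, and the image model is assumed to be any callable that accepts an SVG string and a prompt and returns image bytes.

```python
from dataclasses import dataclass
from typing import Callable, List
from xml.etree import ElementTree as ET


@dataclass
class Shape:
    """One element of the layout plan (hypothetical structure)."""
    label: str
    x: int
    y: int
    width: int
    height: int
    fill: str


def conceptualize(prompt: str) -> List[Shape]:
    """Stage 1 stand-in: a real agent would search and reason here.

    This stub ignores the prompt and returns a fixed layout.
    """
    return [
        Shape("sky", 0, 0, 200, 120, "#87ceeb"),
        Shape("ground", 0, 120, 200, 80, "#3cb371"),
        Shape("house", 70, 80, 60, 60, "#b5651d"),
    ]


def sketch_svg(shapes: List[Shape], width: int = 200, height: int = 200) -> str:
    """Stage 2: render the plan as an SVG sketch (the intermediate canvas)."""
    svg = ET.Element(
        "svg",
        {
            "xmlns": "http://www.w3.org/2000/svg",
            "width": str(width),
            "height": str(height),
        },
    )
    for s in shapes:
        ET.SubElement(
            svg,
            "rect",
            {
                "id": s.label,
                "x": str(s.x),
                "y": str(s.y),
                "width": str(s.width),
                "height": str(s.height),
                "fill": s.fill,
            },
        )
    return ET.tostring(svg, encoding="unicode")


def colorize(svg: str, prompt: str, image_model: Callable[[str, str], bytes]) -> bytes:
    """Stage 3: pass the sketch and prompt to an assumed image model callable."""
    return image_model(svg, prompt)


def generate(prompt: str, image_model: Callable[[str, str], bytes]) -> bytes:
    """Run the three stages in order."""
    shapes = conceptualize(prompt)
    svg = sketch_svg(shapes)
    return colorize(svg, prompt, image_model)


if __name__ == "__main__":
    # Placeholder model: echoes the sketch as bytes instead of generating pixels.
    def fake_model(svg: str, prompt: str) -> bytes:
        return svg.encode("utf-8")

    print(generate("a house on a hill", fake_model).decode("utf-8"))
```

Because the sketch is plain SVG, each element keeps an `id` and explicit coordinates, so a layout could be adjusted by editing the code rather than rewriting the prompt. That editability is the kind of control the abstract attributes to the code stage.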

Related papers