Fugu-MT Paper Translation (Abstract): Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models

Paper summary: Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models

arxiv url: http://arxiv.org/abs/2606.22945v1
Date: Mon, 22 Jun 2026 07:24:15 GMT
Status: Translation complete
Last updated in system: 2026-06-25 03:23:50.112791
Title: Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models
Title (reference translation): Controllable texture tiling with transformed-RoPE-enhanced diffusion models
Authors: Junrong Huang, Zhiyuan Zhang, Rui Tang, Hongbo Fu, Jing Liao
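Abstract summary: We propose a new framework for controllable, high-fidelity texture tiling based on Diffusion Transformers. The method introduces two technical innovations that decouple spatial manipulation from content generation. It outperforms state-of-the-art baselines in both control accuracy and texture fidelity.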
Reference score (independently computed attention score): 21.26349994452928
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Realistic integration of user-specified textures into scene images is a fundamental task in computer graphics and image editing. While existing material transfer and reference-guided inpainting methods can edit surface appearances, they often fail to address the specific requirements of texture tiling. This task necessitates precisely repeating a reference pattern according to user-defined parameters such as frequency, orientation, and scale. Furthermore, current generative approaches often struggle to maintain the structural fidelity of the reference texture, limited by either destructive pixel-level resampling or the lack of fine-grained spatial information in semantic image encoders, and they frequently fail to preserve the coherent lighting and geometry of the original scene. In this paper, we propose a novel framework for controllable and high-fidelity texture tiling based on Diffusion Transformers. Our approach introduces two key technical innovations to decouple spatial manipulation from content generation. First, we propose a Coordinate-Transformed Rotary Embedding mechanism. By applying 2D affine transformations directly to the relative positional embeddings between the target latent and the image condition, we achieve precise control over tiling patterns without explicit pixel warping, thereby utilizing the full information of the reference condition without degradation. Second, a Disjoint Attention Mask is employed to shield reference features from semantic leakage. This preserves structural integrity while seamlessly blending the synthesized texture with the scene's original lighting and geometry. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in both control accuracy and texture fidelity.
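Abstract (reference translation): Realistically integrating user-specified textures into scene images is a fundamental task in computer graphics and image editing. Existing material transfer and reference-guided inpainting methods can edit surface appearance, but they often fail to meet the specific requirements of texture tiling. This task requires repeating a reference pattern precisely according to user-defined parameters such as frequency, orientation, and scale. In addition, current generative approaches often struggle to maintain the structural fidelity of the reference texture, limited either by destructive pixel-level resampling or by the lack of fine-grained spatial information in semantic image encoders. This paper proposes a new framework for controllable, high-fidelity texture tiling based on Diffusion Transformers. The method introduces two technical innovations that decouple spatial manipulation from content generation. First, it proposes a Coordinate-Transformed Rotary Embedding mechanism. By applying 2D affine transformations directly to the relative positional embeddings between the target latent and the image condition, it achieves precise control over tiling patterns without explicit pixel warping and uses the full information of the reference condition without degradation. Second, a Disjoint Attention Mask is used to shield reference features from semantic leakage. This preserves structural integrity while seamlessly blending the synthesized texture with the scene's original lighting and geometry. The experiments show that the method outperforms state-of-the-art baselines in both control accuracy and texture fidelity.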
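
The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so everything here is an assumption made for illustration, including the axial 2D RoPE layout, the modulo wrap used to repeat the tile, the [text | target | reference] token order, and all function names (rope_2d, affine_coords, best_reference_match, disjoint_mask). The idea shown is that transforming the coordinates fed to RoPE, rather than the pixels, changes which reference token a target position aligns with.

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Rotate channel pairs of x (..., d) by angle pos * freq; d must be even."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)                    # (d/2,)
    ang = np.asarray(pos, dtype=np.float64)[..., None] * freqs   # (..., d/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x, coords):
    """Axial 2D RoPE (assumed layout): first half of channels encodes x, second half y."""
    half = x.shape[-1] // 2
    return np.concatenate(
        [rope_1d(x[..., :half], coords[..., 0]),
         rope_1d(x[..., half:], coords[..., 1])], axis=-1)

def affine_coords(coords, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Apply a 2D similarity transform (rotation, uniform scale, shift) to (x, y) coords."""
    t = np.deg2rad(angle_deg)
    A = scale * np.array([[np.cos(t), -np.sin(t)],
                          [np.sin(t),  np.cos(t)]])
    return coords @ A.T + np.asarray(shift, dtype=np.float64)

def best_reference_match(target_xy, ref_size=4, dim=16, **affine):
    """Return the reference grid cell whose RoPE-rotated key best matches the query.

    The target coordinate is transformed, then wrapped modulo ref_size so the
    reference repeats (the wrap is an assumption of this sketch). Query and key
    share the same content vector, so only position decides the score.
    """
    ys, xs = np.mgrid[0:ref_size, 0:ref_size]
    ref_coords = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)
    v = np.ones(dim)
    keys = rope_2d(np.tile(v, (len(ref_coords), 1)), ref_coords)
    q_xy = np.mod(affine_coords(np.asarray(target_xy, dtype=np.float64), **affine),
                  ref_size)
    query = rope_2d(v, q_xy)
    best = ref_coords[np.argmax(keys @ query)]
    return tuple(int(c) for c in best)

def disjoint_mask(n_text, n_target, n_ref):
    """Boolean attention mask, True = may attend.

    Assumed token order: [text | target | reference]. Reference queries see
    only reference keys, so text and scene content cannot leak into them;
    text and target queries may attend to everything.
    """
    n = n_text + n_target + n_ref
    mask = np.ones((n, n), dtype=bool)
    ref = slice(n_text + n_target, n)
    mask[ref, :] = False
    mask[ref, ref] = True
    return mask
```

In this sketch, best_reference_match((3, 1), scale=2.0) maps the target cell to (6, 2), wraps it to (2, 2), and returns that reference cell, so doubling the scale doubles the tiling frequency without resampling any reference pixels. The mask would typically be passed to an attention layer as a boolean "allowed" matrix.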

List of related papers