Fugu-MT Paper Translation (Abstract): Efficient Image Synthesis with Sphere Latent Encoder

Paper abstract: Efficient Image Synthesis with Sphere Latent Encoder

arxiv url: http://arxiv.org/abs/2605.15592v1
Date: Fri, 15 May 2026 04:03:06 GMT
Status: Translation complete
System update date: 2026-05-18 21:22:26.165955
Title: Efficient Image Synthesis with Sphere Latent Encoder
Title (reference translation): Efficient Image Synthesis Using a Sphere Latent Encoder
Authors: Tung Do, Thuan Hoang Nguyen, Hao Li
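Abstract summary: Few-step image generation has seen rapid progress, with consistency and meanflow-based methods significantly reducing the number of sampling steps. Sphere Encoder is a recent alternative that produces high-quality images in only a few steps. We decouple this framework into a fixed pretrained image encoder and a separate latent denoising model trained entirely in a spherical latent space. On the Animal-Faces, Oxford-Flowers and ImageNet-1K datasets, our method significantly outperforms Sphere Encoder in both generation quality and inference speed.
Reference score (attention score computed independently by this site): 9.381297061959112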
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Few-step image generation has seen rapid progress, with consistency and meanflow-based methods significantly reducing the number of sampling steps. Despite their low inference cost, these approaches often suffer from training instability and limited scalability. Sphere Encoder is a recent alternative that produces high-quality images in only a few steps; however, it requires repeated transitions between the pixel space and latent space during inference while jointly optimizing reconstruction and generation within a single architecture. This design leads to computational inefficiency and objective conflict between reconstruction and generation. To address these limitations, we decouple the framework into a fixed pretrained image encoder and a separate latent denoising model trained entirely in a spherical latent space. Our approach eliminates repeated pixel-space operations during training and inference, improving efficiency and allowing reconstruction and generation to specialize independently. On Animal-Faces, Oxford-Flowers and ImageNet-1K datasets, our method significantly outperforms Sphere Encoder in both generation quality and inference speed, while achieving competitive results against strong few-step and multi-step baselines.
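Abstract (reference translation): Few-step image generation is advancing rapidly, with consistency and meanflow-based methods sharply reducing the number of sampling steps. Although their inference cost is low, these approaches often suffer from training instability and limited scalability. Sphere Encoder is a recent alternative that generates high-quality images in a few steps, but it must move repeatedly between pixel space and latent space during inference while jointly optimizing reconstruction and generation within a single architecture. This design causes computational inefficiency and a conflict between the reconstruction and generation objectives. To address these limitations, we decouple the framework into a fixed pretrained image encoder and a separate latent denoising model trained entirely in a spherical latent space. The proposed approach eliminates repeated pixel-space operations during training and inference, which improves efficiency and lets reconstruction and generation specialize independently. On the Animal-Faces, Oxford-Flowers and ImageNet-1K datasets, our method significantly outperforms Sphere Encoder in both generation quality and inference speed, while achieving competitive results against strong few-step and multi-step baselines.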
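
Illustrative sketch (not from the paper): the abstract describes a denoising model that operates entirely in a spherical latent space, with no repeated pixel-space operations during sampling. The snippet below is a toy illustration of that idea only, assuming a unit-norm constraint on latents and a fixed number of refinement steps. The names `project_to_sphere`, `sample_few_step`, and `toy_denoiser` are hypothetical, and the denoiser is a placeholder rather than the authors' model.

```python
import math
import random


def project_to_sphere(z, radius=1.0, eps=1e-12):
    """Rescale a latent vector so that its Euclidean norm equals `radius`."""
    norm = math.sqrt(sum(v * v for v in z))
    scale = radius / max(norm, eps)
    return [v * scale for v in z]


def sample_few_step(denoiser, dim, num_steps, rng):
    """Refine a random spherical latent for a few steps, staying in latent space.

    Assumption: a decoder (not shown) would map the final latent to pixels
    once, after the loop, so no pixel-space round trip happens per step.
    """
    z = project_to_sphere([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(num_steps):
        # Each step applies the denoiser, then re-projects onto the sphere.
        z = project_to_sphere(denoiser(z))
    return z


def toy_denoiser(z):
    """Placeholder only: shifts every coordinate by a constant."""
    return [v + 0.1 for v in z]


if __name__ == "__main__":
    latent = sample_few_step(toy_denoiser, dim=8, num_steps=4, rng=random.Random(0))
    print(math.sqrt(sum(v * v for v in latent)))  # norm stays at the sphere radius
```

In this sketch the only operation tied to the sphere is the re-projection after each step. How the actual method parameterizes the sphere, the noise, and the step schedule is not stated in the abstract.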

Related papers