Fugu-MT Paper Translation (Summary): DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

Paper Summary: DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

arxiv url: http://arxiv.org/abs/2603.28713v1
Date: Mon, 30 Mar 2026 17:30:25 GMT
Status: Translation complete
In-system update date: 2026-03-31 23:18:45.539513
Title: DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing
Authors: Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao,
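Abstract summary: This paper proposes a compact on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning by taking concatenated images as input, using a (target | blank) configuration for generation tasks and a (target | source) configuration for editing tasks. After high-quality SFT and reinforcement learning, DreamLite achieves a GenEval score of 0.72 for image generation and an ImgEdit score of 4.11 for image editing, outperforming existing on-device models.
Reference score (independently computed attention score): 12.515161196847442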
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have made significant progress in both text-to-image (T2I) generation and text-guided image editing. However, these models are typically built with billions of parameters, leading to high latency and increased deployment challenges. While on-device diffusion models improve efficiency, they largely focus on T2I generation and lack support for image editing. In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. It concatenates images horizontally as input, using a (target | blank) configuration for generation tasks and a (target | source) configuration for editing tasks. To stabilize the training of this compact model, we introduce a task-progressive joint pretraining strategy that sequentially targets T2I, editing, and joint tasks. After high-quality SFT and reinforcement learning, DreamLite achieves a GenEval score of 0.72 for image generation and an ImgEdit score of 4.11 for image editing, outperforming existing on-device models and remaining competitive with several server-side models. By employing step distillation, we further reduce denoising to just 4 steps, enabling DreamLite to generate or edit a 1024 × 1024 image in less than 1 s on a Xiaomi 14 smartphone. To the best of our knowledge, DreamLite is the first unified on-device diffusion model that supports both image generation and image editing.
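The in-context conditioning described in the abstract, (target | blank) for generation and (target | source) for editing, can be illustrated with a minimal NumPy sketch. It assumes latents shaped (C, H, W), a 4-channel latent with 8× VAE downsampling (so a 1024 × 1024 image maps to 4 × 128 × 128), and that the "blank" half is an all-zero latent. None of these details, nor the helper names `build_incontext_input` and `take_target_half`, come from the paper; they are hypothetical.

```python
from typing import Optional

import numpy as np

# Hypothetical sketch of in-context spatial concatenation in latent space.
# Assumed layout: latents are (C, H, W); the paper's exact latent size and
# "blank" representation are not given in the abstract.


def build_incontext_input(target_latent: np.ndarray,
                          source_latent: Optional[np.ndarray] = None) -> np.ndarray:
    """Place two latents side by side along the width axis.

    Generation: (target | blank), with the blank half assumed to be zeros.
    Editing:    (target | source), with the source image's latent on the right.
    """
    if source_latent is None:
        right = np.zeros_like(target_latent)  # assumed "blank" placeholder
    else:
        if source_latent.shape != target_latent.shape:
            raise ValueError("source and target latents must share a shape")
        right = source_latent
    # Concatenate along the last (width) axis: (C, H, W) -> (C, H, 2W)
    return np.concatenate([target_latent, right], axis=-1)


def take_target_half(x: np.ndarray) -> np.ndarray:
    """Keep only the left (target) half of a concatenated latent."""
    half = x.shape[-1] // 2
    return x[..., :half]


# Usage with an assumed 4 x 128 x 128 latent for a 1024 x 1024 image
rng = np.random.default_rng(0)
noisy_target = rng.standard_normal((4, 128, 128)).astype(np.float32)
source = rng.standard_normal((4, 128, 128)).astype(np.float32)

gen_input = build_incontext_input(noisy_target)           # (target | blank)
edit_input = build_incontext_input(noisy_target, source)  # (target | source)
print(gen_input.shape, edit_input.shape)  # (4, 128, 256) (4, 128, 256)
```

In this sketch the denoiser sees an input of the same shape in both modes, which is what allows a single network to serve both tasks; whether DreamLite restricts its prediction or loss to the target half is likewise an assumption here.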
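The task-progressive joint pretraining mentioned in the abstract (T2I first, then editing, then joint training) amounts to a curriculum over which task each training example is drawn from. The Python sketch below shows one way to express such a schedule. Only the stage order comes from the abstract; the stage lengths, the 50/50 mix in the joint stage, and the names `STAGES`, `stage_for_step`, and `sample_task` are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical stage table: (stage name, probability that an example is T2I).
# The ordering T2I -> editing -> joint follows the abstract; the mixing
# probabilities are illustrative assumptions.
STAGES: List[Tuple[str, float]] = [
    ("t2i", 1.0),    # stage 1: text-to-image only
    ("edit", 0.0),   # stage 2: editing only
    ("joint", 0.5),  # stage 3: mix of both tasks
]


def stage_for_step(step: int, stage_lengths: List[int]) -> int:
    """Map a global training step to a stage index; stay in the last stage afterwards."""
    boundary = 0
    for i, length in enumerate(stage_lengths):
        boundary += length
        if step < boundary:
            return i
    return len(stage_lengths) - 1


def sample_task(step: int, stage_lengths: List[int], rng: random.Random) -> str:
    """Pick which task ('t2i' or 'edit') a training example should come from."""
    _, p_t2i = STAGES[stage_for_step(step, stage_lengths)]
    return "t2i" if rng.random() < p_t2i else "edit"


# Usage: three stages of 1,000 steps each (hypothetical lengths)
lengths = [1000, 1000, 1000]
rng = random.Random(0)
print([sample_task(s, lengths, rng) for s in (0, 1500)])  # ['t2i', 'edit']
```

Under this sketch, a T2I example would then be fed as (target | blank) and an editing example as (target | source), so the network architecture stays fixed while the task mix shifts across stages.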


Related Papers