Fugu-MT paper translation (abstract): Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model

Paper abstract: Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model

arxiv url: http://arxiv.org/abs/2510.02390v1
Date: Tue, 30 Sep 2025 23:27:09 GMT
Status: translation complete
Last updated in system: 2025-10-06 16:35:52.071796
Title: Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model
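Title (reference translation): Hyperparameters are all you need: using five-step inference with an original diffusion model to generate images comparable to those of the latest distillation model.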
Authors: Zilai Li,
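Abstract summary: Diffusion models are state-of-the-art generative models that generate images by iteratively applying a neural network. Based on an analysis of the truncation error of the diffusion ODE and SDE, this study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps.
Reference score (independently computed attention score): 0.0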
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The diffusion model is a state-of-the-art generative model that generates an image by applying a neural network iteratively. This generation process can be regarded as an algorithm for solving an ordinary differential equation (ODE) or a stochastic differential equation (SDE). Based on an analysis of the truncation error of the diffusion ODE and SDE, our study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps, with flexible guidance scales. To the best of our knowledge, our algorithm is the first to sample a 1024 x 1024 image in 8 steps with an FID comparable to that of the latest distillation model, but without additional training. Our algorithm can also generate a 512 x 512 image in 8 steps, with an FID better than that of the state-of-the-art ODE solver DPM++ 2M in 20 steps. We validate our eight-step image generation algorithm on the COCO 2014, COCO 2017, and LAION datasets. Our best FID scores are 15.7, 22.35, and 17.52, respectively, while those of DPM++ 2M are 17.3, 23.75, and 17.33. The algorithm also outperforms the state-of-the-art AMED-plugin solver, whose FID scores are 19.07, 25.50, and 18.06. We also apply the algorithm to five-step inference without additional training; the best FID scores on the datasets mentioned above are 19.18, 23.24, and 19.61, respectively, which are comparable to those of the state-of-the-art AMED-plugin solver in eight steps, SDXL-turbo in four steps, and the state-of-the-art diffusion distillation model Flash Diffusion in five steps. We also validate our algorithm on synthesizing 1024 x 1024 images within 6 steps, where its FID is close to that of the latest distillation algorithm. The code is available at: https://github.com/TheLovesOfLadyPurple/Hyperparameters-are-all-you-need
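Abstract (reference translation): Diffusion models are state-of-the-art generative models that produce images by iteratively applying a neural network. The generation process is also viewed as an algorithm that solves an ordinary differential equation or a stochastic differential equation. Based on an analysis of the truncation error of the diffusion ODE and SDE, this study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps. To the best of the author's knowledge, it is the first algorithm to sample 1024 x 1024 images in 8 steps with an FID comparable to that of the latest distillation model, without any additional training. The algorithm can also generate 512 x 512 images in 8 steps, with a better FID than inference with the state-of-the-art ODE solver DPM++ 2M in 20 steps. The eight-step image generation algorithm is validated on the COCO 2014, COCO 2017, and LAION datasets. The best FID scores are 15.7, 22.35, and 17.52; those of DPM++ 2M are 17.3, 23.75, and 17.33. It also outperforms the state-of-the-art AMED-plugin solver, whose FID scores are 19.07, 25.50, and 18.06. Applied to five-step inference without additional training, the algorithm reaches best FID scores of 19.18, 23.24, and 19.61 on the datasets above, comparable to the state-of-the-art AMED-plugin solver in eight steps, SDXL-turbo in four steps, and the state-of-the-art diffusion distillation model Flash Diffusion in five steps. The algorithm is also validated on synthesizing 1024 x 1024 images within 6 steps, where its FID is close to that of the latest distillation algorithm. The code is in the repository: https://github.com/TheLovesOfLadyPurple/Hyperparameters-are-all-you-need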
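
The FID figures quoted in the abstract are easier to compare side by side. The following is a minimal Python sketch that only tabulates the numbers stated above and computes per-dataset gaps (lower FID is better). The dictionary keys and the helper name `fid_gap` are illustrative labels chosen here, not names from the paper or its repository.

```python
# FID values copied from the abstract; order: COCO 2014, COCO 2017, LAION.
# The labels below are illustrative and not taken from the paper's code.
DATASETS = ("COCO 2014", "COCO 2017", "LAION")
FID = {
    "proposed, 8 steps": (15.7, 22.35, 17.52),
    "proposed, 5 steps": (19.18, 23.24, 19.61),
    "DPM++ 2M": (17.3, 23.75, 17.33),
    "AMED-plugin": (19.07, 25.50, 18.06),
}


def fid_gap(method, baseline):
    """Return baseline FID minus method FID per dataset.

    A positive value means the method has the lower (better) FID.
    """
    return {
        dataset: round(b - m, 2)
        for dataset, m, b in zip(DATASETS, FID[method], FID[baseline])
    }


if __name__ == "__main__":
    for baseline in ("DPM++ 2M", "AMED-plugin"):
        print(baseline, fid_gap("proposed, 8 steps", baseline))
```

On these quoted numbers, the eight-step result is ahead of DPM++ 2M on both COCO datasets but slightly behind it on LAION (17.52 versus 17.33), and ahead of AMED-plugin on all three.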

Related papers