Fugu-MT Paper Translation (Abstract): Scaling Laws of Synthetic Images for Model Training ... for Now

Paper overview: Scaling Laws of Synthetic Images for Model Training ... for Now

arxiv url: http://arxiv.org/abs/2312.04567v1
Date: Thu, 7 Dec 2023 18:59:59 GMT
Status: Translation complete
Last updated in system: 2023-12-08 13:24:45.229966
Title: Scaling Laws of Synthetic Images for Model Training ... for Now
Title (reference translation): Scaling Laws of Synthetic Images for Model Training ...
Authors: Lijie Fan, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian
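Abstract summary: This study examines the scaling laws of synthetic images generated by state-of-the-art text-to-image models. Synthetic images show a scaling trend in CLIP training that is similar to, but slightly less effective than, that of real images.
Reference score (independently computed attention score): 54.43596959598466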
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent significant advances in text-to-image models unlock the possibility of training vision systems using synthetic images, potentially overcoming the difficulty of collecting curated data at scale. It is unclear, however, how these models behave at scale, as more synthetic data is added to the training set. In this paper we study the scaling laws of synthetic images generated by state of the art text-to-image models, for the training of supervised models: image classifiers with label supervision, and CLIP with language supervision. We identify several factors, including text prompts, classifier-free guidance scale, and types of text-to-image models, that significantly affect scaling behavior. After tuning these factors, we observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training, while they significantly underperform in scaling when training supervised image classifiers. Our analysis indicates that the main reason for this underperformance is the inability of off-the-shelf text-to-image models to generate certain concepts, a limitation that significantly impairs the training of image classifiers. Our findings also suggest that scaling synthetic data can be particularly effective in scenarios such as: (1) when there is a limited supply of real images for a supervised problem (e.g., fewer than 0.5 million images in ImageNet), (2) when the evaluation dataset diverges significantly from the training data, indicating the out-of-distribution scenario, or (3) when synthetic data is used in conjunction with real images, as demonstrated in the training of CLIP models.
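Abstract (reference translation): Recent major advances in text-to-image models open up the possibility of training vision systems on synthetic images, potentially overcoming the difficulty of collecting curated data at scale. It is unclear, however, how these models behave at scale as more synthetic data is added to the training set. This paper studies the scaling laws of synthetic images generated by state-of-the-art text-to-image models for training supervised models: image classifiers with label supervision and CLIP with language supervision. We identify several factors that significantly affect scaling behavior, including text prompts, classifier-free guidance scale, and the type of text-to-image model. After tuning these factors, synthetic images show a scaling trend in CLIP training that is similar to, but slightly less effective than, that of real images, while they scale considerably worse when training supervised image classifiers. The main cause of this underperformance is the inability of off-the-shelf text-to-image models to generate certain concepts, which significantly impairs the training of image classifiers. The findings also suggest that scaling synthetic data is particularly effective when (1) the supply of real images for a supervised problem is limited (e.g., fewer than 0.5 million images in ImageNet), (2) the evaluation dataset diverges significantly from the training data (the out-of-distribution scenario), or (3) synthetic data is used together with real images, as demonstrated in the training of CLIP models.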

Related papers