Fugu-MT Paper Translation (Abstract): PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

Paper abstract: PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

arxiv url: http://arxiv.org/abs/2605.20147v1
Date: Tue, 19 May 2026 17:35:09 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:09.558722
Title: PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
Title (reference translation): PixVerve: Advancing native UHR image generation to 100MP with a large-scale, high-quality dataset
Authors: Haojun Chen, Haoyang He, Chengming Xu, Qingdong He, Junwei Zhu, Yabiao Wang, Zhucun Xue, Xianfang Zeng, Zhennan Chen, Xiaobin Hu, Hao Zhao, Yong Liu, Jiangning Zhang, Dacheng Tao
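Abstract summary: Text-to-Image (T2I) models have recently made notable progress at 1K and 2K resolution. Ultra-High-Resolution (UHR) image generation poses great challenges due to the scarcity and complexity of high-resolution content. PixVerve-95K is a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline.
Reference score (attention score computed independently by this site): 93.70328662327375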
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-Image (T2I) models have recently seen notable progress around 1K and 2K resolution. With the extreme desire for better visual experience and the rapid development of imaging technology, the demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image generation poses great challenges due to the scarcity and complexity of high-resolution content. In this paper, we first introduce PixVerve-95K, a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline, which contains 95K images across diverse scenarios (each image has a minimum pixel-count of 100M) and seven-dimensional annotations. Based on our large-scale image-text dataset, we take a pioneering step to extend various T2I foundation models to native 100MP generation with three training schemes. Finally, leveraging both conventional metrics and multimodal large language model-based assessments, our proposed PixVerve-Bench benchmark establishes a comprehensive evaluation protocol for UHR images encompassing visual quality and semantic alignment. Extensive experimental results on our benchmark and the constructive exploration of training strategies collaboratively provide valuable insights for future breakthroughs.
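Abstract (reference translation): Text-to-Image (T2I) models have recently made notable progress at 1K and 2K resolution. Driven by a strong desire for a better visual experience and by the rapid development of imaging technology, demand for Ultra-High-Resolution (UHR) image generation has grown substantially. However, UHR image generation poses major challenges because high-resolution content is scarce and complex. This paper introduces PixVerve-95K, a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline, which contains 95K images across diverse scenarios (each image has a minimum pixel count of 100M) and seven-dimensional annotations. Based on this large-scale image-text dataset, the authors take a pioneering step to extend various T2I foundation models to native 100MP generation with three training schemes. Finally, the PixVerve-Bench benchmark, which combines conventional metrics with multimodal large language model-based assessments, establishes a comprehensive evaluation protocol for UHR images covering visual quality and semantic alignment. Extensive experimental results on the benchmark, together with the constructive exploration of training strategies, provide valuable insights for future breakthroughs.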
