Fugu-MT Paper Translation (Abstract): SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis

Paper overview: SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis

arxiv url: http://arxiv.org/abs/2603.21783v1
Date: Mon, 23 Mar 2026 10:25:45 GMT
Status: Translation complete
System update date: 2026-03-24 19:11:39.606249
Title: SHARP: Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion in Remote Sensing Synthesis
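Title (reference translation): SHARP: Spectrum-aware, highly dynamic adaptation for resolution enhancement in remote sensing synthesis
Authors: Bingxuan Zhao, Qing Zhou, Chuang Yang, Qi Wang
Abstract summary: Remote sensing imagery encodes fine structures that are critical for aerial-scene realism, such as vehicles, building contours, and road markings. Training-free resolution promotion via Rotary Position Embedding (RoPE) rescaling offers a practical remedy, but every existing method applies a static positional scaling rule throughout the denoising process. The paper proposes SHARP (Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion), a training-free method that introduces a rational fractional time schedule k_rs(t) into RoPE.
Reference score (independently computed attention score): 14.489371802189426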
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image generation powered by Diffusion Transformers (DiTs) has made remarkable strides, yet remote sensing (RS) synthesis lags behind due to two barriers: the absence of a domain-specialized DiT prior and the prohibitive cost of training at the large resolutions that RS applications demand. Training-free resolution promotion via Rotary Position Embedding (RoPE) rescaling offers a practical remedy, but every existing method applies a static positional scaling rule throughout the denoising process. This uniform compression is particularly harmful for RS imagery, whose substantially denser medium- and high-frequency energy encodes the fine structures critical for aerial-scene realism, such as vehicles, building contours, and road markings. Addressing both challenges requires a domain-specialized generative prior coupled with a denoising-aware positional adaptation strategy. To this end, we fine-tune FLUX on over 100,000 curated RS images to build a strong domain prior (RS-FLUX), and propose Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion (SHARP), a training-free method that introduces a rational fractional time schedule k_rs(t) into RoPE. SHARP applies strong positional promotion during the early layout-formation stage and progressively relaxes it during detail recovery, aligning extrapolation strength with the frequency-progressive nature of diffusion denoising. Its resolution-agnostic formulation further enables robust multi-scale generation from a single set of hyperparameters. Extensive experiments across six square and rectangular resolutions show that SHARP consistently outperforms all training-free baselines on CLIP Score, Aesthetic Score, and HPSv2, with widening margins at more aggressive extrapolation factors and negligible computational overhead. Code and weights are available at https://github.com/bxuanz/SHARP.
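Abstract (reference translation): Text-to-image generation with Diffusion Transformers (DiTs) has advanced remarkably, but remote sensing (RS) synthesis lags behind because of two barriers: the lack of a domain-specialized DiT prior and the prohibitive cost of training at the large resolutions that RS applications require. Training-free resolution promotion via Rotary Position Embedding (RoPE) rescaling offers a practical remedy, but every existing method applies a static positional scaling rule throughout the denoising process. This uniform compression is especially harmful for RS imagery, whose substantially denser medium- and high-frequency energy encodes the fine structures essential to aerial-scene realism, such as vehicles, building contours, and road markings. Addressing both challenges requires a domain-specialized generative prior together with a denoising-aware positional adaptation strategy. To this end, we fine-tune FLUX on more than 100,000 curated RS images to build a strong domain prior (RS-FLUX), and propose Spectrum-aware Highly-dynamic Adaptation for Resolution Promotion (SHARP), a training-free method that introduces a rational fractional time schedule k_rs(t) into RoPE. SHARP applies strong positional promotion during the early layout-formation stage and progressively relaxes it during detail recovery, matching extrapolation strength to the frequency-progressive nature of diffusion denoising. Its resolution-agnostic formulation enables robust multi-scale generation from a single set of hyperparameters. Experiments across six square and rectangular resolutions show that SHARP consistently outperforms all training-free baselines on CLIP Score, Aesthetic Score, and HPSv2, with widening margins at more aggressive extrapolation factors and negligible computational overhead. Code and weights are available at https://github.com/bxuanz/SHARP.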
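Illustrative sketch (not the authors' implementation): the abstract does not give the form of k_rs(t), so the rational function below, the parameter `alpha`, the convention that t = 1 is the start of denoising, and the choice to divide positions by the schedule value are all assumptions made only to show what a time-dependent RoPE scaling could look like. The helper names `k_rs` and `rope_angles` are hypothetical.

```python
def k_rs(t, scale, alpha=4.0):
    """Hypothetical rational time schedule (assumed form, not from the paper).

    t     : denoising time in [0, 1]; t = 1 is assumed to be the noisy start,
            t = 0 the final detail-recovery step.
    scale : resolution extrapolation factor (target size / training size).
    alpha : assumed shape parameter; larger values hold the strong scaling
            longer before relaxing it.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Rational weight: equals 1 at t = 1 and 0 at t = 0, increasing in t.
    weight = (1.0 + alpha) * t / (1.0 + alpha * t)
    # Interpolate between no scaling (1) and full scaling (scale).
    return 1.0 + (scale - 1.0) * weight


def rope_angles(position, dim, t, scale, base=10000.0):
    """RoPE rotation angles for one position, with the position divided by
    the assumed time-dependent factor k_rs(t)."""
    k = k_rs(t, scale)
    return [(position / k) * base ** (-2.0 * i / dim) for i in range(dim // 2)]


if __name__ == "__main__":
    # Scaling is strongest early (t = 1) and relaxes toward 1 as t -> 0.
    for t in (1.0, 0.5, 0.0):
        print(t, k_rs(t, scale=2.0))
```

Because the schedule in this sketch depends only on the ratio `scale` and not on the absolute image size, the same `alpha` can be reused across target resolutions, which is one way to read the abstract's "resolution-agnostic" claim.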

Related papers