Fugu-MT Paper Translation (Abstract): RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

Paper abstract: RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

arxiv url: http://arxiv.org/abs/2603.14941v1
Date: Mon, 16 Mar 2026 07:45:15 GMT
Status: Translation complete
Last updated in system: 2026-03-17 16:19:36.141157
Title: RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting
Title (reference translation): RS-WorldModel: A Unified Model for Remote Sensing Understanding and Future Sense Prediction
Authors: Linrui Xu, Zhongan Wang, Fei Shen, Gang Xu, Huiping Zhuang, Ming Li, Haifeng Li
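Abstract summary: RS-WorldModel is a unified world model that handles both spatiotemporal change understanding and text-guided future scene forecasting. RSWBench-1.1M is a 1.1 million sample dataset with rich language annotations covering both tasks. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics.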
Reference score (independently computed attention metric): 20.55654078017388
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
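Abstract (reference translation): Remote sensing world models aim both to explain observed changes and to forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, usually handle them separately, which limits cross-task transfer. We propose RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we construct RSWBench-1.1M, a dataset of 1.1 million samples with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP), which conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT), which jointly trains understanding and forecasting; and (3) verifiable reinforcement optimization (VRO), which refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120$\times$ larger on most spatiotemporal change question-answering metrics. On text-guided future scene forecasting it achieves an FID of 43.13, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).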

List of related papers