Fugu-MT Paper Translation (Summary): HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

Paper summary: HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

arxiv url: http://arxiv.org/abs/2604.08884v1
Date: Fri, 10 Apr 2026 02:47:32 GMT
Status: Translation complete
System last updated: 2026-04-13 17:57:53.647645
Title: HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing
Title (reference translation): HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing
Authors: Xinyu Zhang, Zurong Mai, Qingmei Li, Zjin Liao, Yibin Wen, Yuhang Chen, Xiaoya Fan, Chan Tsz Ho, Bi Tianyuan, Haoyuan Liang, Ruifeng Su, Zihao Qian, Juepeng Zheng, Jianxi Huang, Yutong Lu, Haohuan Fu
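Abstract summary: Multimodal large language models (MLLMs) have made significant progress in natural image understanding, but their ability to perceive and reason over hyperspectral images (HSI) remains underexplored. We introduce the Hyperspectral Multimodal Benchmark (HM-Bench), the first benchmark designed to evaluate MLLMs on HSI understanding. We curate a large-scale dataset of 19,337 question-answer pairs across 13 task categories, ranging from basic perception to spectral reasoning.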
Reference score (independently computed attention score): 22.804236694410367
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While multimodal large language models (MLLMs) have made significant strides in natural image understanding, their ability to perceive and reason over hyperspectral images (HSI), a vital modality in remote sensing, remains underexplored. The high dimensionality and intricate spectral-spatial properties of HSI pose unique challenges for models primarily trained on RGB data. To address this gap, we introduce Hyperspectral Multimodal Benchmark (HM-Bench), the first benchmark designed specifically to evaluate MLLMs in HSI understanding. We curate a large-scale dataset of 19,337 question-answer pairs across 13 task categories, ranging from basic perception to spectral reasoning. Given that existing MLLMs are not equipped to process raw hyperspectral cubes natively, we propose a dual-modality evaluation framework that transforms HSI data into two complementary representations: PCA-based composite images and structured textual reports. This approach facilitates a systematic comparison of how different representations affect model performance. Extensive evaluations on 18 representative MLLMs reveal significant difficulties in handling complex spatial-spectral reasoning tasks. Furthermore, our results demonstrate that visual inputs generally outperform textual inputs, highlighting the importance of grounding in spectral-spatial evidence for effective HSI understanding. Dataset and appendix can be accessed at https://github.com/HuoRiLi-Yu/HM-Bench.
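Abstract (reference translation): Multimodal large language models (MLLMs) have made significant progress in natural image understanding, but their ability to perceive and reason over hyperspectral images (HSI), an important modality in remote sensing, remains underexplored. The high dimensionality and complex spectral-spatial properties of HSI pose unique challenges for models trained primarily on RGB data. To address this gap, we introduce the Hyperspectral Multimodal Benchmark (HM-Bench), the first benchmark designed to evaluate MLLMs on HSI understanding. We curate a large-scale dataset of 19,337 question-answer pairs across 13 task categories, ranging from basic perception to spectral reasoning. Because existing MLLMs cannot natively process raw hyperspectral cubes, we propose a dual-modality evaluation framework that converts HSI data into two complementary representations: PCA-based composite images and structured textual reports. This approach facilitates a systematic comparison of how different representations affect model performance. Large-scale evaluation of 18 MLLMs reveals significant difficulties with complex spatial-spectral reasoning tasks. Furthermore, visual inputs generally outperform textual inputs, highlighting the importance of grounding in spectral-spatial evidence for effective HSI understanding. The dataset and appendix can be accessed at https://github.com/HuoRiLi-Yu/HM-Bench.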
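
The abstract names two representations of a hyperspectral cube: a PCA-based composite image and a structured textual report. The following is a minimal sketch of what such a conversion could look like, not the authors' pipeline. It assumes NumPy is available and that the cube is an (H, W, B) array. The function names `pca_composite` and `text_report`, the min-max scaling to 8-bit, and the per-band statistics in the report are all hypothetical choices made for illustration.

```python
import numpy as np


def pca_composite(cube: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project an (H, W, B) cube onto its top principal components
    and rescale each component to 8-bit (an assumed display choice)."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)  # astype returns a copy
    pixels -= pixels.mean(axis=0)  # center each band

    # Eigen-decomposition of the B x B band covariance matrix.
    cov = np.cov(pixels, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # largest-variance directions first
    scores = pixels @ top

    # Min-max scale each component independently to [0, 255].
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant components
    scaled = (scores - lo) / span * 255.0
    return scaled.reshape(h, w, n_components).astype(np.uint8)


def text_report(cube: np.ndarray) -> str:
    """Summarize per-band statistics as plain text (hypothetical report layout)."""
    h, w, b = cube.shape
    lines = [f"cube shape: {h}x{w} pixels, {b} bands"]
    for i in range(b):
        band = cube[:, :, i]
        lines.append(f"band {i}: mean={band.mean():.4f} std={band.std():.4f}")
    return "\n".join(lines)


# Synthetic stand-in for a real hyperspectral cube (8x8 pixels, 20 bands).
rng = np.random.default_rng(0)
demo_cube = rng.random((8, 8, 20))
composite = pca_composite(demo_cube)
report = text_report(demo_cube)
```

In a setup like this, `composite` would be passed to a model as an ordinary three-channel image and `report` as text, so the same question can be posed under either representation and the answers compared.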

Related papers list