Fugu-MT Paper Translation (Abstract): NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

Paper Overview: NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding

arxiv url: http://arxiv.org/abs/2605.20525v1
Date: Tue, 19 May 2026 21:54:12 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.392024
Title: NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
Title (reference translation): NeuroQA: A Large-Scale Image-Grounded Benchmark for 3D Brain MRI Understanding
Authors: Mohammad H. Abbasi, Favour Nerrise, Shaurnav Ghosh, Ridvan Yesiloglu, Yuncong Mao, Bailey Trang, Mohammad Asadi, Merryn Daniel, Gustavo Chau Loo Kung, Ken Chang, Pavan Pinkesh Shah, Adam Turnbull, Kyan Younes, Seena Dehkharghani, Ehsan Adeli,
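Abstract summary: NeuroQA is a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI). It spans ages 5-104 and five clinical domains (Alzheimer's disease, Parkinson's disease, tumors, white matter disease, and neurodevelopment). NeuroQA evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats.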
Reference score (in-house attention score): 6.217658756255346
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), with 56,953 QA pairs from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains: Alzheimer's, Parkinson's, tumors, white matter disease, and neurodevelopment. Unlike prior medical Visual Question Answering (VQA) efforts that operate on 2D slices or rely on narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, we apply answer-distribution refinement, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. We conduct a clinician evaluation in which two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% accuracy respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release with public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), plus subject-level splits, a held-out private test set, and an online leaderboard.
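The abstract compares model accuracy against a "text-only majority-template floor" but does not spell out how that floor is computed. Below is a minimal sketch under one plausible assumption: for each question template, the baseline always predicts the most frequent answer that template received in the training split, ignoring the image entirely. The names `QAItem`, `majority_template_floor`, and the template IDs in the usage example (`atrophy_yn`, `side_mc`) are hypothetical illustrations, not part of the NeuroQA release.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple


class QAItem(NamedTuple):
    template_id: str  # hypothetical: which question template produced the item
    answer: str       # ground-truth answer for a closed-format (Yes/No or MC) item


def majority_template_floor(train: Iterable[QAItem], test: Iterable[QAItem]) -> float:
    """Accuracy of an assumed text-only baseline that always predicts the most
    frequent training answer for each template, never looking at the image."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for item in train:
        counts[item.template_id][item.answer] += 1
    # Majority answer per template; ties go to the answer seen first in training.
    majority = {t: c.most_common(1)[0][0] for t, c in counts.items()}

    items = list(test)
    if not items:
        raise ValueError("test split is empty")
    # Templates unseen in training have no prediction and count as wrong.
    correct = sum(1 for it in items if majority.get(it.template_id) == it.answer)
    return correct / len(items)


if __name__ == "__main__":
    train = [QAItem("atrophy_yn", "yes"), QAItem("atrophy_yn", "yes"),
             QAItem("atrophy_yn", "no"), QAItem("side_mc", "left")]
    test = [QAItem("atrophy_yn", "yes"), QAItem("atrophy_yn", "no"),
            QAItem("side_mc", "left"), QAItem("side_mc", "right")]
    # 2 of 4 test items match their template's majority answer.
    print(majority_template_floor(train, test))  # 0.5
```

Under this reading, a model that scores below the floor (as the reported 47.5% and 43.7% do against 49.4%) is doing no better than answering from template priors alone. The actual benchmark may define the floor differently.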
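Abstract (reference translation): We propose NeuroQA, a large-scale benchmark for visual question answering in 3D brain magnetic resonance imaging (MRI), built from 56,953 QA pairs drawn from 12,977 subjects across 12 datasets. It spans ages 5-104 and five clinical domains (Alzheimer's disease, Parkinson's disease, tumors, white matter disease, and neurodevelopment). Unlike prior medical visual question answering (VQA) work that relies on 2D slices or narrow diagnostic labels, NeuroQA pairs every item with a full 3D volume. It also evaluates 11 clinically grounded reasoning skills across Yes/No, multiple-choice, and open-ended formats. Of the 203 templates, 131 are image-grounded (answerable from a 3-plane viewer) and 72 are image-informed (ground truth from quantitative volumetry or clinical instruments). To remove text-only shortcuts, answer-distribution refinement is applied, reducing closed-format text-only accuracy from >80% to 44.6%; image necessity is assessed separately through an image-grounding protocol released with the benchmark. A 38-rule deterministic pipeline and two rounds of expert review verify every QA pair against FreeSurfer measurements, metadata, or radiology report fields, with zero same-subject contradictions across templates. In a clinician evaluation, two clinicians independently assess 100 frozen test items on a three-plane viewer. On closed-format (Yes/No + multiple-choice) test-public items, the best zero-shot vision-language model and a supervised 3D CNN baseline reach 47.5% and 43.7% respectively, both below the 49.4% text-only majority-template floor. NeuroQA adopts a two-tier release: public QA pairs for open-access datasets and reproducible generation scripts for datasets restricted by data use agreements (DUAs), along with subject-level splits, a held-out private test set, and an online leaderboard.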


Related Papers