Fugu-MT Paper Translation (Abstract): UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation

Paper abstract: UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation

arxiv url: http://arxiv.org/abs/2603.24006v1
Date: Wed, 25 Mar 2026 07:10:54 GMT
Status: Translation complete
Last updated in system: 2026-03-26 21:06:11.177126
Title: UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation
Title (reference translation): UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation
Authors: Hongshen Zhao, Jingkang Tai, Yuhang Wu, Wenkang Zhang, Xi Lan, Shangyan Wang, Tianyu Zhang, Wankou Yang
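Abstract summary: This paper introduces the first large-scale underwater VOS benchmark, comprising 1,431 video sequences across 409 categories with 309,295 mask annotations. It also proposes $\textbf{SAM-U}$, a parameter-efficient framework that adapts SAM2 to the underwater domain.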
Reference score (independently computed attention metric): 15.886577205112902
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Underwater Video Object Segmentation (VOS) is essential for marine exploration, yet open-air methods suffer significant degradation due to color distortion, low contrast, and prevalent camouflage. A primary hurdle is the lack of high-quality training data. To bridge this gap, we introduce $\textbf{UW-VOS}$, the first large-scale underwater VOS benchmark comprising 1,431 video sequences across 409 categories with 309,295 mask annotations, constructed via a semi-automatic data engine with rigorous human verification. We further propose $\textbf{SAM-U}$, a parameter-efficient framework that adapts SAM2 to the underwater domain. By inserting lightweight adapters into the image encoder, SAM-U achieves state-of-the-art performance with only $\sim$2$\%$ trainable parameters. Extensive experiments reveal that existing methods experience an average 13-point $\mathcal{J}\&\mathcal{F}$ drop on UW-VOS, while SAM-U effectively bridges this domain gap. Detailed attribute-based analysis further identifies small targets, camouflage, and exit-re-entry as critical bottlenecks, providing a roadmap for future research in robust underwater perception.
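Abstract (reference translation): Underwater Video Object Segmentation (VOS) is essential for marine exploration, but open-air methods degrade significantly due to color distortion, low contrast, and prevalent camouflage. The primary hurdle is the lack of high-quality training data. To bridge this gap, $\textbf{UW-VOS}$ is introduced as the first large-scale underwater VOS benchmark, comprising 1,431 video sequences across 409 categories with 309,295 mask annotations. The paper further proposes $\textbf{SAM-U}$, a parameter-efficient framework that adapts SAM2 to the underwater domain. By inserting lightweight adapters into the image encoder, SAM-U achieves state-of-the-art performance with $\sim$2$\%$ trainable parameters. Experiments show that existing methods experience an average 13-point $\mathcal{J}\&\mathcal{F}$ drop on UW-VOS, while SAM-U effectively bridges this domain gap. Detailed attribute-based analysis identifies small targets, camouflage, and exit-re-entry as critical bottlenecks, providing a roadmap for future research in robust underwater perception.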
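
The $\mathcal{J}\&\mathcal{F}$ score cited in the abstract is, by the usual VOS convention, the mean of region similarity $\mathcal{J}$ (the Jaccard index, or intersection over union, of predicted and ground-truth masks) and contour accuracy $\mathcal{F}$ (a boundary F-measure). The sketch below illustrates only that convention and is not the paper's evaluation code. The function names, the toy masks, and the contour score are hypothetical, and the boundary-matching step needed to compute $\mathcal{F}$ itself is omitted.

```python
def region_similarity(pred, gt):
    """Jaccard index J of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for row_p, row_g in zip(pred, gt):
        for p, g in zip(row_p, row_g):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """Combined score: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


# Toy 2x3 masks (hypothetical data, for illustration only).
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
gt_mask = [[0, 1, 0],
           [0, 1, 1]]

j = region_similarity(pred_mask, gt_mask)  # 2 shared pixels out of 4 covered
f = 0.7  # placeholder contour score; computing F requires boundary matching
print(f"J = {j:.3f}, J&F = {j_and_f(j, f):.3f}")
```

Benchmarks typically report these values as percentages averaged over objects and frames, so a "13-point drop" corresponds to a 0.13 decrease on the 0 to 1 scale used here.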


Related papers