Fugu-MT paper translation (abstract): Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding

Paper overview: Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding

arxiv url: http://arxiv.org/abs/2604.26031v1
Date: Tue, 28 Apr 2026 18:14:18 GMT
Status: translation complete
Last updated in system: 2026-04-30 15:59:36.139033
Title: Report of the 5th PVUW Challenge: Towards More Diverse Modalities in Pixel-Level Understanding
Title (reference translation): Report of the 5th PVUW Challenge
Authors: Chang Liu, Henghui Ding, Nikhila Ravi, Yunchao Wei, Shuting He, Song Bai, Philip Torr, Leilei Cao, Jinrong Zhang, Deshui Miao, Xusheng He, Dengxian Gong, Zhiyu Wang, Mingqi Gao, Jihwan Hong, Canyang Wu, Weili Guan, Jianlong Wu, Liqiang Nie, Xingsen Huang, Yameng Gu, Xiaogang Yu, Xin Li, Ming-Hsuan Yang, Sijie Li, Jungong Han, Quanzhu Niu, Shihao Chen, Yuanzheng Wu, Yikang Zhou, Tao Zhang, Haobo Yuan, Lu Qi, Shunping Ji, Chao Yang, Chao Tian, Guoqing Zhu, Kai Yang, Zhifan Mo, Haijun Zhang, Xudong Kang, Shutao Li, Jaeyoung Do
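Abstract summary: This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 PVUW Challenge. The 2026 edition features three specialized tracks: the MOSE track, which tracks objects in densely cluttered scenarios; the MeViS-Text track, which localizes targets via motion-focused linguistic expressions; and the MeViS-Audio track, which pioneers acoustic-driven object segmentation.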
Reference score (independently computed attention measure): 202.7892709083317
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 Pixel-level Video Understanding in the Wild (PVUW) Challenge, hosted at CVPR 2026, which evaluates state-of-the-art models under highly unconstrained conditions. To provide a comprehensive assessment, the 2026 edition features three specialized tracks: the MOSE track for tracking objects within densely cluttered and severely occluded scenarios; the MeViS-Text track for localizing targets via motion-focused linguistic expressions; and the newly inaugurated MeViS-Audio track, which pioneers acoustic-driven object segmentation. By introducing previously unreleased challenging data and analyzing the cutting-edge, multimodal solutions submitted by participants, this report highlights the community's latest technical advancements and charts promising future directions for robust video scene comprehension.
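Abstract (reference translation): This report summarizes the objectives, datasets, and top-performing methodologies of the 2026 Pixel-level Video Understanding in the Wild (PVUW) Challenge, held at CVPR 2026. For a comprehensive evaluation, the 2026 edition offers three specialized tracks: the MOSE track, which tracks objects within densely cluttered scenarios; the MeViS-Text track, which localizes targets via motion-focused linguistic expressions; and the MeViS-Audio track, which pioneers acoustic-driven object segmentation. By introducing previously unreleased challenge data and analyzing the state-of-the-art multimodal solutions submitted by participants, the report highlights the community's latest technical advances and charts promising future directions for robust video scene understanding.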

Related papers