Fugu-MT paper translation (abstract): Scalable Video Object Segmentation with Identification Mechanism

Paper abstract: Scalable Video Object Segmentation with Identification Mechanism

arxiv url: http://arxiv.org/abs/2203.11442v6
Date: Mon, 3 Jul 2023 04:58:30 GMT
Status: translation complete
Last updated in system: 2023-07-04 16:17:28.652892
Title: Scalable Video Object Segmentation with Identification Mechanism
Title (reference translation): Scalable Video Object Segmentation with an Identification Mechanism
Authors: Zongxin Yang, Xiaohan Wang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Yi Yang
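Abstract summary: This paper examines the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, which limits the learning of multi-object representations. The authors propose two approaches: Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST).
Reference score (independently computed attention score): 102.52315557080561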
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, limiting the learning of multi-object representation as they must match and segment each target separately under multi-object scenarios. Additionally, earlier techniques catered to specific application objectives and lacked the flexibility to fulfill different speed-accuracy requirements. To address these problems, we present two innovative approaches, Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST). In pursuing effective multi-object modeling, AOT introduces the IDentification (ID) mechanism to allocate each object a unique identity. This approach enables the network to model the associations among all objects simultaneously, thus facilitating the tracking and segmentation of objects in a single network pass. To address the challenge of inflexible deployment, AOST further integrates scalable long short-term transformers that incorporate layer-wise ID-based attention and scalable supervision. This overcomes ID embeddings' representation limitations and enables online architecture scalability in VOS for the first time. Given the absence of a benchmark for VOS involving densely multi-object annotations, we propose a challenging Video Object Segmentation in the Wild (VOSW) benchmark to validate our approaches. We evaluated various AOT and AOST variants using extensive experiments across VOSW and five commonly-used VOS benchmarks. Our approaches surpass the state-of-the-art competitors and display exceptional efficiency and scalability consistently across all six benchmarks. Moreover, we notably achieved the 1st position in the 3rd Large-scale Video Object Segmentation Challenge.
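Abstract (reference translation): This paper addresses the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS). Previous VOS methods decode features with a single positive object, which limits the learning of multi-object representations. In addition, earlier techniques were tailored to specific application objectives and lacked the flexibility to meet different speed-accuracy requirements. To solve these problems, we propose two approaches: Associating Objects with Transformers (AOT) and Associating Objects with Scalable Transformers (AOST). In pursuit of effective multi-object modeling, AOT introduces the IDentification (ID) mechanism, which assigns each object a unique identity. This approach lets the network model the associations among all objects simultaneously, making it possible to track and segment the objects in a single network pass. To address the challenge of inflexible deployment, AOST further integrates scalable long short-term transformers that incorporate layer-wise ID-based attention and scalable supervision. This overcomes the representation limitations of ID embeddings and enables online architecture scalability in VOS for the first time. Given the absence of a VOS benchmark with dense multi-object annotations, we propose the Video Object Segmentation in the Wild (VOSW) benchmark to validate our approaches. We evaluated various AOT and AOST variants on VOSW and five commonly used VOS benchmarks. Our approaches surpass state-of-the-art competitors and show consistently strong efficiency and scalability across all six benchmarks. We also achieved 1st place in the 3rd Large-scale Video Object Segmentation Challenge.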
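
The abstract states that the ID mechanism gives each object a unique identity so that all objects can be handled in one network pass. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a fixed-size bank of identity vectors (random here; presumably learned in a real model) and a label map in which each pixel holds an object label. The helper names `make_identity_bank`, `assign_identities`, and `embed_label_map` are hypothetical and do not come from the paper.

```python
import random


def make_identity_bank(num_ids, dim, seed=0):
    """Hypothetical bank of identity vectors (random stand-ins for learned ones)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def assign_identities(object_labels, num_ids, seed=0):
    """Give every object label a distinct slot in the identity bank."""
    if len(object_labels) > num_ids:
        raise ValueError("more objects than identities in the bank")
    rng = random.Random(seed)
    slots = rng.sample(range(num_ids), len(object_labels))
    return dict(zip(object_labels, slots))


def embed_label_map(label_map, assignment, bank):
    """Replace each pixel label with the identity vector of its object."""
    return [[bank[assignment[label]] for label in row] for row in label_map]


# Toy frame with three labels: 0 (treated as background), 1, and 2.
label_map = [
    [0, 1, 1],
    [0, 2, 2],
]
labels = sorted({label for row in label_map for label in row})

bank = make_identity_bank(num_ids=10, dim=4)
assignment = assign_identities(labels, num_ids=10)
id_embedding = embed_label_map(label_map, assignment, bank)
```

Because every object is encoded in the same embedding map, a downstream network could in principle consume all objects at once instead of running one pass per object, which is the property the abstract attributes to the ID mechanism.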

Related papers