Fugu-MT Paper Translation (Abstract): Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video

Paper abstract: Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video

arxiv url: http://arxiv.org/abs/2507.00339v1
Date: Tue, 01 Jul 2025 00:36:56 GMT
Status: Translation complete
Last updated in system: 2025-07-03 14:22:59.140092
Title: Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
Title (reference translation): Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
Authors: Alexander Moore, Amar Saini, Kylie Cancilla, Doug Poland, Carmen Carrano
Abstract summary: We introduce MOVi-MC-AC: Multiple Object Video with Multi-Cameras and Amodal Content. It is the largest amodal segmentation dataset to date and the first amodal content dataset. It includes two new contributions to deep learning for computer vision.
Reference score (independently calculated attention score): 37.755852787082254
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Amodal segmentation and amodal content completion require using object priors to estimate occluded masks and features of objects in complex scenes. Until now, no data has provided an additional dimension for object context: the possibility of multiple cameras sharing a view of a scene. We introduce MOVi-MC-AC: Multiple Object Video with Multi-Cameras and Amodal Content, the largest amodal segmentation and first amodal content dataset to date. Cluttered scenes of generic household objects are simulated in multi-camera video. MOVi-MC-AC contributes to the growing literature of object detection, tracking, and segmentation by including two new contributions to the deep learning for computer vision world. Multiple Camera (MC) settings where objects can be identified and tracked between various unique camera perspectives are rare in both synthetic and real-world video. We introduce a new complexity to synthetic video by providing consistent object ids for detections and segmentations between both frames and multiple cameras each with unique features and motion patterns on a single scene. Amodal Content (AC) is a reconstructive task in which models predict the appearance of target objects through occlusions. In the amodal segmentation literature, some datasets have been released with amodal detection, tracking, and segmentation labels. While other methods rely on slow cut-and-paste schemes to generate amodal content pseudo-labels, they do not account for natural occlusions present in the modal masks. MOVi-MC-AC provides labels for ~5.8 million object instances, setting a new maximum in the amodal dataset literature, along with being the first to provide ground-truth amodal content. The full dataset is available at https://huggingface.co/datasets/Amar-S/MOVi-MC-AC.
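Abstract (reference translation): Amodal segmentation and amodal content completion require the use of object priors to estimate the occluded masks and features of objects in complex scenes. Until now, no data has offered an additional dimension of object context: the possibility of multiple cameras sharing a view of a scene. We introduce MOVi-MC-AC: Multiple Object Video with Multi-Cameras and Amodal Content. Cluttered scenes of generic household objects are simulated in multi-camera video. MOVi-MC-AC contributes to the growing literature on object detection, tracking, and segmentation by including two new contributions to deep learning for computer vision. Multiple Camera (MC) settings, in which objects can be identified and tracked across various camera perspectives, are rare in both synthetic and real-world video. We introduce a new complexity to synthetic video by providing consistent object IDs across both frames and multiple cameras, each with unique features and motion patterns, on a single scene. Amodal Content (AC) is a reconstructive task in which models predict the appearance of target objects through occlusions. In the amodal segmentation literature, some datasets have been released with amodal detection, tracking, and segmentation labels. Other methods rely on slow cut-and-paste schemes to generate amodal content pseudo-labels, but they do not account for the natural occlusions present in the modal masks. MOVi-MC-AC provides labels for about 5.8 million object instances, setting a new maximum in the amodal dataset literature, and is the first dataset to provide ground-truth amodal content. The full dataset is available at https://huggingface.co/datasets/Amar-S/MOVi-MC-AC.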

