PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
- URL: http://arxiv.org/abs/2309.12303v5
- Date: Sun, 28 Jul 2024 09:04:04 GMT
- Title: PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
- Authors: Shilin Yan, Xiaohao Xu, Renrui Zhang, Lingyi Hong, Wenchao Chen, Wenqiang Zhang, Wei Zhang
- Abstract summary: We present a panoramic video dataset, PanoVOS.
The dataset provides 150 videos with high video resolutions and diverse motions.
We present a Panoramic Space Consistency Transformer (PSCFormer) which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame.
- Score: 39.269864548255576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Panoramic videos contain richer spatial information and have attracted tremendous amounts of attention due to their exceptional experience in some fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation only focus on conventional planar images. To address the challenge, in this paper, we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high video resolutions and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we found that all of them fail to tackle the pixel-level content discontinuities of panoramic videos. Thus, we present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments demonstrate that compared with the previous SOTA models, our PSCFormer network exhibits a great advantage in terms of segmentation results under the panoramic setting. Our dataset poses new challenges in panoramic VOS and we hope that our PanoVOS can advance the development of panoramic segmentation/tracking.
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
- VidPanos: Generative Panoramic Videos from Casual Panning Videos [73.77443496436749]
Panoramic image stitching provides a unified, wide-angle view of a scene that extends beyond the camera's field of view.
We present a method for synthesizing a panoramic video from a casually-captured panning video.
Our system can create video panoramas for a range of in-the-wild scenes including people, vehicles, and flowing water.
arXiv Detail & Related papers (2024-10-17T17:53:24Z)
- PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image [11.053777620735175]
PanoViT is a panorama vision transformer to estimate the room layout from a single panoramic image.
Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image.
Our method outperforms state-of-the-art solutions in room layout prediction accuracy.
arXiv Detail & Related papers (2022-12-23T05:37:11Z)
- Waymo Open Dataset: Panoramic Video Panoptic Segmentation [48.04664130918314]
Research in image segmentation has become increasingly popular due to its critical applications in robotics and autonomous driving.
Due to the high costs of densely labeling the images, there is a shortage of publicly available ground truth labels.
We present a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving.
arXiv Detail & Related papers (2022-06-15T17:57:28Z)
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
- Video Panoptic Segmentation [117.08520543864054]
We propose and explore a new video extension of this task, called video panoptic segmentation.
To invigorate research on this new task, we present two types of video panoptic datasets.
We propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id tracking, and semantic segmentation in video frames.
arXiv Detail & Related papers (2020-06-19T19:35:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.