Related papers: SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos

URL: http://arxiv.org/abs/2105.11578v1
Date: Mon, 24 May 2021 23:51:29 GMT
Title: SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos
Authors: Yi Zhang, Lu Zhang, Jing Zhang, Kang Wang, Wassim Hamidouche, Olivier Deforges
Abstract summary: We propose SHD360, the first 360° video SHD dataset collecting various real-life daily scenes. SHD360 contains 16,238 salient human instances with manually annotated pixel-wise ground truth. Our proposed dataset and benchmark could serve as a good starting point for advancing human-centric research towards 360° panoramic data.
Score: 26.263614207849276
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Salient human detection (SHD) in dynamic 360° immersive videos is of great importance for various applications such as robotics, inter-human and human-object interaction in augmented reality. However, 360° video SHD has seldom been discussed in the computer vision community due to a lack of datasets with large-scale omnidirectional videos and rich annotations. To this end, we propose SHD360, the first 360° video SHD dataset collecting various real-life daily scenes, providing six-level hierarchical annotations for 6,268 key frames uniformly sampled from 37,403 omnidirectional video frames at 4K resolution. Specifically, each collected key frame is labeled with a super-class, a sub-class, associated attributes (e.g., geometrical distortion), bounding boxes and per-pixel object-/instance-level masks. As a result, our SHD360 contains a total of 16,238 salient human instances with manually annotated pixel-wise ground truth. Since no method has so far been proposed for 360° SHD, we systematically benchmark 11 representative state-of-the-art salient object detection (SOD) approaches on our SHD360, and explore key issues derived from extensive experimental results. We hope our proposed dataset and benchmark could serve as a good starting point for advancing human-centric research towards 360° panoramic data. Our dataset and benchmark will be publicly available at https://github.com/PanoAsh/SHD360.

Related papers

Omnidirectional Video Super-Resolution using Deep Learning [3.281128493853064]
The limited spatial resolution in 360° videos does not allow for each degree of view to be represented with adequate pixels. This paper proposes a novel deep learning model for 360° Video Super-Resolution (360° VSR) called Spherical Signal Super-resolution with a Proportioned optimisation (S3PO). S3PO adopts recurrent modelling with an attention mechanism, unbound from conventional VSR techniques like alignment.
arXiv Detail & Related papers (2025-06-03T05:59:21Z)
From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. We introduce 360-1M, a 360° video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z)
Imagine360: Immersive 360 Video Generation from Perspective Anchor [79.97844408255897]
Imagine360 is a perspective-to-360° video generation framework. It learns fine-grained spherical visual and motion patterns from limited 360° video data. It achieves superior graphics quality and motion coherence among state-of-the-art 360° video generation methods.
arXiv Detail & Related papers (2024-12-04T18:50:08Z)
MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z)
Any360D: Towards 360 Depth Anything with Unlabeled 360 Data and Möbius Spatial Augmentation [19.202253857381688]
We propose a semi-supervised learning framework to learn a 360 depth foundation model, dubbed Any360D. Under the umbrella of SSL, Any360D first learns a teacher model by fine-tuning DAM via metric depth supervision. Extensive experiments demonstrate that Any360D outperforms DAM and many prior data-specific models, showing impressive zero-shot capacity for being a 360 depth foundation model.
arXiv Detail & Related papers (2024-06-19T09:19:06Z)
NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. We animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking [10.87309734945868]
360° images can provide an omnidirectional field of view which is important for stable and long-term scene perception. In this paper, we explore 360° images for visual object tracking and perceive new challenges caused by large distortion. We propose a new large-scale omnidirectional tracking benchmark dataset, 360VOT, in order to facilitate future research.
arXiv Detail & Related papers (2023-07-27T05:32:01Z)
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars [157.82758221794452]
We present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research. It contains massive data assets, with 243+ million complete head frames, and over 800k video sequences from 500 different identities. Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods evaluated on five main tasks.
arXiv Detail & Related papers (2023-05-22T17:54:01Z)
ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD). We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos [5.831115928056554]
This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos. We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
arXiv Detail & Related papers (2020-11-20T19:19:48Z)
A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic content has been widely investigated along with the booming trend of virtual reality (VR) applications. Salient object detection (SOD) has been seldom explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
Visual Question Answering on 360° Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360 images. We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.