SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos
- URL: http://arxiv.org/abs/2105.11578v1
- Date: Mon, 24 May 2021 23:51:29 GMT
- Title: SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos
- Authors: Yi Zhang, Lu Zhang, Jing Zhang, Kang Wang, Wassim Hamidouche, Olivier
Deforges
- Abstract summary: We propose SHD360, the first 360° video SHD dataset covering various real-life daily scenes.
SHD360 contains 16,238 salient human instances with manually annotated pixel-wise ground truth.
Our proposed dataset and benchmark could serve as a good starting point for advancing human-centric research towards 360° panoramic data.
- Score: 26.263614207849276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Salient human detection (SHD) in dynamic 360° immersive videos is of
great importance for various applications such as robotics, inter-human and
human-object interaction in augmented reality. However, 360° video SHD has
been seldom discussed in the computer vision community due to a lack of
datasets with large-scale omnidirectional videos and rich annotations. To this
end, we propose SHD360, the first 360° video SHD dataset covering various
real-life daily scenes, providing six-level hierarchical annotations for 6,268
key frames uniformly sampled from 37,403 omnidirectional video frames at 4K
resolution. Specifically, each collected key frame is labeled with a
super-class, a sub-class, associated attributes (e.g., geometrical distortion),
bounding boxes and per-pixel object-/instance-level masks. As a result, our
SHD360 contains a total of 16,238 salient human instances with manually
annotated pixel-wise ground truth. Since no method has yet been proposed for
360° SHD, we systematically benchmark 11 representative state-of-the-art
salient object detection (SOD) approaches on our SHD360, and explore key issues
derived from the extensive experimental results. We hope our proposed dataset
and benchmark could serve as a good starting point for advancing human-centric
research towards 360° panoramic data. Our dataset and benchmark will be
publicly available at https://github.com/PanoAsh/SHD360.
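As a rough illustration of the annotation structure described in the abstract (super-class, sub-class, attributes, bounding boxes, and object-/instance-level masks), below is a minimal Python sketch. The class name, field names, and the MAE helper are assumptions made for illustration only; they do not reflect the dataset's official file format or evaluation toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np


@dataclass
class SHD360KeyFrame:
    """One annotated key frame; field names are illustrative, not the official schema."""
    frame_path: str                            # path to the 4K equirectangular key frame
    super_class: str                           # coarse category from the six-level hierarchy
    sub_class: str                             # finer-grained category under the super-class
    attributes: List[str]                      # e.g. ["geometrical distortion"]
    bboxes: List[Tuple[int, int, int, int]]    # one (x, y, w, h) box per salient human instance
    object_mask: np.ndarray                    # H x W binary object-level saliency mask
    instance_masks: List[np.ndarray] = field(default_factory=list)  # one H x W mask per instance


def count_instances(frames: List[SHD360KeyFrame]) -> int:
    """Total salient human instances across key frames (the paper reports 16,238)."""
    return sum(len(f.instance_masks) for f in frames)


def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between a predicted saliency map and the ground-truth mask,
    a standard SOD metric that a benchmark of this kind might report."""
    return float(np.abs(pred.astype(np.float32) - gt.astype(np.float32)).mean())
```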
Related papers
- MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views [90.26609689682876]
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations.
This setting is inherently ill-posed due to minimal overlap among input views and the limited visual information they provide.
Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views.
arXiv Detail & Related papers (2024-11-07T17:59:31Z)
- Any360D: Towards 360 Depth Anything with Unlabeled 360 Data and Möbius Spatial Augmentation [19.202253857381688]
We propose a semi-supervised learning (SSL) framework to learn a 360 depth foundation model, dubbed Any360D.
Under the umbrella of SSL, Any360D first learns a teacher model by fine-tuning DAM via metric depth supervision.
Extensive experiments demonstrate that Any360D outperforms DAM and many prior data-specific models, showing impressive zero-shot capacity for being a 360 depth foundation model.
arXiv Detail & Related papers (2024-06-19T09:19:06Z)
- NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes [59.15910989235392]
We introduce NeO 360, Neural fields for sparse view synthesis of outdoor scenes.
NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images.
Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations.
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
- PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, build 3D scenes to match the motion capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z)
- 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking [10.87309734945868]
360° images can provide an omnidirectional field of view, which is important for stable and long-term scene perception.
In this paper, we explore 360° images for visual object tracking and identify new challenges caused by large distortion.
We propose a new large-scale omnidirectional tracking benchmark dataset, 360VOT, in order to facilitate future research.
arXiv Detail & Related papers (2023-07-27T05:32:01Z)
- RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars [157.82758221794452]
We present RenderMe-360, a comprehensive 4D human head dataset to drive advances in head avatar research.
It contains massive data assets, with 243+ million complete head frames, and over 800k video sequences from 500 different identities.
Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods evaluated on five main tasks.
arXiv Detail & Related papers (2023-05-22T17:54:01Z)
- ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD).
We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
- ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos [5.831115928056554]
This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos.
We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking.
Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
arXiv Detail & Related papers (2020-11-20T19:19:48Z)
- A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications.
However, salient object detection (SOD) has been seldom explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)
- Visual Question Answering on 360° Images [96.00046925811515]
VQA 360 is a novel task of visual question answering on 360° images.
We collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
arXiv Detail & Related papers (2020-01-10T08:18:21Z)