ViDaS Video Depth-aware Saliency Network
- URL: http://arxiv.org/abs/2305.11729v1
- Date: Fri, 19 May 2023 15:04:49 GMT
- Title: ViDaS Video Depth-aware Saliency Network
- Authors: Ioanna Diamanti, Antigoni Tsiami, Petros Koutras and Petros Maragos
- Abstract summary: We introduce ViDaS, a two-stream, fully convolutional Video, Depth-Aware Saliency network.
It addresses the problem of attention modeling "in-the-wild" via saliency prediction in videos.
The network consists of two visual streams, one for the RGB frames and one for the depth frames.
It is trained end-to-end and is evaluated on a variety of databases with eye-tracking data.
- Score: 40.08270905030302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce ViDaS, a two-stream, fully convolutional Video, Depth-Aware
Saliency network that addresses the problem of attention modeling "in-the-wild"
via saliency prediction in videos. In contrast to existing visual saliency
approaches that use only RGB frames as input, our network also employs depth as
an additional modality. The network consists of two visual streams, one for the
RGB frames and one for the depth frames. Both streams follow an encoder-decoder
approach and are fused to obtain a final saliency map. The network is trained
end-to-end and is evaluated on a variety of databases with eye-tracking data,
covering a wide range of video content. Although the publicly available datasets
do not contain depth, we estimate it using three different state-of-the-art
methods to enable comparisons and deeper insight. In most cases, our method
outperforms state-of-the-art models as well as our RGB-only variant, which
indicates that depth can be beneficial for accurately estimating saliency in
videos displayed on a 2D screen. Depth has been widely used to assist salient
object detection, where it has proven very beneficial. Our problem, however,
differs significantly from salient object detection: it is not restricted to
specific salient objects, but predicts human attention in a more general sense.
The two problems have not only different objectives, but also different
ground-truth data and evaluation metrics. To the best of our knowledge, this is
the first competitive deep learning video saliency estimation approach that
combines RGB and depth features to address the general problem of saliency
estimation "in-the-wild". The code will be publicly released.
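To make the two-stream design concrete, here is a minimal PyTorch sketch of an RGB + depth encoder-decoder saliency network with simple late fusion. It is illustrative only: the layer sizes, strides, and concatenation-based fusion are assumptions made for the sketch, not the ViDaS architecture as published.

```python
# Minimal sketch of a two-stream RGB + depth video saliency network.
# Layer sizes, strides, and the concatenation-based fusion are illustrative
# assumptions, not the published ViDaS architecture.
import torch
import torch.nn as nn


class StreamEncoderDecoder(nn.Module):
    """One encoder-decoder stream over a video clip shaped (B, C, T, H, W)."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=(1, 2, 2), stride=(1, 2, 2)),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 16, kernel_size=(1, 2, 2), stride=(1, 2, 2)),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


class TwoStreamSaliencyNet(nn.Module):
    """Fuses an RGB stream and a depth stream into a single saliency map."""

    def __init__(self):
        super().__init__()
        self.rgb_stream = StreamEncoderDecoder(in_channels=3)    # RGB frames
        self.depth_stream = StreamEncoderDecoder(in_channels=1)  # estimated depth frames
        # Late fusion: concatenate stream features, then a 1x1x1 convolution.
        self.fusion = nn.Conv3d(32, 1, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return torch.sigmoid(self.fusion(feats))  # (B, 1, T, H, W) saliency volume


if __name__ == "__main__":
    clip_rgb = torch.randn(2, 3, 8, 112, 112)    # batch of 8-frame RGB clips
    clip_depth = torch.randn(2, 1, 8, 112, 112)  # per-frame estimated depth maps
    print(TwoStreamSaliencyNet()(clip_rgb, clip_depth).shape)  # torch.Size([2, 1, 8, 112, 112])
```

In practice the depth stream would be fed per-frame depth maps produced by an off-the-shelf monocular depth estimator, since the eye-tracking benchmarks used for evaluation do not ship with depth.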
Related papers
- DEAR: Depth-Enhanced Action Recognition [9.933324297265495]
This research introduces a novel approach integrating 3D features and depth maps alongside RGB features to enhance action recognition accuracy.
Our method processes estimated depth maps in a branch separate from the RGB feature encoder and fuses the two feature streams for a comprehensive understanding of the scene and actions.
arXiv Detail & Related papers (2024-08-28T10:08:38Z)
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task as a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation [58.21817572577012]
Video depth estimation aims to infer temporally consistent depth.
We introduce NVDS+, which stabilizes inconsistent depth estimated by various single-image models in a plug-and-play manner.
We also construct a large-scale Video Depth in the Wild dataset, which contains 14,203 videos with over two million frames.
arXiv Detail & Related papers (2023-07-17T17:57:01Z)
- RGB-D Salient Object Detection with Ubiquitous Target Awareness [37.6726410843724]
We make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework.
We propose a Ubiquitous Target Awareness (UTA) network to address three important challenges in the RGB-D SOD task.
Our proposed UTA network is depth-free at inference and runs in real time at 43 FPS.
arXiv Detail & Related papers (2021-09-08T04:27:29Z)
- DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues [37.837552043766166]
We introduce DynOcc, the first depth dataset consisting of dynamic in-the-wild scenes.
Our approach leverages the cues in these dynamic scenes to infer depth relationships between points of selected video frames.
In total, our DynOcc dataset contains 22M depth pairs from 91K frames drawn from a diverse set of videos.
arXiv Detail & Related papers (2021-03-30T22:17:36Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Auxiliary Networks (SANs), a new module enabling monodepth networks to perform both depth prediction and depth completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive ability in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Is Depth Really Necessary for Salient Object Detection? [50.10888549190576]
We make the first attempt at realizing a unified depth-aware framework that uses only RGB information as input at inference.
Our method not only surpasses state-of-the-art performance on five public RGB SOD benchmarks, but also outperforms RGB-D-based methods on five benchmarks by a large margin.
arXiv Detail & Related papers (2020-05-30T13:40:03Z)