WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning
- URL: http://arxiv.org/abs/2305.13901v3
- Date: Wed, 27 Sep 2023 12:35:42 GMT
- Title: WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning
- Authors: Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-Ping Fan
- Abstract summary: This paper introduces an auxiliary window with dynamic blurring (WinDB) fixation collection approach for panoptic video.
We have released a new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225 categories.
Because WinDB is blind-zoom free, the collected fixations exhibit frequent and intensive "fixation shifting" - a phenomenon that has long been overlooked.
- Score: 70.15653649348674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To date, the widely adopted way to perform fixation collection in panoptic
video is based on a head-mounted display (HMD), where users' fixations are
collected while wearing an HMD to explore the given panoptic scene freely.
However, this widely used data collection method is insufficient for training
deep models to accurately predict which regions in a given panoptic video are
most important when it contains intermittent salient events. The main reason is
that there always exist "blind zooms" when using an HMD to collect fixations,
since users cannot keep spinning their heads to explore the entire panoptic
scene all the time. Consequently, the collected fixations tend to be trapped in
some local views, leaving the remaining areas as "blind zooms". Therefore,
fixation data collected using HMD-based methods that accumulate local views
cannot accurately represent the overall global importance - the main purpose of
fixations - of complex panoptic scenes. To address this issue, this paper
introduces the auxiliary window with dynamic blurring (WinDB) fixation
collection approach for panoptic video, which requires no HMD and faithfully
reflects the region-wise importance of the scene. Using our WinDB approach, we
have released a
new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225
categories. Specifically, since WinDB-based fixation collection is free of
blind zooms, our new set exhibits frequent and intensive "fixation shifting" -
a distinctive phenomenon that has long been overlooked by previous research.
Thus, we present an effective fixation shifting network (FishNet) to handle it.
Together, the new fixation collection tool, dataset, and network have strong
potential to open a new era for fixation-related research and applications in
360° environments.
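The abstract describes WinDB only at a high level: the panoptic frame is viewed on an ordinary screen through a clear auxiliary window whose surroundings are dynamically blurred, and the resulting blind-zoom-free fixation tracks contain abrupt cross-scene jumps ("fixation shifting"). The Python sketch below is a minimal illustration of these two ideas under stated assumptions, not the authors' released tool or FishNet; the function names (windb_style_frame, fixation_shift_mask), the window size, the blur kernel, and the 45-degree shift threshold are all hypothetical.

    import cv2
    import numpy as np

    def windb_style_frame(frame, center_xy, win_size=(512, 288), blur_ksize=31):
        """Blur an equirectangular frame everywhere except a clear, movable
        auxiliary window centered at center_xy (e.g., the viewer's gaze or
        mouse position). Window geometry and blur strength are illustrative;
        blur_ksize must be odd for cv2.GaussianBlur."""
        h, w = frame.shape[:2]
        blurred = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)

        # Binary mask: 1 inside the clear window; the window wraps across
        # the horizontal seam of the equirectangular projection.
        mask = np.zeros((h, w), dtype=np.float32)
        cx, cy = center_xy
        half_w, half_h = win_size[0] // 2, win_size[1] // 2
        y0, y1 = max(0, cy - half_h), min(h, cy + half_h)
        xs = np.arange(cx - half_w, cx + half_w) % w  # horizontal wrap-around
        mask[y0:y1, xs] = 1.0

        # Feather the boundary so the clear-to-blurred transition is smooth.
        mask = cv2.GaussianBlur(mask, (blur_ksize, blur_ksize), 0)[..., None]
        return (mask * frame + (1.0 - mask) * blurred).astype(frame.dtype)

    def fixation_shift_mask(fixations_uv, thresh_deg=45.0):
        """Flag positions where consecutive fixations jump by a large
        great-circle angle - one plausible way to quantify "fixation
        shifting". fixations_uv is an (N, 2) array of normalized (u, v)
        coordinates in [0, 1] on the equirectangular frame; the 45-degree
        threshold is an assumption."""
        lon = (fixations_uv[:, 0] - 0.5) * 2.0 * np.pi  # longitude, [-pi, pi]
        lat = (0.5 - fixations_uv[:, 1]) * np.pi        # latitude, [-pi/2, pi/2]
        xyz = np.stack([np.cos(lat) * np.cos(lon),
                        np.cos(lat) * np.sin(lon),
                        np.sin(lat)], axis=1)           # unit vectors on sphere
        cosang = np.clip((xyz[:-1] * xyz[1:]).sum(axis=1), -1.0, 1.0)
        return np.degrees(np.arccos(cosang)) > thresh_deg

    # Example: preview the effect on a synthetic equirectangular frame, then
    # flag the large gaze jump in a toy fixation track.
    frame = np.random.randint(0, 256, (960, 1920, 3), dtype=np.uint8)
    out = windb_style_frame(frame, center_xy=(960, 480))
    track = np.array([[0.10, 0.50], [0.12, 0.48], [0.80, 0.45]])
    print(fixation_shift_mask(track))  # -> [False  True]

Driving center_xy with a mouse or eye-tracker stream while a clip plays would approximate an HMD-free collection loop, and applying fixation_shift_mask to the recorded track marks the abrupt jumps that a model like FishNet would need to handle.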
Related papers
- Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z)
- Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors [51.36238367193988]
We tackle sparse-view reconstruction of a 360° 3D scene using priors from latent diffusion models (LDMs).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360° scenes from as few as 9 input views, with a high degree of foreground and background detail.
arXiv Detail & Related papers (2024-05-26T11:01:39Z)
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video [0.0]
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z)
- Panoptic Video Scene Graph Generation [110.82362282102288]
We propose and study a new problem called panoptic video scene graph generation (PVSG).
PVSG relates to the existing video scene graph generation problem, which focuses on temporal interactions between humans and objects grounded with bounding boxes in videos.
We contribute the PVSG dataset, which consists of 400 videos (289 third-person + 111 egocentric videos) with a total of 150K frames labeled with panoptic segmentation masks as well as fine, temporal scene graphs.
arXiv Detail & Related papers (2023-11-28T18:59:57Z)
- Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization [20.46053083071752]
We propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF).
LAV-DF consists of strategic content-driven audio, visual and audio-visual manipulations.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture.
arXiv Detail & Related papers (2023-05-03T08:48:45Z)
- NEWTON: Neural View-Centric Mapping for On-the-Fly Large-Scale SLAM [51.21564182169607]
NEWTON is a view-centric mapping method that dynamically constructs neural fields based on run-time observations.
Our method enables camera pose updates using loop closures and scene boundary updates by representing the scene with multiple neural fields.
The experimental results demonstrate the superior performance of our method over existing world-centric neural field-based SLAM systems.
arXiv Detail & Related papers (2023-03-23T20:22:01Z)
- MonoDVPS: A Self-Supervised Monocular Depth Estimation Approach to Depth-aware Video Panoptic Segmentation [3.2489082010225494]
We propose a novel solution with a multi-task network that performs monocular depth estimation and video panoptic segmentation.
We introduce panoptic-guided depth losses and a novel panoptic masking scheme for moving objects to avoid corrupting the training signal.
arXiv Detail & Related papers (2022-10-14T07:00:42Z)
- A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications.
Salient object detection (SOD), however, has seldom been explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)