Saliency-aware Stereoscopic Video Retargeting
- URL: http://arxiv.org/abs/2304.08852v1
- Date: Tue, 18 Apr 2023 09:38:33 GMT
- Title: Saliency-aware Stereoscopic Video Retargeting
- Authors: Hassan Imani, Md Baharul Islam, Lai-Kuan Wong
- Abstract summary: This paper proposes an unsupervised deep learning-based stereo video retargeting network.
Our model first detects the salient objects, then shifts and warps all objects so that the distortion of the salient parts of the stereo frames is minimized.
To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that reverses the retargeted frames to the input frames.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Stereo video retargeting aims to resize a stereo video to a desired
aspect ratio. The quality of retargeted videos can be significantly impacted by
the stereo video's spatial, temporal, and disparity coherence, all of which can be affected
by the retargeting process. Due to the lack of a publicly accessible annotated
dataset, there is little research on deep learning-based methods for stereo
video retargeting. This paper proposes an unsupervised deep learning-based
stereo video retargeting network. Our model first detects the salient objects,
then shifts and warps all objects so that the distortion of the salient parts of
the stereo frames is minimized. We use 1D convolution for shifting the
salient objects and design a stereo video Transformer to assist the retargeting
process. To train the network, we use the parallax attention mechanism to fuse
the left and right views and feed the retargeted frames to a reconstruction
module that reverses the retargeted frames to the input frames. Therefore, the
network is trained in an unsupervised manner. Extensive qualitative and
quantitative experiments and ablation studies on the KITTI stereo 2012 and 2015
datasets demonstrate the effectiveness of the proposed method over the existing
state-of-the-art methods. The code is available at
https://github.com/z65451/SVR/.
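The abstract sketches a concrete pipeline: detect salient objects, shift content with 1D convolutions, fuse the two views with parallax attention, and supervise the whole thing by reconstructing the input from the retargeted output. As a rough illustration of that cycle, here is a minimal PyTorch sketch; every class name, shape, and the hard column-selection shortcut below are assumptions made for illustration, not details taken from the paper or its repository.

```python
# Hypothetical sketch of the retarget-then-reconstruct cycle described in the
# abstract. Names and shapes are assumptions, not the authors' code (see
# https://github.com/z65451/SVR/ for the actual implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParallaxAttentionFusion(nn.Module):
    """Fuse left/right view features with attention along the width axis,
    i.e. across epipolar lines, in the spirit of parallax attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, left, right):                     # both (B, C, H, W)
        B, C, H, W = left.shape
        q = self.q(left).permute(0, 2, 3, 1)            # (B, H, W, C)
        k = self.k(right).permute(0, 2, 1, 3)           # (B, H, C, W)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)  # (B, H, W, W)
        v = self.v(right).permute(0, 2, 3, 1)           # (B, H, W, C)
        fused = (attn @ v).permute(0, 3, 1, 2)          # back to (B, C, H, W)
        return left + fused


class ColumnShiftNet(nn.Module):
    """Score columns with 1D convolutions over the width axis and keep the
    most salient ones. Hard column selection is a crude, non-differentiable
    stand-in for the paper's smooth shifting/warping of non-salient content."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv1d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 1, 3, padding=1))

    def forward(self, feat, saliency, out_w):           # feat (B, C, H, W)
        cols = (feat * saliency).mean(dim=2)            # (B, C, W)
        s = self.score(cols).squeeze(1)                 # (B, W) column importance
        keep = s.topk(out_w, dim=1).indices.sort(dim=1).values
        idx = keep[:, None, None, :].expand(-1, feat.size(1), feat.size(2), -1)
        return feat.gather(3, idx)                      # (B, C, H, out_w)


class Reconstructor(nn.Module):
    """Map retargeted features back to the input width so an L1 loss against
    the input provides the unsupervised training signal."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, w):
        x = F.interpolate(x, size=(x.size(2), w), mode="bilinear",
                          align_corners=False)
        return self.refine(x)


# Toy forward/backward pass on random tensors standing in for feature maps.
B, C, H, W = 1, 32, 64, 128
left, right = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
saliency = torch.rand(B, 1, H, W)                       # e.g. from a saliency detector
fuse, shift, rec = ParallaxAttentionFusion(C), ColumnShiftNet(C), Reconstructor(C)
retargeted = shift(fuse(left, right), saliency, int(W * 0.75))
loss = F.l1_loss(rec(retargeted, W), left)              # reconstruction as supervision
loss.backward()
```

Because the reconstruction target is the input frame itself, no retargeting ground truth is required, which is what lets the network train without an annotated dataset.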
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
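The SIGMA entry above turns on one mechanism: spreading tube features evenly over a limited set of clusters, which is commonly enforced with a few Sinkhorn-Knopp normalization steps. The snippet below is a generic, hedged illustration of that balancing step; the function name, iteration count, and temperature are assumptions, not values from the paper.

```python
# Generic Sinkhorn-Knopp balancing: soft-assign features to clusters while
# keeping cluster marginals roughly uniform. Constants are illustrative only.
import torch

def sinkhorn_assign(scores: torch.Tensor, n_iters: int = 3, eps: float = 0.05):
    """scores: (N, K) feature-to-cluster similarities. Returns a soft
    assignment whose columns are balanced, so no cluster dominates."""
    q = torch.exp(scores / eps)                  # positive potentials
    q = q / q.sum()
    n, k = q.shape
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True) / k   # balance cluster marginals
        q = q / q.sum(dim=1, keepdim=True) / n   # one unit of mass per feature
    return q * n                                 # rows sum to 1

features = torch.nn.functional.normalize(torch.randn(512, 128), dim=1)
prototypes = torch.nn.functional.normalize(torch.randn(16, 128), dim=1)
assign = sinkhorn_assign(features @ prototypes.T)
print(assign.sum(dim=1)[:3])                     # ~1 per feature
print(assign.sum(dim=0))                         # ~512/16 = 32 per cluster
```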
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from an object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- MV2MAE: Multi-View Video Masked Autoencoders [33.61642891911761]
We present a method for self-supervised learning from synchronized multi-view videos.
We use a cross-view reconstruction task to inject geometry information into the model.
Our approach is based on the masked autoencoder (MAE) framework.
arXiv Detail & Related papers (2024-01-29T05:58:23Z)
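The MV2MAE entry above describes masking one view and using a synchronized second view to supply geometric context for reconstruction. The following toy sketch shows that cross-view masked-reconstruction pattern in the MAE spirit; the tiny Transformer, token counts, and feature-regression loss are placeholders, not the paper's architecture.

```python
# Loose, hypothetical sketch of cross-view masked reconstruction: view-A
# tokens are masked and regressed from the visible tokens of both views.
import torch
import torch.nn as nn

class TinyCrossViewMAE(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), 2)
        self.head = nn.Linear(dim, dim)       # regress original token features

    def forward(self, view_a, view_b, mask):  # tokens (B, N, D), mask (B, N) bool
        x = torch.where(mask[..., None], self.mask_token.expand_as(view_a), view_a)
        x = torch.cat([x, view_b], dim=1)     # second view provides geometry cues
        z = self.encoder(x)[:, : view_a.size(1)]   # keep view-A positions
        pred = self.head(z)
        return ((pred - view_a) ** 2)[mask].mean() # loss on masked tokens only

tokens_a, tokens_b = torch.randn(2, 49, 64), torch.randn(2, 49, 64)
mask = torch.rand(2, 49) < 0.75               # mask 75% of view-A tokens
loss = TinyCrossViewMAE()(tokens_a, tokens_b, mask)
loss.backward()
```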
- Towards Robust Video Object Segmentation with Adaptive Object Calibration [18.094698623128146]
Video object segmentation (VOS) aims at segmenting objects in all target frames of a video, given annotated object masks of reference frames.
We propose a new deep network, which can adaptively construct object representations and calibrate object masks to achieve stronger robustness.
Our model achieves the state-of-the-art performance among existing published works, and also exhibits superior robustness against perturbations.
arXiv Detail & Related papers (2022-07-02T17:51:29Z)
- Stereoscopic Universal Perturbations across Different Architectures and Datasets [60.021985610201156]
We study the effect of adversarial perturbations of images on deep stereo matching networks for the disparity estimation task.
We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network.
Our perturbations can increase the D1-error (akin to fooling rate) of state-of-the-art stereo networks from 1% to as much as 87%.
arXiv Detail & Related papers (2021-12-12T02:11:31Z)
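A universal perturbation, in the sense of the entry above, is a single additive tensor (one per view here) optimized over a whole dataset so that it degrades disparity estimates for any stereo pair. Below is a generic, hypothetical sketch of that crafting loop; the network, loader, and hyperparameters are placeholders, not the authors' procedure.

```python
# Craft one perturbation pair over many stereo images by ascending the
# disparity error. stereo_net and loader are placeholder stand-ins.
import torch
import torch.nn as nn

def craft_universal(stereo_net, loader, eps=0.02, steps=10, lr=1e-3):
    for p in stereo_net.parameters():          # attack the inputs, not the weights
        p.requires_grad_(False)
    delta_l = torch.zeros(1, 3, 256, 512, requires_grad=True)
    delta_r = torch.zeros(1, 3, 256, 512, requires_grad=True)
    opt = torch.optim.Adam([delta_l, delta_r], lr=lr)
    for _ in range(steps):
        for left, right, disp_gt in loader:    # one perturbation, all pairs
            pred = stereo_net(left + delta_l, right + delta_r)
            loss = -torch.abs(pred - disp_gt).mean()   # maximize the error
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():              # keep the attack imperceptible
                delta_l.clamp_(-eps, eps)
                delta_r.clamp_(-eps, eps)
    return delta_l.detach(), delta_r.detach()

class DummyStereo(nn.Module):                  # toy "network" so the loop runs
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(6, 1, 3, padding=1)
    def forward(self, l, r):
        return self.conv(torch.cat([l, r], dim=1)).squeeze(1)

loader = [(torch.rand(2, 3, 256, 512), torch.rand(2, 3, 256, 512),
           torch.rand(2, 256, 512))]
dl, dr = craft_universal(DummyStereo(), loader, steps=2)
```

The D1-error the entry cites is the fraction of pixels whose predicted disparity deviates from the ground truth beyond a threshold, so driving up the mean disparity error directly inflates it.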
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Cloud based Scalable Object Recognition from Video Streams using Orientation Fusion and Convolutional Neural Networks [11.44782606621054]
Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition.
CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets.
We propose a new CNN method based on orientation fusion for visual object recognition.
arXiv Detail & Related papers (2021-06-19T07:15:15Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting [107.39743751292028]
TransMoMo is capable of realistically transferring the motion of a person in a source video to a video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method over the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-31T17:49:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.