A Gated Fusion Network for Dynamic Saliency Prediction
- URL: http://arxiv.org/abs/2102.07682v1
- Date: Mon, 15 Feb 2021 17:18:37 GMT
- Title: A Gated Fusion Network for Dynamic Saliency Prediction
- Authors: Aysun Kocak, Erkut Erdem and Aykut Erdem
- Abstract summary: Gated Fusion Network for dynamic saliency (GFSalNet)
GFSalNet is the first deep saliency model capable of making predictions in a dynamic way via a gated fusion mechanism.
We show that it has a good generalization ability, and moreover, exploits temporal information more effectively via its adaptive fusion scheme.
- Score: 16.701214795454536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting saliency in videos is a challenging problem due to complex
modeling of interactions between spatial and temporal information, especially
when the ever-changing, dynamic nature of videos is considered. Recently,
researchers have proposed large-scale datasets and models that take advantage
of deep learning as a way to understand what's important for video saliency.
These approaches, however, learn to combine spatial and temporal features in a
static manner and do not adapt themselves much to the changes in the video
content. In this paper, we introduce Gated Fusion Network for dynamic saliency
(GFSalNet), the first deep saliency model capable of making predictions in a
dynamic way via a gated fusion mechanism. Moreover, our model also exploits
spatial and channel-wise attention within a multi-scale architecture that
further allows for highly accurate predictions. We evaluate the proposed
approach on a number of datasets, and our experimental analysis demonstrates
that it outperforms or is highly competitive with the state of the art.
Importantly, we show that it has a good generalization ability, and moreover,
exploits temporal information more effectively via its adaptive fusion scheme.
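The abstract describes adaptively fusing spatial and temporal feature streams through a learned gate rather than a fixed combination. The paper's exact gating architecture is not given here, so the following is only a minimal NumPy sketch of the general idea: a sigmoid gate computed from both streams selects, per spatial location, how much of each stream enters the fused map. The function name, weight shapes, and single-gate-per-location design are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(spatial, temporal, w, b):
    """Fuse spatial and temporal feature maps with a learned gate (sketch).

    spatial, temporal: (C, H, W) feature maps from the two streams
    w: (1, 2*C) gating weights, b: scalar bias (illustrative shapes)
    Returns the fused (C, H, W) map and the per-location gate (H, W).
    """
    c, h, wd = spatial.shape
    # Stack both streams channel-wise and flatten locations: (2C, H*W)
    stacked = np.concatenate([spatial, temporal], axis=0).reshape(2 * c, -1)
    # One gate value per spatial location, squashed into (0, 1)
    gate = sigmoid(w @ stacked + b).reshape(h, wd)
    # Convex combination: gate weights the spatial stream,
    # (1 - gate) weights the temporal stream, per location
    fused = gate * spatial + (1.0 - gate) * temporal
    return fused, gate

rng = np.random.default_rng(0)
spatial = rng.standard_normal((4, 8, 8))
temporal = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((1, 8)) * 0.1
fused, gate = gated_fusion(spatial, temporal, w, 0.0)
print(fused.shape, gate.shape)
```

Because the gate is recomputed from the input features at every frame, the balance between spatial appearance and temporal motion can shift as the video content changes, which is the "dynamic" behavior the abstract contrasts with static fusion schemes.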
Related papers
- Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z) - Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z) - Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z) - Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs [13.436770170612295]
We study for the first time uncertainty-aware modeling of continuous-time dynamics of interacting objects.
Our model infers both independent dynamics and their interactions with reliable uncertainty estimates.
arXiv Detail & Related papers (2022-05-24T08:36:25Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction [31.02081143697431]
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and video-surveillance applications.
We propose a lightweight attention-based recurrent backbone that acts solely on past observed positions.
We employ a common goal module, based on a U-Net architecture, which additionally extracts semantic information to predict scene-compliant destinations.
arXiv Detail & Related papers (2022-04-25T11:12:37Z) - Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video frame interpolation.
In addition, we develop a multi-scale frame scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z) - TCL: Transformer-based Dynamic Graph Modelling via Contrastive Learning [87.38675639186405]
We propose a novel graph neural network approach, called TCL, which deals with the dynamically-evolving graph in a continuous-time fashion.
To the best of our knowledge, this is the first attempt to apply contrastive learning to representation learning on dynamic graphs.
arXiv Detail & Related papers (2021-05-17T15:33:25Z) - Multi-agent Trajectory Prediction with Fuzzy Query Attention [15.12743751614964]
Trajectory prediction for scenes with multiple agents is a challenging problem in numerous domains such as traffic prediction, pedestrian tracking and path planning.
We present a general architecture to address this challenge which models the crucial inductive biases of motion, namely, inertia, relative motion, intents and interactions.
arXiv Detail & Related papers (2020-10-29T19:12:12Z) - Deep learning of contagion dynamics on complex networks [0.0]
We propose a complementary approach based on deep learning to build effective models of contagion dynamics on networks.
By allowing simulations on arbitrary network structures, our approach makes it possible to explore the properties of the learned dynamics beyond the training data.
Our results demonstrate how deep learning offers a new and complementary perspective to build effective models of contagion dynamics on networks.
arXiv Detail & Related papers (2020-06-09T17:18:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.