Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
- URL: http://arxiv.org/abs/2004.04981v1
- Date: Fri, 10 Apr 2020 10:40:35 GMT
- Title: Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
- Authors: Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha and Wenjun Zeng
- Abstract summary: We propose to convert spatiotemporal fusion strategies into a probability space, which allows us to perform network-level evaluations of various fusion strategies without having to train them separately.
Our approach greatly boosts the efficiency of analyzing spatiotemporal fusion.
We generate new fusion strategies which achieve state-of-the-art performance on four well-known action recognition datasets.
- Score: 129.84064609199663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success in still image recognition, deep neural networks for
spatiotemporal signal tasks (such as human action recognition in videos) have still
suffered from low efficacy and inefficiency over the past years. Recently, human
experts have put more effort into analyzing the importance of different
components in 3D convolutional neural networks (3D CNNs) to design more
powerful spatiotemporal learning backbones. Among many others, spatiotemporal
fusion is one of the essentials. It controls how spatial and temporal signals
are extracted at each layer during inference. Previous attempts usually start
from ad-hoc designs that empirically combine certain convolutions and then draw
conclusions based on the performance obtained by training the corresponding
networks. These methods only support network-level analysis on a limited number
of fusion strategies. In this paper, we propose to convert the spatiotemporal
fusion strategies into a probability space, which allows us to perform
network-level evaluations of various fusion strategies without having to train
them separately. Besides, we can also obtain fine-grained numerical information
such as layer-level preference on spatiotemporal fusion within the probability
space. Our approach greatly boosts the efficiency of analyzing spatiotemporal
fusion. Based on the probability space, we further generate new fusion
strategies which achieve the state-of-the-art performance on four well-known
action recognition datasets.
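To make the idea of a "probability space" over fusion strategies more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not say how the space is built. The sketch assumes, purely for illustration, that each layer independently picks one of three hypothetical fusion units (`spatial`, `spatiotemporal`, `mixed`) from a categorical distribution, and the numbers in `layer_probs` are made up. Under those assumptions, a network-level strategy is one choice per layer, its probability is the product of the per-layer probabilities, and the per-layer distributions act as the layer-level preferences.

```python
import math
import random

# Hypothetical fusion units a layer may use (illustrative names, not from the abstract).
UNITS = ("spatial", "spatiotemporal", "mixed")


def strategy_log_prob(strategy, layer_probs):
    """Log-probability of a network-level strategy.

    Assumes layers choose their fusion unit independently, so the joint
    probability is the product of per-layer probabilities.
    """
    return sum(math.log(layer_probs[i][unit]) for i, unit in enumerate(strategy))


def sample_strategy(layer_probs, rng):
    """Draw one fusion unit per layer from that layer's categorical distribution."""
    return tuple(
        rng.choices(UNITS, weights=[probs[u] for u in UNITS])[0]
        for probs in layer_probs
    )


def most_probable_strategy(layer_probs):
    """Pick the highest-probability unit at every layer (the layer-level preference)."""
    return tuple(max(UNITS, key=lambda u: probs[u]) for probs in layer_probs)


# Made-up per-layer preferences for a three-layer network (assumption for the demo).
layer_probs = [
    {"spatial": 0.7, "spatiotemporal": 0.2, "mixed": 0.1},
    {"spatial": 0.3, "spatiotemporal": 0.3, "mixed": 0.4},
    {"spatial": 0.1, "spatiotemporal": 0.8, "mixed": 0.1},
]

best = most_probable_strategy(layer_probs)
best_prob = math.exp(strategy_log_prob(best, layer_probs))
sampled = sample_strategy(layer_probs, random.Random(0))
```

With such a distribution in hand, candidate strategies can be ranked or sampled without training one network per strategy, which is the kind of efficiency gain the abstract describes. How the distribution is actually estimated is left to the paper.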
Related papers
- Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points.
We also show that multi-head attention for point-clouds helps the agent learn faster but converges to the same outcome.
(2023-12-18T18:16:30Z)
- Research on Data Fusion Algorithm Based on Deep Learning in Target Tracking [10.335589214502987]
An eye tracking data fusion algorithm based on a long short-term memory network is proposed.
The experimental results show that compared with the two fusion algorithms based on deep learning, the algorithm proposed in this paper performs well in terms of fusion quality.
(2022-11-23T08:44:59Z)
- ChiNet: Deep Recurrent Convolutional Learning for Multimodal Spacecraft Pose Estimation [3.964047152162558]
This paper presents an innovative deep learning pipeline which estimates the relative pose of a spacecraft by incorporating the temporal information from a rendezvous sequence.
It leverages the performance of long short-term memory (LSTM) units in modelling sequences of data for the processing of features extracted by a convolutional neural network (CNN) backbone.
Three distinct training strategies, which follow a coarse-to-fine funnelled approach, are combined to facilitate feature learning and improve end-to-end pose estimation by regression.
(2021-08-23T16:48:58Z)
- Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while having much lower complexity.
(2021-06-08T13:40:30Z)
- A Study On the Effects of Pre-processing On Spatio-temporal Action Recognition Using Spiking Neural Networks Trained with STDP [0.0]
It is important to study the behavior of SNNs trained with unsupervised learning methods on video classification tasks.
This paper presents methods of transposing temporal information into a static format, and then transforming the visual information into spikes using latency coding.
We show the effect of the similarity in the shape and speed of certain actions on action recognition with spiking neural networks.
(2021-05-31T07:07:48Z)
- Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method can give superior performance compared to other state-of-the-art methods.
(2020-11-26T13:19:14Z)
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
(2020-08-21T10:45:09Z)
- Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm based on partitioning the verification problem in an iterative manner and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
(2020-04-17T20:21:47Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
(2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.