RADNet: A Deep Neural Network Model for Robust Perception in Moving
Autonomous Systems
- URL: http://arxiv.org/abs/2205.00364v1
- Date: Sat, 30 Apr 2022 23:14:08 GMT
- Title: RADNet: A Deep Neural Network Model for Robust Perception in Moving
Autonomous Systems
- Authors: Burhan A. Mudassar, Sho Ko, Maojingjing Li, Priyabrata Saha, Saibal
Mukhopadhyay
- Abstract summary: We develop a novel method to rank videos based on the degree of global camera motion.
For videos ranked high in camera motion, we show that action detection accuracy decreases.
We propose an action detection pipeline that is robust to the camera motion effect and verify it empirically.
- Score: 8.706086688708014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactive autonomous applications require robustness of the perception
engine to artifacts in unconstrained videos. In this paper, we examine the
effect of camera motion on the task of action detection. We develop a novel
method to rank videos by the degree of global camera motion, and show that
action detection accuracy decreases on the videos ranked highest for camera
motion. We propose an action detection pipeline that is robust to this camera
motion effect and verify it empirically. Specifically, we align actor features
across frames and couple global scene features with local actor-specific
features. Feature alignment uses a novel formulation of the Spatio-temporal
Sampling Network (STSN), extended with multi-scale offset prediction and
refinement through a pyramid structure. We also propose a novel
input-dependent weighted averaging strategy for fusing local and global
features. On our MOVE dataset of moving-camera videos with high camera motion,
our network improves frame mAP by 4.1% and video mAP by 17%.
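The input-dependent weighted averaging of local actor features and global scene features can be pictured as a small gating module. The sketch below is a minimal PyTorch illustration under assumed names and dimensions (GatedFusion, 256-d features); it is not the authors' implementation.

```python
# A minimal sketch of input-dependent weighted averaging of local (actor) and
# global (scene) features, as described in the abstract. Module/variable names
# and dimensions are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # The gate sees both feature vectors and predicts a per-channel weight.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, actor_feat: torch.Tensor, scene_feat: torch.Tensor) -> torch.Tensor:
        # actor_feat, scene_feat: (num_actors, dim); the scene feature is
        # broadcast to every actor proposal before gating.
        w = self.gate(torch.cat([actor_feat, scene_feat], dim=-1))
        return w * actor_feat + (1.0 - w) * scene_feat

# Usage: fuse 8 actor features with a shared 256-d scene descriptor.
fusion = GatedFusion(dim=256)
actors = torch.randn(8, 256)
scene = torch.randn(1, 256).expand(8, 256)
fused = fusion(actors, scene)  # (8, 256)
```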
Related papers
- MOVIN: Real-time Motion Capture using a Single LiDAR [7.3228874258537875]
We present MOVIN, a data-driven generative method for real-time motion capture with global tracking.
Our framework accurately predicts the performer's 3D global information and local joint details.
We implement a real-time application to showcase our method in real-world scenarios.
arXiv Detail & Related papers (2023-09-17T16:04:15Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
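The exact form of $\mathcal{L}_{EC}$ is not given in this summary; the sketch below only shows a generic InfoNCE-style contrastive loss over two augmented views of the same clip, as a hedged illustration of what an event-contrastive term could look like.

```python
# A hedged sketch of an InfoNCE-style contrastive loss over clip embeddings.
# The actual Event-Contrastive Loss (L_EC) in EventTransAct is not specified
# in the summary above; this only illustrates the general form of such a loss.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """anchor, positive: (batch, dim) embeddings of two augmented views of the same event clip."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature                  # (batch, batch) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```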
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition [52.78234467516168]
We introduce the concept of patch mutual information (PMI) score to quantify the motion bias between adjacent frames.
We present an adaptive frame selection strategy using a shifted leaky ReLU and the cumulative distribution function.
Our method achieves a relative improvement of 2.2 - 13.8% in top-1 accuracy on UAV-Human, 6.8% on NEC Drone, and 9.0% on Diving48 datasets.
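A hedged sketch of how motion-weighted frame selection with a shifted leaky ReLU and a CDF can work; the PMI computation itself, the shift value, and the function name select_frames are assumptions for illustration.

```python
# A hedged sketch of adaptive frame selection driven by per-frame motion
# scores. The exact PMI computation and shift used by the PMI Sampler are not
# given in the summary; this only illustrates the shifted-leaky-ReLU + CDF idea.
import torch
import torch.nn.functional as F

def select_frames(pmi_scores: torch.Tensor, num_frames: int, shift: float = 0.1) -> torch.Tensor:
    """pmi_scores: (T,) motion scores between adjacent frames; returns selected frame indices."""
    # Shifted leaky ReLU keeps a small positive weight even for low-motion frames.
    weights = F.leaky_relu(pmi_scores - shift, negative_slope=0.01) + shift
    weights = weights.clamp(min=1e-6)
    cdf = torch.cumsum(weights, dim=0) / weights.sum()      # empirical CDF over frames
    # Sample frames at evenly spaced quantiles of the CDF, so high-motion
    # (high-weight) regions contribute more of the selected frames.
    quantiles = torch.linspace(0, 1, num_frames)
    idx = torch.searchsorted(cdf, quantiles).clamp(max=pmi_scores.numel() - 1)
    return idx

frames = select_frames(torch.rand(120), num_frames=16)
```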
arXiv Detail & Related papers (2023-04-14T00:01:11Z)
- HVC-Net: Unifying Homography, Visibility, and Confidence Learning for Planar Object Tracking [5.236567998857959]
We present a unified convolutional neural network (CNN) model that jointly considers homography, visibility, and confidence.
Our approach outperforms the state-of-the-art methods on public POT and TMT datasets.
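A minimal sketch of a CNN with joint homography, visibility, and confidence heads in the spirit of HVC-Net; the backbone, head shapes, and 8-parameter homography encoding are assumptions, not the paper's design.

```python
# A hedged sketch of a shared backbone with three prediction heads.
import torch
import torch.nn as nn

class HVCHead(nn.Module):
    def __init__(self, in_ch: int = 3, feat: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.homography = nn.Linear(feat, 8)   # 8 DoF of a 3x3 homography (last entry fixed to 1)
        self.visibility = nn.Linear(feat, 1)   # probability that the planar target is visible
        self.confidence = nn.Linear(feat, 1)   # confidence of the predicted homography

    def forward(self, x: torch.Tensor):
        f = self.backbone(x)
        return self.homography(f), torch.sigmoid(self.visibility(f)), torch.sigmoid(self.confidence(f))

h, v, c = HVCHead()(torch.randn(2, 3, 128, 128))
```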
arXiv Detail & Related papers (2022-09-19T11:11:56Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
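One common way to realize a co-attention between low-level and high-level features is mutual spatial gating; the sketch below is an assumed formulation (the paper's exact attention design is not specified in this summary).

```python
# A hedged sketch of a co-attention block combining low- and high-level maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.proj_low = nn.Conv2d(channels, channels, 1)
        self.proj_high = nn.Conv2d(channels, channels, 1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low, high: (B, C, H, W); upsample the coarse high-level map first.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        # Each stream gates the other with a spatial attention map.
        att_low = torch.sigmoid(self.proj_high(high))   # attention applied to the low-level stream
        att_high = torch.sigmoid(self.proj_low(low))    # attention applied to the high-level stream
        return low * att_low + high * att_high

out = CoAttention(64)(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 16, 16))
```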
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Event and Activity Recognition in Video Surveillance for Cyber-Physical Systems [0.0]
We show that long-term motion patterns alone play a pivotal role in the task of recognizing an event.
Only the temporal features are exploited using a hybrid Convolutional Neural Network (CNN) + Recurrent Neural Network (RNN) architecture.
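The hybrid CNN + RNN pattern can be sketched as a 2D CNN producing per-frame features followed by a recurrent layer over time; layer sizes and the use of a GRU are assumptions here.

```python
# A minimal sketch of the hybrid CNN + RNN pattern described above: a 2D CNN
# extracts per-frame features and an RNN aggregates them over time.
import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    def __init__(self, num_classes: int, feat: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat),
        )
        self.rnn = nn.GRU(feat, feat, batch_first=True)
        self.cls = nn.Linear(feat, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, 3, H, W) -> per-frame features -> temporal aggregation.
        b, t = video.shape[:2]
        frames = self.cnn(video.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(frames)
        return self.cls(h[-1])

logits = CNNRNN(num_classes=10)(torch.randn(2, 8, 3, 64, 64))
```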
arXiv Detail & Related papers (2021-11-03T08:30:38Z)
- Wide and Narrow: Video Prediction from Context and Motion [54.21624227408727]
We propose a new framework to integrate these complementary attributes to predict complex pixel dynamics through deep networks.
We present global context propagation networks that aggregate the non-local neighboring representations to preserve the contextual information over the past frames.
We also devise local filter memory networks that generate adaptive filter kernels by storing the motion of moving objects in the memory.
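The adaptive filter kernels can be illustrated by a dynamic (input-dependent) depthwise convolution; the memory mechanism is omitted and the module below is a hedged sketch, not the paper's network.

```python
# A hedged sketch of adaptive (input-dependent) filter kernels: a small
# network predicts a per-sample depthwise kernel that is then applied to the
# features. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFilter(nn.Module):
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        # Predict one depthwise k x k kernel per channel from pooled features.
        self.kernel_net = nn.Linear(channels, channels * k * k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        kernels = self.kernel_net(x.mean(dim=(2, 3))).view(b * c, 1, self.k, self.k)
        # Grouped conv applies each sample's own depthwise kernels in one call.
        out = F.conv2d(x.view(1, b * c, h, w), kernels, padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)

y = AdaptiveFilter(32)(torch.randn(2, 32, 16, 16))
```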
arXiv Detail & Related papers (2021-10-22T04:35:58Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
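Aggregating a few selected foreground-object tokens with a Transformer might look like the sketch below; how the objects are selected and pooled is not specified in this summary, so the object features are assumed to be given.

```python
# A hedged sketch of a Transformer aggregating foreground-object tokens into a
# global video representation.
import torch
import torch.nn as nn

class ObjectAggregator(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)

    def forward(self, object_feats: torch.Tensor) -> torch.Tensor:
        # object_feats: (B, num_objects, dim) features of selected foreground objects.
        tokens = self.encoder(object_feats)
        return tokens.mean(dim=1)           # (B, dim) global video representation

video_repr = ObjectAggregator()(torch.randn(2, 6, 256))
```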
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- TransCamP: Graph Transformer for 6-DoF Camera Pose Estimation [77.09542018140823]
We propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem.
TransCamP effectively fuses the image features, camera pose information and inter-frame relative camera motions into encoded graph attributes.
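One plausible way to encode such graph attributes is to pack per-frame image features and poses into node attributes and relative motions into edge attributes; the tensor layout below is an assumption, not TransCamP's actual representation.

```python
# A hedged sketch of packing frames into graph attributes: nodes carry image
# features and camera poses, edges carry relative motion between frame pairs.
import torch

def build_pose_graph(img_feats: torch.Tensor, poses: torch.Tensor, rel_motions: torch.Tensor):
    """img_feats: (N, D); poses: (N, 7) quaternion+translation; rel_motions: (N, N, 7)."""
    node_attr = torch.cat([img_feats, poses], dim=-1)          # (N, D + 7) node attributes
    # Fully connected edge index over N frames (excluding self-loops).
    src, dst = torch.meshgrid(torch.arange(poses.size(0)), torch.arange(poses.size(0)), indexing="ij")
    mask = src != dst
    edge_index = torch.stack([src[mask], dst[mask]], dim=0)    # (2, N*(N-1))
    edge_attr = rel_motions[src[mask], dst[mask]]              # (N*(N-1), 7) relative motions
    return node_attr, edge_index, edge_attr

node_attr, edge_index, edge_attr = build_pose_graph(
    torch.randn(5, 128), torch.randn(5, 7), torch.randn(5, 5, 7))
```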
arXiv Detail & Related papers (2021-05-28T19:08:43Z)
- Deep Learning for Robust Motion Segmentation with Non-Static Cameras [0.0]
This paper proposes MOSNET, a new end-to-end DCNN-based approach for motion segmentation, especially for video captured with non-static cameras.
While other approaches focus on spatial or temporal context, the proposed approach uses 3D convolutions as a key technology to factor in temporal features in video frames.
The network is able to perform well on scenes captured with non-static cameras where the image content changes significantly during the scene.
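The key idea named above, 3D convolutions over a short frame stack for motion segmentation, can be sketched as follows; the layer sizes and two-layer depth are illustrative assumptions.

```python
# A minimal sketch of 3D convolutions over a frame stack producing a per-pixel
# motion-segmentation logit map.
import torch
import torch.nn as nn

class Simple3DSegNet(nn.Module):
    def __init__(self, frames: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            # Collapse the temporal axis, then predict a foreground-motion logit per pixel.
            nn.Conv3d(16, 1, kernel_size=(frames, 1, 1)),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, 3, T, H, W) -> mask logits (B, 1, H, W)
        return self.net(clip).squeeze(2)

mask = Simple3DSegNet(frames=5)(torch.randn(1, 3, 5, 64, 64))  # (1, 1, 64, 64)
```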
arXiv Detail & Related papers (2021-02-22T11:58:41Z)
- 0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera [13.39518293550118]
We present an approach for monocular multi-motion segmentation, which combines bottom-up feature tracking and top-down motion compensation into a unified pipeline.
Using the events within a time-interval, our method segments the scene into multiple motions by splitting and merging.
The approach was successfully evaluated on both challenging real-world and synthetic scenarios from the EV-IMO, EED, and MOD datasets.
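A heavily hedged toy version of the merge step in split-and-merge motion segmentation: clusters of tracked features are merged when their average motions agree. The real method operates on event data with fitted motion models; the 2D-velocity clustering below is only an illustration.

```python
# Toy merge step: merge clusters whose mean track velocities are close.
import torch

def merge_motion_clusters(velocities: torch.Tensor, labels: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    """velocities: (N, 2) per-track mean flow; labels: (N,) initial cluster ids."""
    labels = labels.clone()
    changed = True
    while changed:
        changed = False
        ids = labels.unique()
        means = torch.stack([velocities[labels == i].mean(dim=0) for i in ids])
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                if torch.norm(means[a] - means[b]) < thresh:
                    labels[labels == ids[b]] = ids[a]   # merge cluster b into cluster a
                    changed = True
                    break
            if changed:
                break
    return labels

merged = merge_motion_clusters(torch.randn(20, 2), torch.randint(0, 5, (20,)))
```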
arXiv Detail & Related papers (2020-06-11T02:34:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.