MGPSN: Motion-Guided Pseudo Siamese Network for Indoor Video Head
Detection
- URL: http://arxiv.org/abs/2110.03302v1
- Date: Thu, 7 Oct 2021 09:40:22 GMT
- Title: MGPSN: Motion-Guided Pseudo Siamese Network for Indoor Video Head
Detection
- Authors: Kailai Sun, Xiaoteng Ma, Qianchuan Zhao, Peng Liu
- Abstract summary: We propose the Motion-Guided Pseudo Siamese Network for Indoor Video Head Detection (MGPSN) to learn robust head motion features.
MGPSN integrates spatio-temporal information at the pixel level, guiding the model to extract effective head features.
It achieves state-of-the-art performance on the crowded Brainwash dataset.
- Score: 6.061552465738301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Head detection in real-world videos is an important research topic in
computer vision. However, existing studies face some challenges in complex
scenes. In indoor videos, the performance of head detectors deteriorates when
objects with a head-like appearance are present. Moreover, heads have small
scales and diverse poses, which increases the difficulty of detection. To
handle these issues, we propose Motion-Guided Pseudo Siamese Network for Indoor
Video Head Detection (MGPSN), an end-to-end model that learns robust head
motion features. MGPSN integrates spatio-temporal information at the pixel level,
guiding the model to extract effective head features. Experiments show that
MGPSN is able to suppress static objects and enhance motion instances. Compared
with previous methods, it achieves state-of-the-art performance on the crowded
Brainwash dataset. Different backbone networks and detectors are evaluated to
verify the flexibility and generality of MGPSN.
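The central idea, fusing appearance with a pixel-level motion cue so that static head-like objects are suppressed and moving instances are enhanced, can be illustrated with a minimal NumPy sketch. The frame-difference motion map, the fusion weighting, and the function name are illustrative assumptions for intuition only, not the authors' pseudo Siamese architecture.

```python
import numpy as np

def motion_guided_features(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Weight the current frame by a pixel-level motion map.

    A crude stand-in for motion guidance: pixels that changed between
    frames keep their appearance response; static pixels are damped.
    """
    # Pixel-level motion cue: absolute temporal difference, normalized to [0, 1].
    motion = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    motion = motion / (motion.max() + 1e-8)
    # Fuse appearance and motion: static regions (motion ~ 0) are suppressed
    # but not zeroed, so static context remains visible to a downstream detector.
    alpha = 0.5  # illustrative balance between appearance and motion guidance
    return curr_frame.astype(np.float32) * (alpha + (1 - alpha) * motion)

# A static background with one static bright object and one moving one.
prev = np.zeros((8, 8), dtype=np.uint8)
curr = np.zeros((8, 8), dtype=np.uint8)
prev[1, 1] = curr[1, 1] = 200   # static head-like object
curr[5, 5] = 200                # moving object (appears in curr only)

fused = motion_guided_features(prev, curr)
# The moving object's response is stronger than the static object's.
print(fused[5, 5] > fused[1, 1])  # -> True
```

With these toy values the static object's response is halved while the moving object's is preserved, which is the suppression/enhancement behavior the abstract describes.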
Related papers
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and
Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z) - HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations [7.096701481970196]
Head-Mounted Devices (HMDs) typically only provide a few input signals, such as the 6-DoF poses of the head and hands.
We propose the first unified approach, HMD-NeMo, that addresses plausible and accurate full body motion generation even when the hands may be only partially visible.
arXiv Detail & Related papers (2023-08-22T08:07:12Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - Application Of ADNN For Background Subtraction In Smart Surveillance
System [0.0]
We develop an intelligent video surveillance system that uses ADNN architecture for motion detection, trims the video with parts only containing motion, and performs anomaly detection on the trimmed video.
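The motion-based trimming step described above can be sketched with plain frame differencing: score each frame by its change from the previous one and keep only frames above a threshold. This is a generic stand-in, not the paper's ADNN architecture; the threshold and function name are assumptions.

```python
import numpy as np

def motion_frames(frames: np.ndarray, threshold: float = 10.0) -> list:
    """Return indices of frames containing motion, judged by the mean
    absolute difference from the previous frame (simple frame differencing)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    # Frame i+1 is kept when it differs enough from frame i.
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]

# Synthetic clip: frames 0-1 static, frame 2 suddenly bright, frame 3 dark again,
# so motion occurs entering frame 2 and entering frame 3.
clip = np.zeros((4, 16, 16), dtype=np.uint8)
clip[2] = 255

print(motion_frames(clip))  # -> [2, 3]
```

A trimmed video for downstream anomaly detection would then consist of just the frames at the returned indices.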
arXiv Detail & Related papers (2022-12-31T18:42:11Z) - Improving Unsupervised Video Object Segmentation with Motion-Appearance
Synergy [52.03068246508119]
We present IMAS (Improved UVOS with Motion-Appearance Synergy), a method that segments the primary objects in videos without manual annotation in training or inference.
We demonstrate its effectiveness in tuning critical hyperparameters previously tuned with human annotation or hand-crafted, hyperparameter-specific metrics.
arXiv Detail & Related papers (2022-12-17T06:47:30Z) - Adversarially Robust Video Perception by Seeing Motion [29.814393563282753]
We find one reason for video models' vulnerability is that they fail to perceive the correct motion under adversarial perturbations.
Inspired by the extensive evidence that motion is a key factor for the human visual system, we propose to correct what the model sees by restoring the perceived motion information.
Our work provides new insight into robust video perception algorithms by using intrinsic structures from the data.
arXiv Detail & Related papers (2022-12-13T02:25:33Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action
Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - The Right Spin: Learning Object Motion from Rotation-Compensated Flow
Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of these effects.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z) - Activity Recognition with Moving Cameras and Few Training Examples:
Applications for Detection of Autism-Related Headbanging [1.603589863010401]
Computer vision activity recognition algorithms can be used to detect the presence of autism-related behaviors.
We document the advantages and limitations of current feature representation techniques for activity recognition when applied to head banging detection.
We create a computer vision classifier for detecting head banging in home videos using a time-distributed convolutional neural network.
arXiv Detail & Related papers (2021-01-10T05:37:05Z) - UNOC: Understanding Occlusion for Embodied Presence in Virtual Reality [12.349749717823736]
In this paper, we propose a new data-driven framework for inside-out body tracking.
We first collect a large-scale motion capture dataset with both body and finger motions.
We then simulate the occlusion patterns in head-mounted camera views on the captured ground truth using a ray casting algorithm and learn a deep neural network to infer the occluded body parts.
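The ray-casting occlusion test can be sketched in 2D: a body part is occluded from the head-mounted camera if another part lies on the camera-to-part ray and closer to the camera. This toy point-based check only conveys the idea; UNOC's actual simulation operates on captured body geometry, and all names and tolerances here are assumptions.

```python
import numpy as np

def occluded(parts: dict, camera: np.ndarray, tol: float = 0.05) -> set:
    """Label parts hidden from the camera: cast a ray from the camera to
    each part and check whether any other part sits on that ray, closer in."""
    hidden = set()
    for name, p in parts.items():
        ray = p - camera
        dist = np.linalg.norm(ray)
        direction = ray / dist
        for other_name, q in parts.items():
            if other_name == name:
                continue
            v = q - camera
            along = float(v @ direction)          # distance along the ray
            if 0 < along < dist:
                off_ray = np.linalg.norm(v - along * direction)
                if off_ray < tol:                 # blocker sits on the ray
                    hidden.add(name)
                    break
    return hidden

camera = np.array([0.0, 0.0])
parts = {
    "hand": np.array([1.0, 0.0]),   # close to the camera
    "knee": np.array([2.0, 0.0]),   # directly behind the hand
    "foot": np.array([2.0, 1.0]),   # off to the side, unobstructed
}
print(occluded(parts, camera))  # -> {'knee'}
```

Labels produced this way would serve as supervision for a network that infers the occluded parts from the visible ones.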
arXiv Detail & Related papers (2020-11-12T09:31:09Z) - Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection that carefully leverages kinematic motion to improve the precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.