Mutual-Supervised Feature Modulation Network for Occluded Pedestrian
Detection
- URL: http://arxiv.org/abs/2010.10744v1
- Date: Wed, 21 Oct 2020 03:42:22 GMT
- Title: Mutual-Supervised Feature Modulation Network for Occluded Pedestrian
Detection
- Authors: Ye He, Chao Zhu, Xu-Cheng Yin
- Abstract summary: We propose a novel Mutual-Supervised Feature Modulation (MSFM) network to better handle occluded pedestrian detection.
The MSFM module calculates a similarity loss between full body boxes and visible body boxes corresponding to the same pedestrian.
Our approach achieves superior performance compared to other state-of-the-art methods on two challenging pedestrian datasets.
- Score: 10.497367073305806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art pedestrian detectors have achieved significant progress on
non-occluded pedestrians, yet they are still struggling under heavy occlusions.
The recent occlusion handling strategy of popular two-stage approaches is to
build a two-branch architecture with the help of additional visible body
annotations. Nonetheless, these methods still have some weaknesses: either the
two branches are trained independently with only score-level fusion, which
cannot guarantee that the detectors learn sufficiently robust pedestrian
features, or attention mechanisms are exploited to emphasize only the visible
body features. However, the visible body features of heavily occluded
pedestrians are concentrated in a relatively small area, which easily causes
missed detections. To address the above issues, we propose in this paper a novel
Mutual-Supervised Feature Modulation (MSFM) network, to better handle occluded
pedestrian detection. The key MSFM module in our network calculates the
similarity loss of full body boxes and visible body boxes corresponding to the
same pedestrian, so that the full-body detector can learn more complete and
robust pedestrian features with the assistance of contextual features from the
occluding parts. To facilitate the MSFM module, we also propose a novel
two-branch architecture, consisting of a standard full body detection branch
and an extra visible body classification branch. These two branches are trained
in a mutual-supervised way with full body annotations and visible body
annotations, respectively. To verify the effectiveness of our proposed method,
extensive experiments are conducted on two challenging pedestrian datasets:
Caltech and CityPersons, and our approach achieves superior performance
compared to other state-of-the-art methods on both datasets, especially in the
heavy occlusion cases.
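The abstract does not pin down the exact form of the similarity loss. Below is a minimal sketch, assuming a cosine-similarity loss over pooled RoI features of paired full-body and visible-body boxes; the function name, feature shapes, and cosine formulation are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn.functional as F

def msfm_similarity_loss(full_body_feats: torch.Tensor,
                         visible_body_feats: torch.Tensor) -> torch.Tensor:
    """Hypothetical similarity loss between paired RoI features.

    Row i of each tensor is assumed to come from the full body box and
    the visible body box of the *same* pedestrian, e.g. pooled RoI
    features of shape (num_pedestrians, feature_dim).
    """
    # Normalize so the loss depends only on feature direction.
    full = F.normalize(full_body_feats, dim=1)
    visible = F.normalize(visible_body_feats, dim=1)
    # Cosine similarity per pedestrian pair; the loss pushes it toward 1,
    # so the two branches supervise each other's feature learning.
    cos_sim = (full * visible).sum(dim=1)
    return (1.0 - cos_sim).mean()

# Usage with dummy pooled features for 4 pedestrians:
loss = msfm_similarity_loss(torch.randn(4, 256), torch.randn(4, 256))
```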
Related papers
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Cascaded information enhancement and cross-modal attention feature fusion for multispectral pedestrian detection [6.167053377021009]
We propose a multispectral pedestrian detection algorithm, which mainly consists of a cascaded information enhancement module and a cross-modal attention feature fusion module.
Our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the comparison methods.
arXiv Detail & Related papers (2023-02-17T03:30:00Z)
- Feature Calibration Network for Occluded Pedestrian Detection [137.37275165635882]
We propose a novel feature learning method in the deep learning framework, referred to as the Feature Calibration Network (FC-Net).
FC-Net is based on the observation that the visible parts of pedestrians are selective and decisive for detection.
Experiments on the CityPersons and Caltech datasets demonstrate that FC-Net improves detection performance on occluded pedestrians by up to 10%.
arXiv Detail & Related papers (2022-12-12T05:48:34Z)
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple the independent detection and description constraints of multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross-modal visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
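As a rough illustration of what such a mutual weighting strategy can look like (the exact ReDFeat formulation may differ; the function name and the exponential weighting are assumptions), each branch's loss can be weighted by a detached signal from the other branch:
```python
import torch

def mutually_weighted_loss(det_score: torch.Tensor,
                           det_loss: torch.Tensor,
                           desc_loss: torch.Tensor) -> torch.Tensor:
    """Hypothetical mutual weighting of per-location losses.

    det_score: keypoint confidence in [0, 1], shape (N,)
    det_loss:  per-location detection loss, shape (N,)
    desc_loss: per-location descriptor loss, shape (N,)
    """
    # Descriptor learning is emphasized where the detector is confident;
    # the weight is detached so it steers, but does not train, the detector.
    weighted_desc = (det_score.detach() * desc_loss).mean()
    # Detection is emphasized where descriptors already match well
    # (low descriptor loss -> weight near 1), again without gradients
    # flowing through the weighting term itself.
    weighted_det = (torch.exp(-desc_loss).detach() * det_loss).mean()
    return weighted_det + weighted_desc
```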
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
- An Objective Method for Pedestrian Occlusion Level Classification [6.125017875330933]
Occlusion level classification is achieved through the identification of visible pedestrian keypoints and through the use of a novel, effective method of 2D body surface area estimation.
Experimental results demonstrate that the proposed method reflects the pixel-wise occlusion level of pedestrians in images and is effective for all forms of occlusion, including challenging edge cases such as self-occlusion, truncation and inter-occluding pedestrians.
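A toy sketch of the keypoint-based idea, where made-up per-keypoint area weights stand in for the paper's 2D body surface area estimation:
```python
# Hypothetical sketch: estimate a pedestrian's occlusion level from which
# of its keypoints are visible. The per-keypoint weights below are invented
# placeholders for the paper's body surface area estimation.
KEYPOINT_AREA_WEIGHTS = {
    "head": 0.10, "torso": 0.35, "left_arm": 0.10, "right_arm": 0.10,
    "left_leg": 0.175, "right_leg": 0.175,
}

def occlusion_level(visible_keypoints):
    """Return the occluded fraction of the body surface in [0, 1]."""
    visible_area = sum(w for k, w in KEYPOINT_AREA_WEIGHTS.items()
                       if k in visible_keypoints)
    return 1.0 - visible_area

# Example: only the head and torso are detected as visible.
print(occlusion_level({"head", "torso"}))  # 0.55 -> heavily occluded
```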
arXiv Detail & Related papers (2022-05-11T11:27:41Z)
- Disentangle Saliency Detection into Cascaded Detail Modeling and Body Filling [68.73040261040539]
We propose to decompose the saliency detection task into two cascaded sub-tasks, i.e., detail modeling and body filling.
Specifically, the detail modeling focuses on capturing the object edges under the supervision of an explicitly decomposed detail label.
The body filling learns the body part, which is then filled into the detail map to generate a more accurate saliency map.
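A toy sketch of such a two-head decomposition (module names, the sigmoid activations, and the additive fusion are assumptions, not the paper's architecture):
```python
import torch
import torch.nn as nn

class CascadedSaliency(nn.Module):
    """Toy two-head cascade: a detail head captures object edges and a
    body head predicts interior regions; the body output is "filled
    into" the detail map to form the final saliency map."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.detail_head = nn.Conv2d(channels, 1, 1)  # supervised by detail label
        self.body_head = nn.Conv2d(channels, 1, 1)    # supervised by body label

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        detail = torch.sigmoid(self.detail_head(feats))  # edge map
        body = torch.sigmoid(self.body_head(feats))      # interior map
        return torch.clamp(detail + body, max=1.0)       # fused saliency map
```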
arXiv Detail & Related papers (2022-02-08T19:33:02Z)
- Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads that make the network attend to different local parts; 2) a textural feature enhancement block that zooms in on the subtle artifacts in shallow features; and 3) aggregation of the low-level textural features and high-level semantic features guided by the attention maps.
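A toy sketch of the first component, multiple spatial attention heads (the channel sizes and the sigmoid gating are assumptions):
```python
import torch
import torch.nn as nn

class MultiSpatialAttention(nn.Module):
    """Toy version of multiple spatial attention heads: each head produces
    one H x W attention map that reweights the feature map, so different
    heads can learn to focus on different local parts."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.to_maps = nn.Conv2d(channels, num_heads, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) -> attention maps: (B, heads, H, W)
        maps = torch.sigmoid(self.to_maps(feats))
        # Apply each head's map to the features and stack the results.
        attended = feats.unsqueeze(1) * maps.unsqueeze(2)  # (B, heads, C, H, W)
        return attended.flatten(1, 2)                      # (B, heads*C, H, W)

out = MultiSpatialAttention(64)(torch.randn(2, 64, 32, 32))  # (2, 256, 32, 32)
```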
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
- Visible Feature Guidance for Crowd Pedestrian Detection [12.8128512764041]
We propose Visible Feature Guidance (VFG) for both training and inference.
During training, we adopt visible features to regress the simultaneous outputs of the visible bounding box and the full bounding box.
At inference, we then perform NMS only on visible bounding boxes to obtain the best-fitting full boxes.
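A minimal sketch of that inference step, assuming each detection carries a paired (visible_box, full_box, score) triple and standard greedy IoU-based suppression:
```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms_on_visible(dets, thresh=0.5):
    """Each det is (visible_box, full_box, score). Suppression is decided
    on visible boxes, but the paired full boxes are what gets returned,
    so crowded pedestrians with overlapping full boxes survive."""
    dets = sorted(dets, key=lambda d: d[2], reverse=True)
    kept_visible, kept_full = [], []
    for vis, full, score in dets:
        if all(iou(vis, v) < thresh for v in kept_visible):
            kept_visible.append(vis)
            kept_full.append((full, score))
    return kept_full
```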
arXiv Detail & Related papers (2020-08-23T08:52:52Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing [25.050500817717108]
The heavy occlusion between pedestrians imposes great challenges on the standard Non-Maximum Suppression (NMS).
This paper proposes a novel Representative Region NMS approach leveraging the less occluded visible parts, effectively removing the redundant boxes without bringing in many false positives.
Experiments on the challenging CrowdHuman and CityPersons benchmarks sufficiently validate the effectiveness of the proposed approach on pedestrian detection in the crowded situation.
arXiv Detail & Related papers (2020-03-28T06:33:54Z)