OmniPD: One-Step Person Detection in Top-View Omnidirectional Indoor
Scenes
- URL: http://arxiv.org/abs/2204.06846v1
- Date: Thu, 14 Apr 2022 09:41:53 GMT
- Title: OmniPD: One-Step Person Detection in Top-View Omnidirectional Indoor
Scenes
- Authors: Jingrui Yu, Roman Seidel, Gangolf Hirtz
- Abstract summary: We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs).
The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation.
Our method is applicable to other CNN-based object detectors and can potentially generalize for detecting other objects in omnidirectional images.
- Score: 4.297070083645049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a one-step person detector for top-view omnidirectional indoor
scenes based on convolutional neural networks (CNNs). While state-of-the-art
person detectors reach competitive results on perspective images, the lack of CNN
architectures and of training data that follows the distortion of
omnidirectional images makes current approaches inapplicable to our data. The
method predicts bounding boxes of multiple persons directly in omnidirectional
images without perspective transformation, which reduces the overhead of pre- and
post-processing and enables real-time performance. The basic idea is to utilize
transfer learning to fine-tune CNNs trained on perspective images with data
augmentation techniques for detection in omnidirectional images. We fine-tune
two variants of Single Shot MultiBox detectors (SSDs). The first one uses
Mobilenet v1 FPN as feature extractor (moSSD). The second one uses ResNet50 v1
FPN (resSSD). Both models are pre-trained on Microsoft Common Objects in
Context (COCO) dataset. We fine-tune both models on PASCAL VOC07 and VOC12
datasets, specifically on class person. Random 90-degree rotation and random
vertical flipping are used for data augmentation in addition to the methods
proposed by original SSD. We reach an average precision (AP) of 67.3 % with
moSSD and 74.9 % with resSSD on the evaluation dataset. To enhance the
fine-tuning process, we add a subset of HDA Person dataset and a subset of
PIROPO database and reduce the number of perspective images to PASCAL VOC07. The
AP rises to 83.2 % for moSSD and 86.3 % for resSSD, respectively. The average
inference speed is 28 ms per image for moSSD and 38 ms per image for resSSD
using Nvidia Quadro P6000. Our method is applicable to other CNN-based object
detectors and can potentially generalize for detecting other objects in
omnidirectional images.
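The abstract's augmentation recipe (random 90-degree rotation plus random vertical flipping, exploiting the rotational symmetry of top-view fisheye images about the optical axis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, and the corresponding bounding-box transformation, which a detector pipeline would also need, is omitted here.

```python
import numpy as np

def augment_topview(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Augment a top-view omnidirectional image as described in the abstract:
    a random 90-degree rotation and a random vertical flip. Top-view fisheye
    images are symmetric about the optical axis, so both transforms yield
    valid training samples without distorting the image geometry."""
    k = int(rng.integers(0, 4))            # rotate by 0, 90, 180, or 270 degrees
    image = np.rot90(image, k=k, axes=(0, 1))
    if rng.random() < 0.5:                 # vertical flip with probability 0.5
        image = np.flipud(image)
    return image
```

In practice this would be applied on the fly during fine-tuning, alongside the augmentation methods of the original SSD training pipeline.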
Related papers
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z) - 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z) - M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D
Object Detection [2.5158048364984564]
I propose a network structure for multi-view 3D object detection using camera-only data and a Bird's-Eye-View map.
The work addresses a key current challenge: domain adaptation and visual data transfer.
The study utilizes 3D information as available semantic information and blends 2D multi-view image features into the visual-language transfer design.
arXiv Detail & Related papers (2023-11-02T04:28:51Z) - Randomize to Generalize: Domain Randomization for Runway FOD Detection [1.4249472316161877]
Tiny Object Detection is challenging due to small size, low resolution, occlusion, background clutter, lighting conditions and small object-to-image ratio.
We propose a novel two-stage methodology, Synthetic Randomized Image Augmentation (SRIA), to enhance the generalization capabilities of models trained on 2D datasets.
We report that detection accuracy improved from an initial 41% to 92% for OOD test set.
arXiv Detail & Related papers (2023-09-23T05:02:31Z) - Human Pose Estimation in Monocular Omnidirectional Top-View Images [3.07869141026886]
We propose a new dataset for training and evaluation of CNNs for the task of keypoint detection in omnidirectional images.
The training dataset, THEODORE+, consists of 50,000 images and is created by a 3D rendering engine.
For evaluation purposes, the real-world PoseFES dataset with two scenarios and 701 frames with up to eight persons per scene was captured and annotated.
arXiv Detail & Related papers (2023-04-17T11:52:04Z) - Collaboration Helps Camera Overtake LiDAR in 3D Detection [49.58433319402405]
Camera-only 3D detection provides a simple solution for localizing objects in 3D space compared to LiDAR-based detection systems.
Our proposed collaborative camera-only 3D detection (CoCa3D) enables agents to share complementary information with each other through communication.
Results show that CoCa3D improves previous SOTA performances by 44.21% on DAIR-V2X, 30.60% on OPV2V+, 12.59% on CoPerception-UAVs+ for AP@70.
arXiv Detail & Related papers (2023-03-23T03:50:41Z) - VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and
Stereo Data Fusion [62.24001258298076]
VPFNet is a new architecture that cleverly aligns and aggregates the point cloud and image data at the "virtual" points.
Our VPFNet achieves 83.21% moderate 3D AP and 91.86% moderate BEV AP on the KITTI test set, ranking 1st since May 21st, 2021.
arXiv Detail & Related papers (2021-11-29T08:51:20Z) - DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in
Darts using a Single Camera [75.34178733070547]
Existing multi-camera solutions for automatic scorekeeping in steel-tip darts are very expensive and thus inaccessible to most players.
We present a new approach to keypoint detection and apply it to predict dart scores from a single image taken from any camera angle.
We develop a deep convolutional neural network around this idea and use it to predict dart locations and dartboard calibration points.
arXiv Detail & Related papers (2021-05-20T16:25:57Z) - Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z) - Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance
Disparity Estimation [51.17232267143098]
We propose a novel system named Disp R-CNN for 3D object detection from stereo images.
We use a statistical shape model to generate dense disparity pseudo-ground-truth without the need of LiDAR point clouds.
Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
arXiv Detail & Related papers (2020-04-07T17:48:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.