Panoramic Distortion-Aware Tokenization for Person Detection and Localization Using Transformers in Overhead Fisheye Images
- URL: http://arxiv.org/abs/2503.14228v1
- Date: Tue, 18 Mar 2025 13:05:41 GMT
- Title: Panoramic Distortion-Aware Tokenization for Person Detection and Localization Using Transformers in Overhead Fisheye Images
- Authors: Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii, Takayoshi Yamashita
- Abstract summary: Person detection is an open challenge because of factors including person rotation and small-sized persons. We convert fisheye images into panoramic images using panoramic distortion-aware tokenization. We propose a person detection and localization method that combines panoramic-image remapping and the tokenization procedure.
- Score: 9.018416031676136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person detection methods are used widely in applications including visual surveillance, pedestrian detection, and robotics. However, accurate detection of persons from overhead fisheye images remains an open challenge because of factors including person rotation and small-sized persons. To address the person rotation problem, we convert the fisheye images into panoramic images. For smaller people, we focus on the geometry of the panoramas. Conventional detection methods tend to focus on larger people because larger people yield larger significant areas in the feature maps. In equirectangular panoramic images, we find that a person's height decreases linearly near the top of the images. Using this finding, we leverage the significance values and aggregate tokens that are sorted based on these values to balance the significant areas. In this leveraging process, we introduce panoramic distortion-aware tokenization. This tokenization procedure divides a panoramic image using self-similarity figures that enable determination of optimal divisions without gaps, and we leverage the maximum significance values in each tile of token groups to preserve the significant areas of smaller people. To achieve higher detection accuracy, we propose a person detection and localization method that combines panoramic-image remapping and the tokenization procedure. Extensive experiments demonstrated that our method outperforms conventional methods when applied to large-scale datasets.
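The fisheye-to-panorama remapping mentioned in the abstract can be pictured with a short geometric sketch. The snippet below is a minimal illustration, assuming an equidistant overhead fisheye whose optical center and image radius are known; the function name, default panorama size, and sampling scheme are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np
import cv2


def fisheye_to_panorama(fisheye, pano_w=1024, pano_h=256, center=None, r_max=None):
    """Unwrap an overhead fisheye image into a panoramic strip."""
    h, w = fisheye.shape[:2]
    cx, cy = center if center is not None else (w / 2.0, h / 2.0)
    r_max = r_max if r_max is not None else min(cx, cy)

    # Panorama columns sweep the azimuth [0, 2*pi); rows sweep the radius [0, r_max].
    phi = 2.0 * np.pi * np.arange(pano_w, dtype=np.float32) / pano_w
    r = r_max * np.arange(pano_h, dtype=np.float32) / (pano_h - 1)

    # For every panorama pixel, the corresponding sampling position in the fisheye image.
    map_x = (cx + r[:, None] * np.cos(phi)[None, :]).astype(np.float32)
    map_y = (cy + r[:, None] * np.sin(phi)[None, :]).astype(np.float32)

    return cv2.remap(fisheye, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```

After such an unwrapping, people who appear radially rotated near the fisheye border become roughly upright in the panorama, which is the rotation problem the abstract addresses before applying the distortion-aware tokenization.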
Related papers
- Enhancing people localisation in drone imagery for better crowd management by utilising every pixel in high-resolution images [0.0]
A novel approach dedicated to point-oriented object localisation is proposed. The Pixel Distill module is introduced to enhance the processing of high-definition images. A new dataset named UP-COUNT is tailored to contemporary drone applications.
arXiv Detail & Related papers (2025-02-06T12:16:22Z)
- RoFIR: Robust Fisheye Image Rectification Framework Impervious to Optical Center Deviation [88.54817424560056]
We propose a distortion vector map (DVM) that measures the degree and direction of local distortion.
By learning the DVM, the model can independently identify local distortions at each pixel without relying on global distortion patterns.
In the pre-training stage, it predicts the distortion vector map and perceives the local distortion features of each pixel.
In the fine-tuning stage, it predicts a pixel-wise flow map for deviated fisheye image rectification.
arXiv Detail & Related papers (2024-06-27T06:38:56Z)
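As a rough mental model of the distortion vector map (DVM) summarized above, each pixel can carry a vector from its distorted position toward its ideal, distortion-free position. The toy construction below assumes a simple two-coefficient radial model and an approximate inversion; the model, coefficients, and function name are illustrative assumptions, not RoFIR's formulation.

```python
import numpy as np


def toy_distortion_vector_map(h, w, cx, cy, k1=-0.25, k2=0.05):
    """Per-pixel vectors from distorted pixel positions toward approximate ideal ones."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Normalized coordinates relative to the (possibly deviated) optical center.
    xn = (xs - cx) / w
    yn = (ys - cy) / h
    r2 = xn * xn + yn * yn

    # Assumed radial model: distorted = ideal * (1 + k1*r^2 + k2*r^4).
    # Dividing by the same factor gives a first-order approximation of the ideal point.
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    x_ideal = cx + (xn / scale) * w
    y_ideal = cy + (yn / scale) * h

    # The DVM stores the degree and direction of local distortion at every pixel.
    return np.stack([x_ideal - xs, y_ideal - ys], axis=-1)  # shape (h, w, 2)
```

Because the vectors are computed relative to the optical center (cx, cy), a deviated center makes the map asymmetric about the frame center, which is the situation this entry targets.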
- Large-Scale Person Detection and Localization using Overhead Fisheye Cameras [40.004888590123954]
We present the first large-scale overhead fisheye dataset for person detection and localization.
We build a fisheye person detection network, which exploits the fisheye distortions by a rotation-equivariant training strategy.
Our whole fisheye positioning solution is able to locate all persons in the FOV with an accuracy of 0.5 m within 0.1 s.
arXiv Detail & Related papers (2023-07-17T05:36:01Z)
- Self-similarity Driven Scale-invariant Learning for Weakly Supervised Person Search [66.95134080902717]
We propose a novel one-step framework, named Self-similarity driven Scale-invariant Learning (SSL)
We introduce a Multi-scale Exemplar Branch to guide the network in concentrating on the foreground and learning scale-invariant features.
Experiments on PRW and CUHK-SYSU databases demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-25T04:48:11Z)
- Parallax-Tolerant Unsupervised Deep Image Stitching [57.76737888499145]
We propose UDIS++, a parallax-tolerant unsupervised deep image stitching technique.
First, we propose a robust and flexible warp to model the image registration from global homography to local thin-plate spline motion.
To further eliminate the parallax artifacts, we propose to composite the stitched image seamlessly by unsupervised learning for seam-driven composition masks.
arXiv Detail & Related papers (2023-02-16T10:40:55Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- ARPD: Anchor-free Rotation-aware People Detection using Topview Fisheye Camera [3.0868856870169625]
We propose ARPD, a single-stage anchor-free fully convolutional network to detect arbitrarily rotated people in fish-eye images.
Our method competes favorably with state-of-the-art algorithms while running significantly faster.
arXiv Detail & Related papers (2022-01-25T05:49:50Z)
- Efficient Pedestrian Detection in Top-View Fisheye Images Using Compositions of Perspective View Patches [3.5706999675827413]
Existing detectors designed for perspective images do not perform as successfully on images taken with top-view fisheye cameras.
In our proposed approach, several perspective views are generated from a fisheye image and then combined into a composite image.
As pedestrians in this composite image are more likely to be upright, existing detectors designed and trained for perspective images can be applied directly without additional training.
The detection performance on several public datasets compares favorably with state-of-the-art results.
arXiv Detail & Related papers (2020-09-06T11:19:00Z)
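The composition idea above can be sketched with OpenCV's fisheye camera model: render a few virtual pinhole views so that standing pedestrians appear roughly upright, then run a standard perspective-image detector on the views or on a mosaic built from them. The intrinsics K, distortion coefficients D, tilt angle, and number of views below are placeholder assumptions, not values from the paper.

```python
import numpy as np
import cv2


def perspective_views(fisheye_img, K, D, out_size=(640, 640), tilt_deg=45.0, n_views=4):
    """Render virtual pinhole views of a top-view fisheye image, panned around the optical axis."""
    # Pinhole intrinsics of each virtual view (placeholder focal length = half the view width).
    Kp = np.array([[out_size[0] / 2.0, 0.0, out_size[0] / 2.0],
                   [0.0, out_size[0] / 2.0, out_size[1] / 2.0],
                   [0.0, 0.0, 1.0]])
    views = []
    for i in range(n_views):
        pan = 2.0 * np.pi * i / n_views
        # Tilt the virtual camera away from straight-down, then pan it to cover one sector.
        r_tilt, _ = cv2.Rodrigues(np.array([np.deg2rad(tilt_deg), 0.0, 0.0]))
        r_pan, _ = cv2.Rodrigues(np.array([0.0, 0.0, pan]))
        map1, map2 = cv2.fisheye.initUndistortRectifyMap(
            K, D, r_pan @ r_tilt, Kp, out_size, cv2.CV_32FC1)
        views.append(cv2.remap(fisheye_img, map1, map2, cv2.INTER_LINEAR))
    return views
```

Detections found in the individual views, or in a composite image assembled from them, can then be mapped back to fisheye coordinates through the same remapping.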
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective, which is the first attempt to consider the salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z)
- RAPiD: Rotation-Aware People Detection in Overhead Fisheye Images [13.290341167863495]
We develop an end-to-end rotation-aware people detection method, named RAPiD, that detects people using arbitrarily-oriented bounding boxes.
Our fully-convolutional neural network directly regresses the angle of each bounding box using a periodic rotation loss function.
We show that our simple, yet effective method outperforms state-of-the-art results on three fisheye-image datasets.
arXiv Detail & Related papers (2020-05-23T23:47:18Z)
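The periodic loss mentioned above can be illustrated in a few lines of PyTorch: the angular error is wrapped into a single period before being penalized, so predictions that differ from the target by a whole period incur no cost. This is a generic sketch with an assumed period of pi (a rectangle rotated by 180 degrees is the same rectangle); the exact loss used by RAPiD may differ.

```python
import torch
import torch.nn.functional as F


def periodic_angle_loss(pred, target, period=torch.pi):
    """Smooth-L1 penalty on the angular error wrapped into [-period/2, period/2)."""
    diff = torch.remainder(pred - target + period / 2, period) - period / 2
    return F.smooth_l1_loss(diff, torch.zeros_like(diff))


# A prediction off by exactly one period incurs (numerically) zero loss.
loss = periodic_angle_loss(torch.tensor([0.1 + torch.pi, 1.2]),
                           torch.tensor([0.1, 1.0]))
```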
- Learning to Detect Important People in Unlabelled Images for Semi-supervised Important People Detection [85.91577271918783]
We propose learning important people detection on partially annotated images.
Our approach iteratively learns to assign pseudo-labels to individuals in un-annotated images.
We have collected two large-scale datasets for evaluation.
arXiv Detail & Related papers (2020-04-16T10:09:37Z)