Enhancing people localisation in drone imagery for better crowd management by utilising every pixel in high-resolution images
- URL: http://arxiv.org/abs/2502.04014v1
- Date: Thu, 06 Feb 2025 12:16:22 GMT
- Title: Enhancing people localisation in drone imagery for better crowd management by utilising every pixel in high-resolution images
- Authors: Bartosz Ptak, Marek Kraft,
- Abstract summary: A novel approach dedicated to point-oriented object localisation is proposed. The Pixel Distill module is introduced to enhance the processing of high-definition images. A new dataset named UP-COUNT is tailored to contemporary drone applications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate people localisation using drones is crucial for effective crowd management, not only during massive events and public gatherings but also for monitoring daily urban crowd flow. Traditional methods for tiny object localisation using high-resolution drone imagery often face limitations in precision and efficiency, primarily due to constraints in image scaling and sliding window techniques. To address these challenges, a novel approach dedicated to point-oriented object localisation is proposed. Along with this approach, the Pixel Distill module is introduced to enhance the processing of high-definition images by extracting spatial information from individual pixels at once. Additionally, a new dataset named UP-COUNT, tailored to contemporary drone applications, is shared. It addresses a wide range of challenges in drone imagery, such as simultaneous camera and object movement during the image acquisition process, pushing forward the capabilities of crowd management applications. A comprehensive evaluation of the proposed method on the proposed dataset and the commonly used DroneCrowd dataset demonstrates the superiority of our approach over existing methods and highlights its efficacy in drone-based crowd object localisation tasks. These improvements markedly increase the algorithm's applicability to operate in real-world scenarios, enabling more reliable localisation and counting of individuals in dynamic environments.
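The abstract describes the Pixel Distill module as "extracting spatial information from individual pixels at once" rather than downscaling or tiling the input. A common building block for this kind of lossless high-resolution processing is a space-to-depth rearrangement, which folds spatial neighbourhoods into the channel dimension so every pixel survives into the feature extractor. The sketch below is an illustrative assumption, not the paper's actual module:

```python
import numpy as np

def space_to_depth(img, r):
    """Rearrange an (H, W, C) image into (H/r, W/r, C*r*r).

    Every pixel of the input is preserved; each r x r spatial block
    is moved into the channel dimension instead of being discarded
    by downscaling. This is one plausible primitive for processing
    high-definition imagery without losing tiny objects.
    """
    h, w, c = img.shape
    assert h % r == 0 and w % r == 0, "H and W must be divisible by r"
    out = img.reshape(h // r, r, w // r, r, c)
    out = out.transpose(0, 2, 1, 3, 4)          # group each r x r block
    return out.reshape(h // r, w // r, c * r * r)

# Toy 4x4 single-channel image with values 0..15, reduced by r=2.
x = np.arange(16).reshape(4, 4, 1)
y = space_to_depth(x, 2)
```

Unlike bilinear resizing, this transform is invertible, so no pixel-level evidence of a tiny person is thrown away before the network sees it.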
Related papers
- DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding [25.32283897448209]
DynamicVis is a dynamic visual perception foundation model for remote sensing imagery.
It integrates a novel dynamic region perception backbone based on the selective state space model.
It achieves multi-level feature modeling with exceptional efficiency, processing 2048x2048-pixel inputs with 97 ms latency (6% of ViT's) and 833 MB GPU memory (3% of ViT's).
arXiv Detail & Related papers (2025-03-20T17:59:54Z) - Multi-Grained Feature Pruning for Video-Based Human Pose Estimation [19.297490509277463]
We propose a novel multi-scale resolution framework for human pose estimation.
We employ a density clustering method to identify tokens that offer important semantic information.
Our method achieves a 93.8% improvement in inference speed compared to the baseline.
arXiv Detail & Related papers (2025-03-07T12:14:51Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations. We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT. We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders [6.498925999634298]
This paper presents a novel approach for communication-efficient distributed multiview detection and tracking using masked autoencoders (MAEs).
We introduce a semantic-guided masking strategy that leverages pre-trained segmentation models and a tunable power function to prioritize informative image regions.
We evaluate our method on both virtual and real-world multiview datasets, demonstrating comparable detection and tracking performance.
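The summary above mentions a "tunable power function to prioritize informative image regions" when deciding which patches to mask. One plausible reading, sketched below under that assumption (the paper's exact formulation is not given here), is to raise per-patch importance scores to an exponent gamma and normalise so the expected number of kept patches matches a transmission budget:

```python
import numpy as np

def masking_keep_probs(importance, gamma, budget):
    """Map per-patch importance scores in [0, 1] to keep-probabilities.

    gamma > 1 sharpens the prioritisation (informative regions dominate),
    gamma < 1 flattens it toward uniform sampling. The probabilities are
    scaled so their sum approximates `budget`, the expected number of
    patches transmitted, then clipped to valid probability range.
    This is an illustrative sketch, not the authors' implementation.
    """
    w = importance ** gamma
    p = w / w.sum() * budget
    return np.clip(p, 0.0, 1.0)

# Three patches scored by a (hypothetical) segmentation-derived importance.
importance = np.array([0.1, 0.5, 0.9])
p = masking_keep_probs(importance, gamma=2.0, budget=2.0)
```

With this shape, bandwidth is spent where the segmentation model sees people or objects, while background patches are masked out before encoding.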
arXiv Detail & Related papers (2024-10-07T08:06:41Z) - Resource Efficient Perception for Vision Systems [0.0]
Our study introduces a framework aimed at mitigating these challenges by leveraging memory efficient patch based processing for high resolution images.
It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content.
We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation.
arXiv Detail & Related papers (2024-05-12T05:33:00Z) - OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos [14.965321452764355]
We introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views.
Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays.
Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative metrics.
arXiv Detail & Related papers (2024-03-31T12:55:05Z) - VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z) - Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM [55.93697196726016]
We propose a simple yet effective crowd counting method by utilizing the Segment-Everything-Everywhere Model (SEEM).
We show that SEEM's performance in dense crowd scenes is limited, primarily due to the omission of many persons in high-density areas.
Our proposed method achieves the best unsupervised performance in crowd counting, while also being comparable to some supervised methods.
arXiv Detail & Related papers (2024-02-27T13:55:17Z) - Spatially-Attentive Patch-Hierarchical Network with Adaptive Sampling for Motion Deblurring [34.751361664891235]
We propose a pixel adaptive and feature attentive design for handling large blur variations across different spatial locations.
We show that our approach performs favorably against the state-of-the-art deblurring algorithms.
arXiv Detail & Related papers (2024-02-09T01:00:09Z) - Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z) - Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera.
Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z) - Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.