TopView: Vectorising road users in a bird's eye view from uncalibrated street-level imagery with deep learning
- URL: http://arxiv.org/abs/2412.16229v1
- Date: Wed, 18 Dec 2024 21:55:58 GMT
- Title: TopView: Vectorising road users in a bird's eye view from uncalibrated street-level imagery with deep learning
- Authors: Mohamed R Ibrahim
- Abstract summary: We introduce a simple approach for estimating a bird's eye view from images without prior knowledge of a given camera's intrinsic and extrinsic parameters.
The framework has been applied in several settings, including generating a live map from camera feeds and analysing social distancing violations at the city scale.
- Score: 2.7195102129095003
- License:
- Abstract: Generating a bird's eye view of road users benefits a variety of applications, including navigation, detecting agent conflicts, and measuring space occupancy, and it enables metric measurement of distances between objects. In this research, we introduce a simple approach for estimating a bird's eye view from images without prior knowledge of a given camera's intrinsic and extrinsic parameters. The model orthogonally projects objects from various fields of view to a bird's eye view by learning the vanishing point of a given scene. Additionally, we utilise the learned vanishing point alongside the trajectory line to transform the 2D bounding boxes of road users into 3D bounding information. The framework has been applied in several settings, including generating a live map from camera feeds and analysing social distancing violations at the city scale, and it geolocates road users accurately across various uncalibrated cameras. It also paves the way for new adaptations in urban modelling and in simulating the built environment accurately, which could benefit Agent-Based Modelling through deep learning and computer vision.
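To make the geometry concrete, the sketch below shows one way a detected vanishing point can induce a ground-plane-to-BEV mapping under a flat-ground, zero-roll pinhole model. It is a minimal illustration, not the authors' implementation: the focal-length heuristic (f ≈ image width), the 3 m camera height that fixes the metric scale, and the alignment of the camera's forward axis with the road direction are all assumptions; TopView estimates the vanishing point with a deep network, which is taken here as a given input.

```python
import numpy as np

def bev_from_vanishing_point(u, v, vp, img_size, cam_height_m=3.0):
    """Map a ground-contact pixel (u, v) to metric BEV coordinates (x, y).

    Assumes a flat ground plane and a zero-roll pinhole camera whose forward
    axis follows the road, with focal length approximated by the image width
    (a common uncalibrated heuristic) and an assumed camera height that fixes
    the metric scale.
    """
    w, h_img = img_size
    cx, cy = w / 2.0, h_img / 2.0
    f_px = float(w)                      # focal-length heuristic (assumption)

    # Pitch from the vertical offset between the vanishing point and the
    # principal point: the horizon row sits at cy - f_px * tan(pitch).
    theta = np.arctan2(cy - vp[1], f_px)

    # Invert the ground-plane projection for the pixel's image row.
    a = (v - cy) / f_px
    denom = a * np.cos(theta) + np.sin(theta)
    if denom <= 1e-6:
        raise ValueError("Pixel lies on or above the horizon; not a ground point.")
    y = cam_height_m * (np.cos(theta) - a * np.sin(theta)) / denom  # forward (m)
    z_c = y * np.cos(theta) + cam_height_m * np.sin(theta)          # camera depth
    x = (u - cx) * z_c / f_px                                       # lateral (m)
    return x, y

# Example: a pedestrian's feet detected at pixel (700, 650) in a 1280x720 frame,
# with the scene's vanishing point estimated (e.g. by a network) at (640, 300).
print(bev_from_vanishing_point(700, 650, vp=(640, 300), img_size=(1280, 720)))
```

Note that the assumed camera height only rescales the output, so even a rough guess yields a BEV that is metrically consistent up to scale.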
Related papers
- MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring [23.396192711865147]
Single camera 3D perception for traffic monitoring faces significant challenges due to occlusion and limited field of view.
This paper introduces a novel Bird's-Eye-View road occupancy detection framework that leverages multiple roadside cameras.
arXiv Detail & Related papers (2025-02-16T22:03:03Z)
- Visualizing Routes with AI-Discovered Street-View Patterns [4.153397474276339]
We propose using semantic latent vectors to quantify visual appearance features.
We calculate image similarities among a large set of street-view images and then discover spatial imagery patterns.
We present VivaRoutes, an interactive visualization prototype, to show how visualizations built on these discovered patterns help users explore multiple routes effectively and interactively (the similarity step is sketched below).
arXiv Detail & Related papers (2024-03-30T17:32:26Z)
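As a rough illustration of the similarity computation such an approach implies, the sketch below compares semantic latent vectors with cosine similarity; the embeddings are placeholders, not the paper's encoder or data.

```python
import numpy as np

# Placeholder embeddings: rows stand in for semantic latent vectors of
# street-view images produced by some pretrained scene encoder.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 256))

# Cosine similarity between every pair of images.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T                 # (500, 500) similarity matrix

# For each image, the index of its most visually similar neighbour.
np.fill_diagonal(similarity, -np.inf)          # exclude self-matches
nearest = similarity.argmax(axis=1)
```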
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z)
- Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years.
Data-driven simulation for autonomous driving has been a focal point of recent research.
We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
- Monocular BEV Perception of Road Scenes via Front-to-Top View Projection [57.19891435386843]
We present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view.
Our model runs at 25 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
arXiv Detail & Related papers (2022-11-15T13:52:41Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets (the cross-view fusion idea is sketched below).
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
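The sketch below is a simplified stand-in for such cross-view fusion: tokens from each camera attend over the tokens of all cameras with plain scaled dot-product attention. Learned projections, multi-head logic, and the paper's actual architecture are omitted.

```python
import numpy as np

def cross_view_attention(feats):
    """Fuse per-camera feature tokens with scaled dot-product attention.

    feats: (n_views, n_tokens, dim). Each view's tokens attend over the
    tokens of all views, a minimal stand-in for a cross-view transformer.
    """
    n_views, n_tokens, dim = feats.shape
    context = feats.reshape(n_views * n_tokens, dim)   # keys/values: all views
    fused = np.empty_like(feats)
    for v in range(n_views):
        q = feats[v]                                   # queries from one view
        scores = q @ context.T / np.sqrt(dim)          # (n_tokens, n_views*n_tokens)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)  # softmax over all tokens
        fused[v] = weights @ context
    return fused

# Six surround-view cameras, 64 tokens each, 128-dim features (illustrative shapes).
fused = cross_view_attention(np.random.default_rng(1).normal(size=(6, 64, 128)))
```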
- Weak Multi-View Supervision for Surface Mapping Estimation [0.9367260794056769]
We propose a weakly-supervised multi-view learning approach to learn category-specific surface mapping without dense annotations.
We learn the underlying surface geometry of common categories, such as human faces, cars, and airplanes, given instances from those categories.
arXiv Detail & Related papers (2021-05-04T09:46:26Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network (the splat step is sketched below).
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
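Below is a minimal sketch of the "splat" step under the stated idea: features already lifted to ground-plane locations are scatter-added into a BEV grid. The "lift" step (per-pixel depth distributions) and all learned components are omitted, and the shapes are illustrative.

```python
import numpy as np

def splat_to_bev(points_xy, feats, grid_range=50.0, cell=0.5):
    """Scatter-add lifted frustum features into a BEV grid.

    points_xy: (N, 2) ground-plane coordinates in metres of lifted features;
    feats: (N, C) the corresponding feature vectors.
    """
    n_cells = int(2 * grid_range / cell)
    bev = np.zeros((n_cells, n_cells, feats.shape[1]))
    ij = ((points_xy + grid_range) / cell).astype(int)          # metres -> cells
    valid = (ij >= 0).all(axis=1) & (ij < n_cells).all(axis=1)  # clip to grid
    np.add.at(bev, (ij[valid, 0], ij[valid, 1]), feats[valid])  # sum-pool per cell
    return bev

# Illustrative: 10k lifted points with 64-dim features from all cameras' frustums.
rng = np.random.default_rng(2)
bev = splat_to_bev(rng.uniform(-50, 50, (10_000, 2)), rng.normal(size=(10_000, 64)))
```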
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion (the underlying height-ratio relation is sketched below).
arXiv Detail & Related papers (2020-07-18T22:31:33Z)
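A core relation behind this line of work is the classical single-view-metrology height ratio, sketched below: with the horizon row known, a vertical object's height relative to the camera height follows from image rows alone. The formula is exact for a zero-pitch, zero-roll camera and an approximation otherwise; the numbers are illustrative.

```python
def object_height(v_top, v_base, v_horizon, cam_height_m):
    """Estimate an object's metric height from a single image.

    For a vertical object standing on the ground plane seen by a zero-roll
    camera: height / camera_height = (v_base - v_top) / (v_base - v_horizon),
    measured in image rows. cam_height_m is assumed known.
    """
    return cam_height_m * (v_base - v_top) / (v_base - v_horizon)

# A pedestrian with feet at row 650 and head at row 540, horizon at row 360,
# seen from a camera assumed to be 4.5 m above the ground (illustrative numbers):
print(object_height(540, 650, 360, 4.5))   # ~1.71 m
```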
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections (the projection step is sketched below).
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
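The per-view decoding relies on standard pinhole projection, sketched below for hypothetical camera parameters; the learned latent fusion itself is not reproduced here.

```python
import numpy as np

def project_pose(joints_3d, K, R, t):
    """Project 3D joints (J, 3) into one calibrated view via x = K [R|t] X.

    K: (3, 3) intrinsics; R: (3, 3) rotation; t: (3,) translation.
    """
    cam = joints_3d @ R.T + t            # world -> camera coordinates
    uv = cam @ K.T                       # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]        # perspective divide -> pixel coords

# Illustrative: 17 joints around the origin, an identity-rotation camera 3 m away.
K = np.array([[1000., 0., 640.], [0., 1000., 360.], [0., 0., 1.]])
joints = np.random.default_rng(3).normal(scale=0.3, size=(17, 3))
print(project_pose(joints, K, np.eye(3), np.array([0., 0., 3.])).shape)  # (17, 2)
```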
- Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [27.86228863466213]
We present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture.
We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets.
arXiv Detail & Related papers (2020-03-30T12:39:44Z)