Image-based Visibility Analysis Replacing Line-of-Sight Simulation: An Urban Landmark Perspective
- URL: http://arxiv.org/abs/2505.11809v1
- Date: Sat, 17 May 2025 03:41:45 GMT
- Title: Image-based Visibility Analysis Replacing Line-of-Sight Simulation: An Urban Landmark Perspective
- Authors: Zicheng Fan, Kunihiko Fujiwara, Pengyuan Liu, Fan Zhang, Filip Biljecki
- Abstract summary: The study challenges traditional LoS-based approaches by introducing a new, image-based visibility analysis method. In the first case study, the method proves its reliability in detecting the visibility of six tall landmark structures in global cities, with an overall accuracy of 87%. In the second case, the proposed visibility graph uncovers the form and strength of connections for multiple landmarks along the River Thames in London.
- Score: 2.3315115235829342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visibility analysis is one of the fundamental analytical methods in urban planning and landscape research, traditionally conducted through computational simulations based on the Line-of-Sight (LoS) principle. However, when assessing the visibility of named urban objects such as landmarks, geometric intersection alone fails to capture the contextual and perceptual dimensions of visibility as experienced in the real world. This study challenges traditional LoS-based approaches by introducing a new, image-based visibility analysis method. Specifically, a Vision Language Model (VLM) is applied to detect the target object within a direction-zoomed Street View Image (SVI); successful detection indicates that the object is visible from the corresponding SVI location. Further, a heterogeneous visibility graph is constructed to capture the complex interactions between observers and target objects. In the first case study, the method proves reliable in detecting the visibility of six tall landmark structures in global cities, with an overall accuracy of 87%, and it reveals broader contextual differences in how the landmarks are perceived and experienced. In the second case, the proposed visibility graph uncovers the form and strength of connections among multiple landmarks along the River Thames in London, as well as the places where these connections occur; notably, bridges on the River Thames account for approximately 30% of total connections. Our method complements and enhances traditional LoS-based visibility analysis, and demonstrates the possibility of revealing the prevalent visual connections among objects in the urban environment. It opens new research perspectives for urban planning, heritage conservation, and computational social science.
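The pipeline described in the abstract lends itself to a compact sketch. The snippet below is a minimal, hypothetical illustration, not the authors' released code: `fetch_svi_crop` and `vlm_query` are assumed stand-ins for an SVI provider and a VLM API, `networkx` is an arbitrary graph-library choice, and the co-visibility projection is one plausible reading of how the "form and strength of connections" between landmarks might be measured from the heterogeneous graph.

```python
# Minimal sketch of the image-based visibility pipeline (hypothetical APIs).
import math
from itertools import combinations

import networkx as nx


def bearing_deg(obs, tgt):
    """Initial great-circle bearing from obs=(lat, lon) to tgt, in [0, 360)."""
    phi1, phi2 = math.radians(obs[0]), math.radians(tgt[0])
    dlam = math.radians(tgt[1] - obs[1])
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(x, y)) % 360.0


def landmark_visible(svi_point, landmark, fetch_svi_crop, vlm_query):
    """Zoom the panorama toward the landmark, then ask a VLM whether it appears.

    `fetch_svi_crop(pano_id, heading, fov_deg)` and `vlm_query(image, prompt)`
    are assumed stand-ins, not a specific provider's API.
    """
    heading = bearing_deg(svi_point["latlon"], landmark["latlon"])
    crop = fetch_svi_crop(svi_point["id"], heading=heading, fov_deg=30)
    prompt = f"Is {landmark['name']} visible in this image? Answer yes or no."
    return vlm_query(crop, prompt).strip().lower().startswith("yes")


def build_visibility_graph(svi_points, landmarks, fetch_svi_crop, vlm_query):
    """Heterogeneous graph: observer and landmark nodes; an edge means
    'this landmark was detected from this SVI location'."""
    G = nx.Graph()
    for lm in landmarks:
        G.add_node(lm["name"], kind="landmark")
    for pt in svi_points:
        G.add_node(pt["id"], kind="observer", latlon=pt["latlon"])
        for lm in landmarks:
            if landmark_visible(pt, lm, fetch_svi_crop, vlm_query):
                G.add_edge(pt["id"], lm["name"])
    return G


def landmark_covisibility(G):
    """Project onto landmark nodes: each edge weight counts the observer
    locations from which both landmarks are detected, a proxy for the
    strength of their visual connection."""
    P = nx.Graph()
    for node, data in G.nodes(data=True):
        if data.get("kind") != "observer":
            continue
        seen = sorted(n for n in G.neighbors(node)
                      if G.nodes[n].get("kind") == "landmark")
        for a, b in combinations(seen, 2):
            w = P.get_edge_data(a, b, default={"weight": 0})["weight"]
            P.add_edge(a, b, weight=w + 1)
    return P
```

Under this reading, tagging each observer node with its setting (e.g., "bridge", "street", "riverbank") would let one tally where landmark-to-landmark connections occur, analogous to the finding that Thames bridges host roughly 30% of the connections.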
Related papers
- Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world. 3D vision-language learning enables embodied agents to effectively explore and understand their environment. The model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
arXiv Detail & Related papers (2025-07-05T14:15:52Z) - Interpretable Multimodal Framework for Human-Centered Street Assessment: Integrating Visual-Language Models for Perceptual Urban Diagnostics [0.0]
This study introduces a novel Multimodal Street Evaluation Framework (MSEF). We fine-tune the framework using LoRA and P-Tuning v2 for parameter-efficient adaptation. The model achieves an F1 score of 0.84 on objective features and 89.3 percent agreement with aggregated resident perceptions.
arXiv Detail & Related papers (2025-06-05T14:34:04Z) - Urban Visibility Hotspots: Quantifying Building Vertex Visibility from Connected Vehicle Trajectories using Spatial Indexing [3.4760227640914416]
This research introduces a data-driven methodology to objectively quantify location visibility. We model the dynamic driver field-of-view using a forward-projected visibility area for each vehicle position. We quantify the cumulative visual exposure, or "visibility count", for thousands of potential points of interest near roadways.
arXiv Detail & Related papers (2025-06-03T20:16:41Z) - Coverage and Bias of Street View Imagery in Mapping the Urban Environment [0.0]
Street View Imagery (SVI) has emerged as a valuable data form in urban studies, enabling new ways to map and sense urban environments. However, fundamental concerns regarding the representativeness, quality, and reliability of SVI remain underexplored. This research proposes a novel and effective method to estimate SVI's element-level coverage in the urban environment.
arXiv Detail & Related papers (2024-09-22T02:58:43Z) - ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation [1.2070884166650049]
This study integrates the Segment Anything model, a segmentation foundation model, with vision language models to conduct text-prompt image segmentation on street view images for LFE estimation.
Our proposed method significantly extends the availability of LFE estimation to almost all properties whose front door is visible in the street view image.
arXiv Detail & Related papers (2024-04-19T03:16:08Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a rendering-assisted distillation paradigm for 3D occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - How To Not Train Your Dragon: Training-free Embodied Object Goal Navigation with Semantic Frontiers [94.46825166907831]
We present a training-free solution to tackle the object goal navigation problem in Embodied AI.
Our method builds a structured scene representation based on the classic visual simultaneous localization and mapping (V-SLAM) framework.
Our method propagates semantics on the scene graphs based on language priors and scene statistics to introduce semantic knowledge to the geometric frontiers.
arXiv Detail & Related papers (2023-05-26T13:38:33Z) - Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806]
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
arXiv Detail & Related papers (2022-02-09T18:56:41Z) - Adding Visibility to Visibility Graphs: Weighting Visibility Analysis with Attenuation Coefficients [0.0]
This paper introduces a new method for weighting a visibility graph based on weather conditions.
New factors are integrated into visibility graphs and applied to sample environments to demonstrate the difference between assuming an unobstructed straight line of sight and accounting for reduced visibility (a generic attenuation-weighting sketch appears after this list).
arXiv Detail & Related papers (2021-07-28T18:36:56Z) - Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z) - City-Scale Visual Place Recognition with Deep Local Features Based on Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully automated system for city-scale place recognition based on content-based image retrieval.
Firstly, we take a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed spatial information into the image representation vector.
arXiv Detail & Related papers (2020-09-19T15:21:59Z) - 3D Shape Reconstruction from Vision and Touch [62.59044232597045]
In 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored.
We introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects.
arXiv Detail & Related papers (2020-07-07T20:20:33Z) - Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
arXiv Detail & Related papers (2020-05-25T17:56:29Z)
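On the attenuation-coefficient weighting referenced above (Adding Visibility to Visibility Graphs): the one-line summary does not specify the model, but a standard physical choice would be Beer-Lambert extinction along the sightline. The sketch below is a generic formulation under that assumption, not necessarily the cited paper's exact scheme.

```python
import math

def attenuated_weight(distance_m: float, sigma_per_m: float) -> float:
    """Beer-Lambert-style edge weight for a visibility graph: the fraction
    of contrast surviving along a sightline of `distance_m` metres under an
    atmospheric extinction coefficient `sigma_per_m` (larger in fog or rain).
    A generic formulation, assumed rather than taken from the cited paper."""
    return math.exp(-sigma_per_m * distance_m)

# Example: a 500 m sightline in light fog (sigma ~ 0.006 per metre)
# retains only about 5% contrast, so the edge is heavily down-weighted.
w = attenuated_weight(500.0, 0.006)
```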