Related papers: Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

URL: http://arxiv.org/abs/2510.27224v1
Date: Fri, 31 Oct 2025 06:37:08 GMT
Title: Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery
Authors: Mahmoud El Hussieni, Bahadır K. Güntürk, Hasan F. Ateş, Oğuz Hanoğlu,
Abstract summary: This paper presents a detailed analysis of YOLOv11, the recent advancement in the YOLO series of deep learning models.<n>YOLOv11 builds on the strengths of earlier YOLO models by introducing a more efficient architecture.<n>We evaluate YOLOv11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP)
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of YOLOv11, the recent advancement in the YOLO series of deep learning models, focusing on its application to joint building extraction and discrete height classification from satellite imagery. YOLOv11 builds on the strengths of earlier YOLO models by introducing a more efficient architecture that better combines features at different scales, improves object localization accuracy, and enhances performance in complex urban scenes. Using the DFC2023 Track 2 dataset -- which includes over 125,000 annotated buildings across 12 cities -- we evaluate YOLOv11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP). Our findings demonstrate that YOLOv11 achieves strong instance segmentation performance with 60.4\% mAP@50 and 38.3\% mAP@50--95 while maintaining robust classification accuracy across five predefined height tiers. The model excels in handling occlusions, complex building shapes, and class imbalance, particularly for rare high-rise structures. Comparative analysis confirms that YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and inference speed, making it well-suited for real-time, large-scale urban mapping. This research highlights YOLOv11's potential to advance semantic urban reconstruction through streamlined categorical height modeling, offering actionable insights for future developments in remote sensing and geospatial intelligence.

Related papers

Detect Anything via Next Point Prediction [51.55967987350882]
Rex- Omni is a 3B-scale MLLM that achieves state-of-the-art object perception performance.<n>On benchmarks like COCO and LVIS, Rex- Omni attains performance comparable to or exceeding regression-based models.
arXiv Detail & Related papers (2025-10-14T17:59:54Z)
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models [73.19077622773075]
We present a comprehensive methodology for building spatial intelligence progressively.<n>We introduce SpatialLadder-26k, a multimodal dataset containing 26,610 samples spanning object localization, single image, multi-view, and video spatial reasoning tasks.<n>We design a three-stage progressive training framework that establishes spatial perception through object localization, develops spatial understanding through multi-dimensional spatial tasks, and strengthens complex reasoning via reinforcement learning with verifiable rewards.
arXiv Detail & Related papers (2025-10-09T17:50:54Z)
YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception [58.06752127687312]
We propose YOLOv13, an accurate and lightweight object detector.<n>We propose a Hypergraph-based Adaptive Correlation Enhancement (HyperACE) mechanism.<n>We also propose a Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm.
arXiv Detail & Related papers (2025-06-21T15:15:03Z)
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO)<n>In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.<n>We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
YOLOv12: A Breakdown of the Key Architectural Features [0.5639904484784127]
YOLOv12 is a significant advancement in single-stage, real-time object detection.<n>It incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention.<n>It offers scalable solutions for both latency-sensitive and high-accuracy applications.
arXiv Detail & Related papers (2025-02-20T17:08:43Z)
YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions [0.0]
This study represents the first comprehensive experimental evaluation of YOLOv3 to the latest version, YOLOv12.<n>The challenges considered include varying object sizes, diverse aspect ratios, and small-sized objects of a single class.<n>Our analysis highlights the distinctive strengths and limitations of each YOLO version.
arXiv Detail & Related papers (2024-10-31T20:45:00Z)
YOLOv11: An Overview of the Key Architectural Enhancements [0.5639904484784127]
The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB) We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.
arXiv Detail & Related papers (2024-10-23T09:55:22Z)
YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection. The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs. We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z)
Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data. The proposed method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z)
A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series [0.0]
In this study, we propose a supervised Multimodal Building Height Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) time series. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands Root Mean Squared Error (RMSE), Intersection over Union (IoU), and R-squared (R2) score metrics are used to evaluate the performance of the model.
arXiv Detail & Related papers (2023-07-03T22:16:17Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation. YOLO-S has an 87% decrease of parameter size and almost one half FLOPs of YOLOv3, making practical the deployment for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.