Holistic Multi-View Building Analysis in the Wild with Projection
Pooling
- URL: http://arxiv.org/abs/2008.10041v3
- Date: Sat, 19 Dec 2020 20:59:07 GMT
- Title: Holistic Multi-View Building Analysis in the Wild with Projection
Pooling
- Authors: Zbigniew Wojna, Krzysztof Maziarz, Łukasz Jocz, Robert Pałuba,
Robert Kozikowski, Iasonas Kokkinos
- Abstract summary: We address six different classification tasks related to fine-grained building attributes.
Tackling such a remote building analysis problem became possible only recently due to the growing availability of large-scale datasets of urban scenes.
We introduce a new benchmarking dataset, consisting of 49426 images (top-view and street-view) of 9674 buildings.
- Score: 18.93067906200084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address six different classification tasks related to fine-grained
building attributes: construction type, number of floors, pitch and geometry of
the roof, facade material, and occupancy class. Tackling such a remote building
analysis problem became possible only recently due to the growing availability
of large-scale datasets of urban scenes. To this end, we introduce a new
benchmarking dataset consisting of 49426 images (top-view and street-view) of
9674 buildings. These photos are further assembled together with the geometric
metadata. The dataset
showcases various real-world challenges, such as occlusions, blur, partially
visible objects, and a broad spectrum of buildings. We propose a new projection
pooling layer, creating a unified, top-view representation of the top-view and
the side views in a high-dimensional space. It allows us to utilize the
building and imagery metadata seamlessly. Introducing this layer improves
classification accuracy -- compared to highly tuned baseline models --
indicating its suitability for building analysis.
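The abstract does not spell out the mechanics of the projection pooling layer, so the following is only a rough sketch of one plausible reading, not the authors' implementation: per-column street-view features are scattered into a top-view grid using a column-to-cell mapping assumed to come from the camera and building metadata, averaged per cell, and fused with the top-view features. The module name, tensor shapes, and the `cell_index` mapping below are illustrative assumptions.

```python
# Hypothetical sketch of projection pooling (not the authors' code).
# Assumed inputs: a top-view feature map, per-column street-view features, and a
# precomputed mapping from street-view columns to top-view grid cells (metadata).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionPooling(nn.Module):
    """Pool street-view features into a shared top-view grid (illustrative only)."""

    def __init__(self, channels: int, grid_size: int = 16):
        super().__init__()
        self.grid_size = grid_size
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, top_feat, side_feats, cell_index):
        # top_feat:   (B, C, H, W)  features of the top-view image
        # side_feats: (B, V, C, L)  one feature vector per column of V street views
        # cell_index: (B, V, L)     top-view cell hit by each column (long tensor)
        B, V, C, L = side_feats.shape
        cells = self.grid_size * self.grid_size
        grid = top_feat.new_zeros(B, C, cells)
        count = top_feat.new_zeros(B, 1, cells)
        idx = cell_index.reshape(B, 1, V * L).expand(-1, C, -1)
        # Average all street-view columns that project onto the same top-view cell.
        grid.scatter_add_(2, idx, side_feats.permute(0, 2, 1, 3).reshape(B, C, V * L))
        count.scatter_add_(2, idx[:, :1], torch.ones_like(idx[:, :1], dtype=grid.dtype))
        grid = (grid / count.clamp(min=1)).reshape(B, C, self.grid_size, self.grid_size)
        grid = F.interpolate(grid, size=top_feat.shape[-2:], mode="nearest")
        # Fuse the pooled side-view representation with the top-view features.
        return self.fuse(torch.cat([top_feat, grid], dim=1))
```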
Related papers
- GBSS: a global building semantic segmentation dataset for large-scale
remote sensing building extraction [10.39943244036649]
We construct a Global Building Semantic dataset (the dataset will be released), which comprises 116.9k pairs of samples (about 742k buildings) from six continents.
There are significant variations among the building samples in terms of size and style, so the dataset can serve as a more challenging benchmark for evaluating the generalization and robustness of building semantic segmentation models.
arXiv Detail & Related papers (2024-01-02T12:13:35Z)
- Building Height Prediction with Instance Segmentation [0.0]
We present an instance segmentation-based building height extraction method to predict building masks from a single RGB satellite image.
We used satellite images with building height annotations of certain cities, together with an open-source satellite dataset, in a transfer learning approach.
We reached a bounding box mAP of 59, a mask mAP of 52.6, and an average accuracy of 70% for buildings belonging to each height class in our test set.
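The summary only names the approach (instance segmentation with transfer learning), so the snippet below is a minimal, hypothetical setup of that kind using torchvision's Mask R-CNN, with height bins treated as instance classes; the number of height classes and the backbone are assumptions rather than details from the paper.

```python
# Illustrative transfer-learning setup for height-class instance segmentation
# (not the paper's actual model); height bins are treated as instance classes.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_height_segmenter(num_height_classes: int = 5):
    # Start from COCO-pretrained weights and replace both prediction heads.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    num_classes = num_height_classes + 1  # +1 for the background class
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    mask_in = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(mask_in, 256, num_classes)
    return model
```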
arXiv Detail & Related papers (2022-12-19T07:12:49Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
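As a rough illustration of the render&compare idea described above (not MegaPose's actual code), a refiner can alternate between rendering the object at the current pose estimate and predicting a corrective transform; `renderer` and `refiner_net` below are hypothetical components.

```python
# Schematic render-and-compare refinement loop; purely illustrative.
import torch

def refine_pose(observed_rgb, mesh, pose, renderer, refiner_net, steps=5):
    # pose: (4, 4) homogeneous object-to-camera transform (initial coarse estimate)
    for _ in range(steps):
        rendered = renderer(mesh, pose)              # synthetic view at the current pose
        delta = refiner_net(observed_rgb, rendered)  # predicted (4, 4) pose correction
        pose = delta @ pose                          # compose the update
    return pose
```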
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images [72.4144257192959]
The paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images.
The dataset contains over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City.
With the new OmniCity dataset, we provide benchmarks for a variety of tasks including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation.
arXiv Detail & Related papers (2022-08-01T15:19:25Z)
- TMBuD: A dataset for urban scene building detection [0.0]
This paper introduces a dataset solution, TMBuD, that is better suited for image processing of human-made structures in urban scene scenarios.
The proposed dataset will allow proper evaluation of salient edges and semantic segmentation of images focusing on the street view perspective of buildings.
The dataset features 160 images of buildings from Timisoara, Romania, with a resolution of 768 x 1024 pixels each.
arXiv Detail & Related papers (2021-10-27T17:08:11Z)
- BuildingNet: Learning to Label 3D Buildings [19.641000866952815]
BuildingNet consists of: (a) large-scale 3D building models whose exteriors are consistently labeled, and (b) a neural network that labels buildings by analyzing structural relations of their geometric primitives.
The dataset covers categories such as houses, churches, skyscrapers, town halls, and castles.
arXiv Detail & Related papers (2021-10-11T01:45:26Z)
- FloorLevel-Net: Recognizing Floor-Level Lines with Height-Attention-Guided Multi-task Learning [49.30194762653723]
This work tackles the problem of locating floor-level lines in street-view images, using a supervised deep learning approach.
We first compile a new dataset and develop a new data augmentation scheme to synthesize training samples.
Next, we design FloorLevel-Net, a multi-task learning network that associates explicit features of building facades and implicit floor-level lines.
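The summary only states that FloorLevel-Net couples facade features with floor-level lines via multi-task learning; a generic two-head sketch of that pattern (shared encoder, one head per task) is given below. The backbone, class counts, and head design are assumptions, not the paper's architecture.

```python
# Generic two-head multi-task segmentation sketch (not FloorLevel-Net itself).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class TwoHeadFacadeNet(nn.Module):
    def __init__(self, facade_classes: int = 5, line_classes: int = 2):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H/32, W/32)
        self.facade_head = nn.Conv2d(512, facade_classes, kernel_size=1)  # facade semantics
        self.line_head = nn.Conv2d(512, line_classes, kernel_size=1)      # floor-level lines

    def forward(self, x):
        feats = self.encoder(x)
        size = x.shape[-2:]
        facade = F.interpolate(self.facade_head(feats), size=size, mode="bilinear", align_corners=False)
        lines = F.interpolate(self.line_head(feats), size=size, mode="bilinear", align_corners=False)
        return facade, lines
```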
arXiv Detail & Related papers (2021-07-06T08:17:59Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method [52.182698295053264]
We are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging due to the following factors: large scale variation, complex cluttered background, and orientation arbitrariness.
To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important types of geographic objects.
We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
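The summary mentions a network that predicts a density map of the input image; in this standard formulation the predicted count is the sum (integral) of the density map. Below is a minimal, generic head of this kind, given only as a sketch; the benchmark network in the paper is more elaborate.

```python
# Generic density-map counting head (illustrative, not the paper's network).
import torch
import torch.nn as nn

class DensityCounter(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.density = nn.Conv2d(128, 1, kernel_size=1)  # one value per location

    def forward(self, x):
        density_map = torch.relu(self.density(self.features(x)))  # non-negative density
        count = density_map.sum(dim=(1, 2, 3))  # predicted object count per image
        return density_map, count
```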
arXiv Detail & Related papers (2020-08-28T03:47:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed details) and is not responsible for any consequences arising from its use.