Using Social Media Images for Building Function Classification
- URL: http://arxiv.org/abs/2202.07315v1
- Date: Tue, 15 Feb 2022 11:05:10 GMT
- Title: Using Social Media Images for Building Function Classification
- Authors: Eike Jens Hoffmann, Karam Abdulahhad, and Xiao Xiang Zhu
- Abstract summary: This study proposes a filtering pipeline to yield high quality, ground level imagery from large social media image datasets.
We analyze our method on a culturally diverse social media dataset from Flickr with more than 28 million images from 42 cities around the world.
Fine-tuned state-of-the-art architectures yield F1-scores of up to 0.51 on the filtered images.
- Score: 12.99941371793082
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Urban land use on a building instance level is crucial geo-information for
many applications, yet difficult to obtain. An intuitive approach to closing this
gap is predicting building functions from ground-level imagery. Social media
image platforms contain billions of images with a large variety of motifs,
including but not limited to street perspectives. To cope with this variety, this
study proposes a filtering pipeline that yields high-quality, ground-level imagery
from large social media image datasets. The pipeline ensures that all resulting
images carry full and valid geotags with a compass direction, relating image
content to spatial objects from maps.
We analyze our method on a culturally diverse social media dataset from
Flickr with more than 28 million images from 42 cities around the world. The
obtained dataset is then evaluated on a three-class building function
classification task. The three building classes considered in this
study are commercial, residential, and other. Fine-tuned state-of-the-art
architectures yield F1-scores of up to 0.51 on the filtered images. Our
analysis shows that the performance is highly limited by the quality of the
labels obtained from OpenStreetMap, as the metrics increase by 0.2 if only
human validated labels are considered. Therefore, we consider these labels to
be weak and publish the resulting images from our pipeline together with the
buildings they are showing as a weakly labeled dataset.
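As a rough illustration of the geotag-filtering step described in the abstract, the sketch below keeps only images whose metadata carries full, valid coordinates and a compass direction. The record field names (`lat`, `lon`, `direction`) are assumptions for illustration only; the paper's actual pipeline and the Flickr metadata schema may differ.

```python
# Hedged sketch of a geotag-validity filter for social media image metadata.
# Field names are hypothetical; real Flickr records need mapping to this shape.

def has_valid_geotag(record: dict) -> bool:
    """True if the record has full, in-range coordinates and a compass bearing."""
    lat = record.get("lat")
    lon = record.get("lon")
    direction = record.get("direction")  # compass bearing in degrees
    if lat is None or lon is None or direction is None:
        return False
    return (-90.0 <= lat <= 90.0
            and -180.0 <= lon <= 180.0
            and 0.0 <= direction < 360.0)

def filter_images(records: list[dict]) -> list[dict]:
    """Keep only records usable for relating image content to map objects."""
    return [r for r in records if has_valid_geotag(r)]
```

A filter of this kind discards images that are geotagged but lack a bearing, since without a compass direction the photographed building cannot be matched to a map footprint.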
Related papers
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Towards Large-scale Building Attribute Mapping using Crowdsourced Images: Scene Text Recognition on Flickr and Problems to be Solved [16.272425120319095]
This work addresses the challenges in applying Scene Text Recognition in crowdsourced street-view images for building attribute mapping.
A Berlin Flickr dataset is created, and pre-trained STR models are used for text detection and recognition.
We examined the correlation between STR results and building functions, and analysed instances where texts were recognized on residential buildings but not on commercial ones.
arXiv Detail & Related papers (2023-09-14T22:02:14Z)
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state of the art street level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
- Saliency Guided Contrastive Learning on Scene Images [71.07412958621052]
We leverage the saliency map derived from the model's output during learning to highlight discriminative regions and guide the whole contrastive learning.
Our method significantly improves the performance of self-supervised learning on scene images, by +1.1, +4.3, and +2.2 Top-1 accuracy in ImageNet linear evaluation and in semi-supervised learning with 1% and 10% ImageNet labels, respectively.
arXiv Detail & Related papers (2023-02-22T15:54:07Z)
- Which country is this picture from? New data and methods for DNN-based country recognition [33.73817899937691]
Previous works have focused mostly on the estimation of the geo-coordinates where a picture has been taken.
We introduce a new dataset, the VIPPGeo dataset, containing almost 4 million images.
We use the dataset to train a deep learning architecture casting the country recognition problem as a classification problem.
arXiv Detail & Related papers (2022-09-02T10:56:41Z)
- There is a Time and Place for Reasoning Beyond the Image [63.96498435923328]
Images often convey more to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
We introduce TARA: a dataset with 16k images with their associated news, time and location automatically extracted from New York Times (NYT), and an additional 61k examples as distant supervision from WIT.
We show that there exists a 70% gap between a state-of-the-art joint model and human performance, which is slightly narrowed by our proposed model that uses segment-wise reasoning, motivating higher-level vision-language joint models.
arXiv Detail & Related papers (2022-03-01T21:52:08Z)
- Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization [9.834635805575584]
We contribute the Danish Airs and Grounds dataset, a large collection of street-level and aerial images targeting such cases.
The dataset is larger and more diverse than current publicly available data, including more than 50 km of road in urban, suburban and rural areas.
We propose a map-to-image re-localization pipeline, that first estimates a dense 3D reconstruction from the aerial images and then matches query street-level images to street-level renderings of the 3D model.
arXiv Detail & Related papers (2022-02-03T19:58:09Z)
- SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds [52.624157840253204]
We introduce SensatUrban, an urban-scale UAV photogrammetry point cloud dataset consisting of nearly three billion points collected from three UK cities, covering 7.6 km².
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset that is three times the size of the previous existing largest photogrammetric point cloud dataset.
arXiv Detail & Related papers (2022-01-12T14:48:11Z)
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [52.624157840253204]
We present an urban-scale photogrammetric point cloud dataset with nearly three billion richly annotated points.
Our dataset consists of large areas from three UK cities, covering about 7.6 km² of the city landscape.
We evaluate the performance of state-of-the-art algorithms on our dataset and provide a comprehensive analysis of the results.
arXiv Detail & Related papers (2020-09-07T14:47:07Z)
- Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval [9.922132565411664]
We introduce the Google Landmarks dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval.
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos.
arXiv Detail & Related papers (2020-04-03T22:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.