PanorAMS: Automatic Annotation for Detecting Objects in Urban Context
- URL: http://arxiv.org/abs/2208.14295v2
- Date: Wed, 31 Aug 2022 09:59:15 GMT
- Title: PanorAMS: Automatic Annotation for Detecting Objects in Urban Context
- Authors: Inske Groenen, Stevan Rudinac and Marcel Worring
- Abstract summary: We introduce a method to automatically generate bounding box annotations for panoramic images based on urban context information.
We acquire large-scale, albeit noisy, annotations for an urban dataset solely from open data sources in a fast and automatic manner.
For detailed evaluation, we introduce an efficient crowdsourcing protocol for bounding box annotations in panoramic images.
- Score: 17.340826322549596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large collections of geo-referenced panoramic images are freely available for
cities across the globe, as well as detailed maps with location and meta-data
on a great variety of urban objects. They provide a potentially rich source of
information on urban objects, but manual annotation for object detection is
costly, laborious and difficult. Can we utilize such multimedia sources to
automatically annotate street level images as an inexpensive alternative to
manual labeling? With the PanorAMS framework we introduce a method to
automatically generate bounding box annotations for panoramic images based on
urban context information. Following this method, we acquire large-scale,
albeit noisy, annotations for an urban dataset solely from open data sources in
a fast and automatic manner. The dataset covers the City of Amsterdam and
includes over 14 million noisy bounding box annotations of 22 object categories
present in 771,299 panoramic images. For many objects further fine-grained
information is available, obtained from geospatial meta-data, such as building
value, function and average surface area. Such information would have been
difficult, if not impossible, to acquire via manual labeling based on the image
alone. For detailed evaluation, we introduce an efficient crowdsourcing
protocol for bounding box annotations in panoramic images, which we deploy to
acquire 147,075 ground-truth object annotations for a subset of 7,348 images,
the PanorAMS-clean dataset. For our PanorAMS-noisy dataset, we provide an
extensive analysis of the noise and how different types of noise affect image
classification and object detection performance. We make both datasets,
PanorAMS-noisy and PanorAMS-clean, benchmarks and tools presented in this paper
openly available.
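The core idea of the framework, generating bounding boxes from geo-referenced map data rather than manual labeling, hinges on projecting an object's map location into panoramic image coordinates. The paper does not spell out its projection in this summary, so the following is only a minimal sketch under simplifying assumptions (flat-earth offsets for nearby objects, a level camera, equirectangular panoramas); the function name and signature are hypothetical.

```python
import math

def geo_to_panorama_pixel(cam_lat, cam_lon, cam_heading_deg,
                          obj_lat, obj_lon, pano_w, pano_h):
    """Project a geo-referenced object location onto an equirectangular
    panorama (hypothetical helper; the actual PanorAMS projection also
    accounts for object geometry and camera tilt)."""
    # Approximate local north/east offsets in metres (flat-earth
    # approximation, valid for objects near the camera).
    R = 6371000.0  # mean Earth radius in metres
    d_north = math.radians(obj_lat - cam_lat) * R
    d_east = math.radians(obj_lon - cam_lon) * R * math.cos(math.radians(cam_lat))
    # Compass bearing from camera to object, relative to camera heading.
    bearing = math.degrees(math.atan2(d_east, d_north))
    rel = (bearing - cam_heading_deg) % 360.0
    # In an equirectangular panorama, yaw maps linearly to the x axis.
    x = rel / 360.0 * pano_w
    # Place the point on the horizon line (assumes a level camera).
    y = pano_h / 2.0
    return x, y
```

In practice, a full bounding box would be obtained by projecting the object's footprint corners and estimated height this way and taking the enclosing rectangle, which is also where the annotation noise analyzed in the paper originates (imprecise geo-coordinates, occlusions, and heading errors).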
Related papers
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset in the wild.
This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z) - OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects [56.065616159398324]
We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials.
For each image in the dataset, we provide accurate camera parameters, illumination ground truth, and foreground segmentation masks.
arXiv Detail & Related papers (2023-09-14T17:59:53Z) - OmniCity: Omnipotent City Understanding with Multi-level and Multi-view Images [72.4144257192959]
The paper presents OmniCity, a new dataset for omnipotent city understanding from multi-level and multi-view images.
The dataset contains over 100K pixel-wise annotated images that are well-aligned and collected from 25K geo-locations in New York City.
With the new OmniCity dataset, we provide benchmarks for a variety of tasks including building footprint extraction, height estimation, and building plane/instance/fine-grained segmentation.
arXiv Detail & Related papers (2022-08-01T15:19:25Z) - Mapping Temporary Slums from Satellite Imagery using a Semi-Supervised Approach [5.830619388189557]
One billion people worldwide are estimated to be living in slums.
Small, scattered and temporary slums make data collection and labeling tedious and time-consuming.
We present a semi-supervised deep learning segmentation-based approach to detect temporary slums.
arXiv Detail & Related papers (2022-04-09T08:02:32Z) - There is a Time and Place for Reasoning Beyond the Image [63.96498435923328]
Images are often more significant to human eyes than the pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
We introduce TARA: a dataset with 16k images with their associated news, time and location automatically extracted from New York Times (NYT), and an additional 61k examples as distant supervision from WIT.
We show that there exists a 70% gap between a state-of-the-art joint model and human performance, which is partially closed by our proposed model that uses segment-wise reasoning, motivating higher-level vision-language joint models.
arXiv Detail & Related papers (2022-03-01T21:52:08Z) - Using Social Media Images for Building Function Classification [12.99941371793082]
This study proposes a filtering pipeline to yield high quality, ground level imagery from large social media image datasets.
We analyze our method on a culturally diverse social media dataset from Flickr with more than 28 million images from 42 cities around the world.
Fine-tuned state-of-the-art architectures yield F1-scores of up to 0.51 on the filtered images.
arXiv Detail & Related papers (2022-02-15T11:05:10Z) - Semantic Segmentation on Swiss3DCities: A Benchmark Study on Aerial Photogrammetric 3D Pointcloud Dataset [67.44497676652173]
We introduce a new outdoor urban 3D pointcloud dataset, covering a total area of 2.7 km², sampled from three Swiss cities.
The dataset is manually annotated for semantic segmentation with per-point labels, and is built using photogrammetry from images acquired by multirotors equipped with high-resolution cameras.
arXiv Detail & Related papers (2020-12-23T21:48:47Z) - Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z) - Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks [27.86228863466213]
We present a simple, unified approach for estimating maps directly from monocular images using a single end-to-end deep learning architecture.
We demonstrate the effectiveness of our approach by evaluating against several challenging baselines on the NuScenes and Argoverse datasets.
arXiv Detail & Related papers (2020-03-30T12:39:44Z) - Automatic Signboard Detection and Localization in Densely Populated Developing Cities [0.0]
Signboard detection in natural scene images is the foremost task for error-free information retrieval.
We present a novel object detection approach that can detect signboards automatically and is suitable for such cities.
Our proposed method can detect signboards accurately, even if the images contain multiple signboards with diverse shapes and colours against a noisy background, achieving 0.90 mAP (mean average precision).
arXiv Detail & Related papers (2020-03-04T08:04:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.