Placepedia: Comprehensive Place Understanding with Multi-Faceted
Annotations
- URL: http://arxiv.org/abs/2007.03777v4
- Date: Fri, 17 Jul 2020 08:56:05 GMT
- Title: Placepedia: Comprehensive Place Understanding with Multi-Faceted
Annotations
- Authors: Huaiyi Huang, Yuqi Zhang, Qingqiu Huang, Zhengkui Guo, Ziwei Liu, and
Dahua Lin
- Abstract summary: We contribute Placepedia, a large-scale place dataset with more than 35M photos from 240K unique places.
Besides the photos, each place also comes with extensive multi-faceted information, such as GDP and population.
This dataset, with its large amount of data and rich annotations, allows various studies to be conducted.
- Score: 79.80036503792985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Place is an important element in visual understanding. Given a photo of a
building, people can often tell its functionality, e.g. a restaurant or a shop,
its cultural style, e.g. Asian or European, as well as its economic type, e.g.
industry-oriented or tourism-oriented. While place recognition has been widely
studied in previous work, comprehensive place understanding remains a long way
off: it goes far beyond categorizing a place from a single image and requires
information from multiple aspects. In this work, we contribute Placepedia, a
large-scale place dataset with more than 35M photos from 240K unique places.
Besides the photos, each place also comes with extensive multi-faceted
information, e.g. GDP and population, and labels at multiple levels, including
function, city, and country. This dataset, with its large amount of data and
rich annotations, enables a wide range of studies.
Particularly, in our studies, we develop 1) PlaceNet, a unified framework for
multi-level place recognition, and 2) a method for city embedding, which can
produce a vector representation for a city that captures both visual and
multi-faceted side information. Such studies not only reveal key challenges in
place understanding, but also establish connections between visual observations
and underlying socioeconomic/cultural implications.
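The multi-level recognition idea sketched in the abstract, a shared visual representation feeding separate prediction heads for function, city, and country, can be illustrated with a small toy model. This is a minimal NumPy sketch under assumed dimensions and label-space sizes, not the authors' PlaceNet implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical label spaces at three levels (sizes are made up).
levels = {"function": 10, "city": 50, "country": 20}
feat_dim = 128  # dimensionality of the shared image feature

# One linear head per level on top of the shared feature.
heads = {name: rng.normal(scale=0.01, size=(feat_dim, n))
         for name, n in levels.items()}

def predict(feature):
    """Return per-level class probabilities for one image feature."""
    return {name: softmax(feature @ W) for name, W in heads.items()}

feature = rng.normal(size=feat_dim)   # stand-in for a CNN embedding
probs = predict(feature)
for name, p in probs.items():
    print(name, p.shape)
```

In a trained system the shared feature would come from a CNN backbone and all heads would be optimized jointly, so that supervision at one level (e.g. country) can regularize the others.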
Related papers
- WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines [74.25764182510295]
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English.
We introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding.
This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points.
arXiv Detail & Related papers (2024-10-16T16:11:49Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - Where We Are and What We're Looking At: Query Based Worldwide Image
Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state-of-the-art street-level accuracy on four standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z) - Building Floorspace in China: A Dataset and Learning Pipeline [0.32228025627337864]
This paper provides a first milestone in measuring the floorspace of buildings in 40 major Chinese cities.
We use Sentinel-1 and -2 satellite images as our main data source.
We provide a detailed description of our data, algorithms, and evaluations.
arXiv Detail & Related papers (2023-03-03T21:45:36Z) - There is a Time and Place for Reasoning Beyond the Image [63.96498435923328]
Images often mean more to human eyes than their pixels alone: we infer, associate, and reason with contextual information from other sources to build a more complete picture.
We introduce TARA: a dataset with 16k images with their associated news, time and location automatically extracted from New York Times (NYT), and an additional 61k examples as distant supervision from WIT.
We show that there exists a 70% gap between a state-of-the-art joint model and human performance, which is slightly narrowed by our proposed model that uses segment-wise reasoning, motivating higher-level vision-language joint models.
arXiv Detail & Related papers (2022-03-01T21:52:08Z) - Deep-learning coupled with novel classification method to classify the
urban environment of the developing world [4.819654695540227]
We propose a novel classification method that is readily usable for machine analysis and show the applicability of the methodology in a developing-world setting.
We categorize the urban area in terms of informal and formal spaces taking the surroundings into account.
The model is able to segment with 75% accuracy and 60% Mean IoU.
arXiv Detail & Related papers (2020-11-25T16:08:07Z) - City-Scale Visual Place Recognition with Deep Local Features Based on
Multi-Scale Ordered VLAD Pooling [5.274399407597545]
We present a fully-automated system for place recognition at a city-scale based on content-based image retrieval.
Firstly, we conduct a comprehensive analysis of visual place recognition and sketch out the unique challenges of the task.
Next, we propose a simple pooling approach on top of convolutional neural network activations to embed spatial information into the image representation vector.
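The VLAD aggregation underlying this line of work is standard: each local descriptor is assigned to its nearest visual word, residuals to that word are summed, and the per-word residuals are concatenated and normalized. A minimal NumPy sketch of that core step (cluster centers and sizes are assumed; the paper's multi-scale ordered variant adds spatial structure on top):

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors into a VLAD vector.

    descriptors: (N, D) local features from one image
    centers:     (K, D) visual words from k-means
    Returns an L2-normalized vector of length K * D.
    """
    K, D = centers.shape
    # Hard-assign each descriptor to its nearest center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[assign == k]
        if len(members):
            v[k] = (members - centers[k]).sum(axis=0)  # residual sum
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

rng = np.random.default_rng(1)
vec = vlad(rng.normal(size=(200, 8)), rng.normal(size=(16, 8)))
print(vec.shape)  # (128,)
```

The resulting fixed-length vector can be compared with a dot product, which is what makes VLAD-style descriptors convenient for city-scale image retrieval.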
arXiv Detail & Related papers (2020-09-19T15:21:59Z) - Location Sensitive Image Retrieval and Tagging [10.832389603397603]
We present LocSens, a model that learns to rank triplets of images, tags, and coordinates by plausibility, together with two training strategies to balance the location influence in the final ranking.
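The ranking idea here, scoring (image, tag, coordinates) triplets by plausibility, can be illustrated with a toy scoring function. The embeddings and the linear score below are placeholders, not the LocSens architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 16  # assumed embedding size

def plausibility(img_emb, tag_emb, loc, w):
    """Score one (image, tag, location) triplet; higher = more plausible.

    loc is a 2-vector of normalized latitude/longitude in [-1, 1].
    """
    x = np.concatenate([img_emb, tag_emb, loc])
    return float(x @ w)

def rank_triplets(triplets, w):
    """Sort triplets from most to least plausible."""
    return sorted(triplets, key=lambda t: plausibility(*t, w), reverse=True)

# Toy data: one image paired with three candidate (tag, location) pairs.
img = rng.normal(size=dim)
tags = [rng.normal(size=dim) for _ in range(3)]
locs = [rng.uniform(-1, 1, size=2) for _ in range(3)]
w = rng.normal(size=dim * 2 + 2)  # stand-in for learned score weights

triplets = list(zip([img] * 3, tags, locs))
ranked = rank_triplets(triplets, w)
print(len(ranked))  # 3
```

A learned system would train the score so that observed triplets outrank corrupted ones; balancing how strongly the location term influences the ranking is exactly the issue the two training strategies above address.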
arXiv Detail & Related papers (2020-07-07T12:09:01Z) - A Survey on Knowledge Graphs: Representation, Acquisition and
Applications [89.78089494738002]
We review research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph, and 4) knowledge-aware applications.
For knowledge acquisition, especially knowledge graph completion, we review embedding methods, path inference, and logical rule reasoning.
We explore several emerging topics, including meta learning, commonsense reasoning, and temporal knowledge graphs.
arXiv Detail & Related papers (2020-02-02T13:17:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.