A natural language processing-based approach: mapping human perception
by understanding deep semantic features in street view images
- URL: http://arxiv.org/abs/2311.17354v1
- Date: Wed, 29 Nov 2023 05:00:43 GMT
- Title: A natural language processing-based approach: mapping human perception
by understanding deep semantic features in street view images
- Authors: Haoran Ma and Dongdong Wu
- Abstract summary: We propose a new framework based on a pre-trained natural language model to understand the relationship between human perception and a scene.
Our results show that human perception scoring with deep semantic features outperforms previous machine-learning methods based on shallow features.
- Score: 2.5880672192855414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past decade, using Street View images and machine learning to measure
human perception has become a mainstream research approach in urban science.
However, because this approach relies only on shallow image information, it is
difficult to comprehensively understand the deep semantic features of human
perception of a scene. In this study, we propose a new framework based on a
pre-trained natural language model to understand the relationship between human
perception and a scene. First, Place Pulse 2.0 was used as our base dataset,
which contains a variety of human-perceived labels, namely beautiful, safe,
wealthy, depressing, boring, and lively. An image captioning network was used
to extract a textual description of each street view image. Second, a
pre-trained BERT model was fine-tuned with an added regression head for the six
human perceptual dimensions. Furthermore, we compared the performance of five
traditional regression methods with our approach and conducted a transfer
experiment in Hong Kong. Our results show that human perception scoring with
deep semantic features outperformed previous machine-learning studies based on
shallow features. The use of deep scene semantic features offers new directions
for subsequent human perception research, as well as better explanatory power
in the face of spatial heterogeneity.
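The pipeline the abstract describes (captions of street view images fed to a fine-tuned language model topped with a regression head for the six perceptual dimensions) can be sketched roughly as follows. This is a minimal illustrative stand-in, not the authors' implementation: the placeholder embedding layer with mean pooling substitutes for a pre-trained BERT encoder so the snippet runs without downloading weights, and the class name, hidden size, and vocabulary size are assumptions.

```python
import torch
import torch.nn as nn

class PerceptionRegressor(nn.Module):
    """Regression head over a pooled sentence representation.

    In the paper's setup, the encoder would be a fine-tuned BERT over
    image captions; here a plain embedding layer plus mean pooling
    stands in so the sketch is self-contained.
    """
    def __init__(self, vocab_size=30522, hidden=768, n_dims=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)  # stand-in for a BERT encoder
        self.head = nn.Linear(hidden, n_dims)          # one score per perceptual dimension

    def forward(self, token_ids):
        pooled = self.embed(token_ids).mean(dim=1)     # [batch, hidden]
        return self.head(pooled)                       # [batch, n_dims]

# The six Place Pulse 2.0 perceptual dimensions named in the abstract.
DIMENSIONS = ["beautiful", "safe", "wealthy", "depressing", "boring", "lively"]

model = PerceptionRegressor()
dummy_captions = torch.randint(0, 30522, (2, 16))  # two tokenized captions, 16 tokens each
scores = model(dummy_captions)
print(scores.shape)  # torch.Size([2, 6])
```

In the actual framework, this head would be trained jointly with the encoder against the crowdsourced perception scores, and the same six-output design makes the transfer experiment (applying the trained model to a new city's captions) a straightforward forward pass.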
Related papers
- Semantic-Human: Neural Rendering of Humans from Monocular Video with
Human Parsing [14.264835399504376]
We present Semantic-Human, a novel method that achieves photorealistic details and viewpoint-consistent human parsing for the neural rendering of humans.
Specifically, we extend neural radiance fields (NeRF) to jointly encode semantics, appearance and geometry to achieve accurate 2D semantic labels.
We also showcase various compelling applications, including label denoising, label synthesis and image editing.
arXiv Detail & Related papers (2023-08-19T03:18:19Z)
- Find Someone Who: Visual Commonsense Understanding in Human-Centric
Grounding [87.39245901710079]
We present a new commonsense task, Human-centric Commonsense Grounding.
It tests the models' ability to ground individuals given the context descriptions about what happened before.
We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pretrained models.
arXiv Detail & Related papers (2022-12-14T01:37:16Z)
- A domain adaptive deep learning solution for scanpath prediction of
paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a mechanism that impacts several cognitive functions in humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural
Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and
Discriminative Scene Representation [56.25878966006678]
We propose an approach of PRactical Inference in Social rElation (PRISE),
which concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- Animatable Neural Radiance Fields from Monocular RGB Video [72.6101766407013]
We present animatable neural radiance fields for detailed human avatar creation from monocular videos.
Our approach extends neural radiance fields to the dynamic scenes with human movements via introducing explicit pose-guided deformation.
In experiments we show that the proposed approach achieves 1) implicit human geometry and appearance reconstruction with high-quality details, 2) photo-realistic rendering of the human from arbitrary views, and 3) animation of the human with arbitrary poses.
arXiv Detail & Related papers (2021-06-25T13:32:23Z)
- Learning High Fidelity Depths of Dressed Humans by Watching Social Media
Dance Videos [21.11427729302936]
We present a new method to use the local transformation that warps the predicted local geometry of the person from an image to that of another image at a different time instant.
Our method is end-to-end trainable, resulting in high fidelity depth estimation that predicts fine geometry faithful to the input real image.
arXiv Detail & Related papers (2021-03-04T20:46:30Z)
- What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.