Explainability of Deep Learning models for Urban Space perception
- URL: http://arxiv.org/abs/2208.13555v1
- Date: Mon, 29 Aug 2022 12:44:48 GMT
- Title: Explainability of Deep Learning models for Urban Space perception
- Authors: Ruben Sangers, Jan van Gemert, Sander van Cranenburgh
- Abstract summary: This study investigates how computer vision models can be used to extract policy-relevant information about people's perception of urban space.
We train two widely used computer vision architectures, a Convolutional Neural Network and a transformer, and apply GradCAM -- a well-known ex-post explainable AI technique -- to highlight the image regions important for the model's prediction.
- Score: 9.422663267011913
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Deep learning based computer vision models are increasingly used by urban
planners to support decision making for shaping urban environments. Such models
predict how people perceive the quality of the urban environment in terms of,
e.g., its safety or beauty. However, the black-box nature of deep learning
models makes it difficult for urban planners to understand which landscape
objects contribute to a particularly high- or low-quality perception of urban
space. This study investigates how computer vision models can be used to
extract policy-relevant information about people's perception of urban space.
To do so, we train two widely used computer vision architectures, a
Convolutional Neural Network and a transformer, and apply GradCAM -- a
well-known ex-post explainable AI technique -- to highlight the image regions
important for the model's prediction. Using these GradCAM visualizations, we
manually annotate the objects relevant to the models' perception predictions.
As a result, we are able to discover new objects that are not represented in
the object detection models used for annotation in previous studies. Moreover,
our methodological results suggest that transformer architectures are better
suited for use in combination with GradCAM. Code is available on GitHub.
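The core explainability step is straightforward to reproduce. Below is a minimal, hedged GradCAM sketch in PyTorch, using an ImageNet-pretrained ResNet-50 classifier as a stand-in for the paper's perception model; the file name `street_view.jpg`, the choice of target layer, and the classification head are illustrative assumptions, not the authors' released code.

```python
# Minimal GradCAM sketch (illustrative; the paper's actual models, data,
# and GitHub code may differ -- it regresses perception scores, not classes).
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet50(weights="IMAGENET1K_V2").eval()  # torchvision >= 0.13
target_layer = model.layer4[-1]  # last conv block of the CNN

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("street_view.jpg").convert("RGB")).unsqueeze(0)
scores = model(img)
scores[0, scores.argmax()].backward()  # gradient of the top prediction

# Weight each activation map by its spatially averaged gradient, then ReLU.
weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[2:], mode="bilinear",
                    align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heat map in [0, 1]
```

The normalized `cam` can then be overlaid on the input image to reveal which regions drove the prediction, which is the basis of the manual object annotation described above.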
Related papers
- A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships [0.5639904484784127]
Transformer-based models have transformed the landscape of natural language processing (NLP).
These models are renowned for their ability to capture long-range dependencies and contextual information.
We discuss potential research directions and applications of transformer-based models in computer vision (a minimal self-attention sketch follows this entry).
arXiv Detail & Related papers (2024-08-27T16:22:18Z)
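As a reminder of the mechanism behind this global context, here is a minimal scaled dot-product self-attention sketch in PyTorch; it illustrates the generic operation, not any specific model from the review.

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, tokens, dim); w_*: (dim, dim) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # every token attends to every other token, giving global context
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 16, 64)                      # 16 tokens, e.g. image patches
w_q, w_k, w_v = (torch.randn(64, 64) * 0.02 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)          # shape (1, 16, 64)
```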
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes vision models to better align them with human aesthetics (a sketch of the generic preference objective follows this entry).
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
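The generic building block behind such preference-based fine-tuning is a pairwise (Bradley-Terry style) objective; the sketch below is a hedged illustration of that objective, not the paper's actual reinforcement learning method.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_preferred, score_rejected):
    """Push the model to score human-preferred images above rejected ones."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Scores produced by a vision model's aesthetic head for image pairs
# (random tensors here stand in for real model outputs).
s_good = torch.randn(8, requires_grad=True)
s_bad = torch.randn(8, requires_grad=True)
loss = preference_loss(s_good, s_bad)
loss.backward()  # gradients would flow back into the vision model
```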
- TK-Planes: Tiered K-Planes with High Dimensional Feature Vectors for Dynamic UAV-based Scenes [58.180556221044235]
We present a new approach to bridge the domain gap between synthetic and real-world data for unmanned aerial vehicle (UAV)-based perception.
Our formulation is designed for dynamic scenes, consisting of small moving objects or human actions.
We evaluate its performance on challenging datasets, including Okutama Action and UG2.
arXiv Detail & Related papers (2024-05-04T21:55:33Z)
- SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients [0.8873228457453465]
Small object detection in aerial imagery presents significant challenges in computer vision.
Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases.
This paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects.
arXiv Detail & Related papers (2024-05-02T19:47:08Z)
- UrbanGenAI: Reconstructing Urban Landscapes using Panoptic Segmentation and Diffusion Models [0.0]
This paper presents a novel workflow, encapsulated within a prototype application, that leverages the synergies between advanced image segmentation and diffusion models for a comprehensive approach to urban design (a sketch of this segmentation-plus-diffusion pattern follows this entry).
Validation results indicated a high degree of performance by the prototype application, showcasing significant accuracy in both object detection and text-to-image generation.
Preliminary testing included using UrbanGenAI as an educational tool to enhance the learning experience in design pedagogy, and as a participatory instrument facilitating community-driven urban planning.
arXiv Detail & Related papers (2024-01-25T18:30:46Z)
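The general "segment, then regenerate" pattern such a workflow relies on can be sketched with off-the-shelf models. The checkpoint names, input file, segment choice, and prompt below are illustrative assumptions, not the authors' UrbanGenAI pipeline.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
from diffusers import StableDiffusionInpaintPipeline

image = Image.open("streetscape.jpg").convert("RGB")  # hypothetical input image

# 1) Panoptic segmentation: delineate urban objects (roads, buildings, trees).
processor = AutoImageProcessor.from_pretrained(
    "facebook/mask2former-swin-base-coco-panoptic")
segmenter = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-coco-panoptic")
with torch.no_grad():
    outputs = segmenter(**processor(images=image, return_tensors="pt"))
panoptic = processor.post_process_panoptic_segmentation(
    outputs, target_sizes=[image.size[::-1]])[0]
target_id = panoptic["segments_info"][0]["id"]  # segment choice is illustrative
mask = (panoptic["segmentation"] == target_id).numpy()

# 2) Diffusion inpainting: regenerate the masked region from a design prompt.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting")
mask_img = Image.fromarray((mask * 255).astype("uint8")).resize((512, 512))
redesigned = pipe(prompt="a tree-lined pedestrian street",
                  image=image.resize((512, 512)),
                  mask_image=mask_img).images[0]
```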
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Neural architecture impact on identifying temporally extended Reinforcement Learning tasks [0.0]
We present attention-based architectures for the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari 2600 game suite.
In attention-based models, extracting the attention map and overlaying it onto input images allows direct observation of the information the agent uses to select actions (a minimal overlay sketch follows this entry).
In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we also develop a Vision Transformer based architecture for the image-based RL domain.
arXiv Detail & Related papers (2023-10-04T21:09:19Z)
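A minimal sketch of such an overlay follows, with random arrays standing in for an Atari frame and a low-resolution attention map; it is illustrative only, not the paper's code.

```python
import numpy as np
import matplotlib.pyplot as plt

frame = np.random.rand(84, 84, 3)        # stand-in for an Atari observation
attn = np.random.rand(7, 7)              # stand-in for an attention map
attn = attn / attn.sum()                 # normalize attention weights

# Upsample the map to frame resolution and blend it over the image.
attn_up = np.kron(attn, np.ones((12, 12)))   # 7x7 -> 84x84 nearest upsample
plt.imshow(frame)
plt.imshow(attn_up, cmap="jet", alpha=0.5)   # translucent heat map overlay
plt.axis("off")
plt.savefig("attention_overlay.png")
```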
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling of the latent space more informative and efficient (a latent-space RRT sketch follows this entry).
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
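A hedged sketch of RRT expansion in a learned latent space follows; `dyn` and `act` are toy stand-ins for the paper's learned dynamics model and action sampler, not its actual API.

```python
import numpy as np

def rrt_latent(z_start, z_goal, dynamics, sample_action, n_iters=500, tol=0.2):
    """Grow a tree of latent states toward a goal latent state."""
    nodes, parents = [z_start], {0: None}
    for _ in range(n_iters):
        # goal-biased random sample in latent space
        z_rand = (z_goal if np.random.rand() < 0.1
                  else np.random.randn(*z_start.shape))
        # pick the existing node closest to the sample
        i = min(range(len(nodes)),
                key=lambda j: np.linalg.norm(nodes[j] - z_rand))
        # expand that node by rolling the learned dynamics with a sampled action
        z_new = dynamics(nodes[i], sample_action())
        parents[len(nodes)] = i
        nodes.append(z_new)
        if np.linalg.norm(z_new - z_goal) < tol:
            break
    return nodes, parents

# toy stand-ins for the learned model (assumptions, not the paper's API)
dyn = lambda z, a: z + 0.05 * a
act = lambda: np.random.randn(8)
nodes, tree = rrt_latent(np.zeros(8), np.full(8, 0.5), dyn, act)
```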
- Deep Learning for Spatiotemporal Modeling of Urbanization [21.677957140614556]
Urbanization has a strong impact on the health and wellbeing of populations across the world.
Many spatial models have been developed using machine learning and numerical modeling techniques.
Here we explore the capacity of deep spatial learning for the predictive modeling of urbanization.
arXiv Detail & Related papers (2021-12-17T18:27:52Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Learning Predictive Representations for Deformable Objects Using Contrastive Estimation [83.16948429592621]
We propose a new learning framework that jointly optimizes both the visual representation model and the dynamics model.
We show substantial improvements over standard model-based learning techniques across our rope and cloth manipulation suite (a sketch of the generic contrastive objective follows this entry).
arXiv Detail & Related papers (2020-03-11T17:55:15Z)
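The standard contrastive building block behind such frameworks is an InfoNCE-style loss; the sketch below illustrates that generic objective, not the paper's exact joint training procedure.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """z_a, z_b: (batch, dim) embeddings of positive pairs."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature    # similarity of all pairings
    labels = torch.arange(z_a.shape[0])     # positives are on the diagonal
    return F.cross_entropy(logits, labels)

# Random embeddings stand in for the visual and dynamics representations.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce(z1, z2)
```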
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.