EMPLACE: Self-Supervised Urban Scene Change Detection
- URL: http://arxiv.org/abs/2503.17716v1
- Date: Sat, 22 Mar 2025 10:20:43 GMT
- Title: EMPLACE: Self-Supervised Urban Scene Change Detection
- Authors: Tim Alpherts, Sennay Ghebreab, Nanne van Noord
- Abstract summary: Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision. We introduce AC-1M, the largest USCD dataset by far with over 1.1M images, together with EMPLACE, a self-supervised method to train a Vision Transformer. In a case study of Amsterdam, we show that we are able to detect both small and large changes throughout the city and that changes uncovered by EMPLACE, depending on size, correlate with housing prices.
- Score: 6.250018240133604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Urban change is a constant process that influences the perception of neighbourhoods and the lives of the people within them. The field of Urban Scene Change Detection (USCD) aims to capture changes in street scenes using computer vision and can help raise awareness of changes, making it possible to better understand the city and its residents. Traditionally, the field of USCD has used supervised methods with small-scale datasets. This constrains methods when applied to new cities, as it requires labour-intensive labeling processes and forces a priori definitions of relevant change. In this paper we introduce AC-1M, the largest USCD dataset by far with over 1.1M images, together with EMPLACE, a self-supervised method to train a Vision Transformer using our adaptive triplet loss. We show EMPLACE outperforms SOTA methods both as a pre-training method for linear fine-tuning and in a zero-shot setting. Lastly, in a case study of Amsterdam, we show that we are able to detect both small and large changes throughout the city and that changes uncovered by EMPLACE, depending on size, correlate with housing prices, which in turn is indicative of inequity.
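The abstract names the key ingredients (a Vision Transformer trained self-supervised with a triplet loss over street-level image series, then used zero-shot or with linear fine-tuning) without spelling them out. The sketch below is a minimal, hedged illustration of that general idea only: it uses a plain fixed-margin triplet loss (the paper's adaptive margin is not reproduced), a toy CNN stands in for the ViT backbone, and the way triplets would be mined from the AC-1M time series is assumed rather than taken from the paper.

```python
# A minimal sketch, NOT the authors' implementation: a fixed-margin triplet
# objective over embeddings of street-level images, plus an embedding-distance
# change score for zero-shot use. The ViT backbone, the adaptive margin, and
# the triplet-mining scheme of EMPLACE are all replaced by simple assumptions.
import torch
import torch.nn.functional as F
from torch import nn


class Encoder(nn.Module):
    """Stand-in image encoder; the paper trains a Vision Transformer instead."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings


def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull an image towards another view of the same location taken close in
    time, push it away from a different location (or a far-apart time). The
    fixed `margin` is a stand-in for the paper's adaptive margin."""
    d_pos = 1.0 - (anchor * positive).sum(-1)  # cosine distance
    d_neg = 1.0 - (anchor * negative).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()


@torch.no_grad()
def change_score(encoder: Encoder, img_t0: torch.Tensor, img_t1: torch.Tensor):
    """Zero-shot change score for two images of the same location at different
    times: a higher embedding distance is read as more change."""
    return 1.0 - (encoder(img_t0) * encoder(img_t1)).sum(-1)


if __name__ == "__main__":
    enc = Encoder()
    a, p, n = (torch.randn(4, 3, 224, 224) for _ in range(3))
    loss = triplet_loss(enc(a), enc(p), enc(n), margin=0.2)
    loss.backward()
    print(float(loss), change_score(enc, a, n).shape)  # scalar loss, (4,) scores
```

Under this reading, a trained encoder supports zero-shot change detection by thresholding the embedding distance between two images of the same location taken at different times, while linear fine-tuning would train a linear head on top of the frozen embeddings.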
Related papers
- CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series [12.621355888239359]
Urban transformations have profound societal impact on both individuals and communities at large.
We propose an end-to-end change detection model to effectively capture physical alterations in the built environment at scale.
Our approach has the potential to supplement existing datasets and serve as a fine-grained and accurate assessment of urban change.
arXiv Detail & Related papers (2024-01-02T08:57:09Z)
- Self-supervised learning unveils change in urban housing from street-level images [2.0971479389679337]
Street2Vec embeds urban structure while being invariant to seasonal and daily changes without manual annotations.
It identified point-level change in London's housing supply from street-level images, and distinguished between major and minor change.
This capability can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.
arXiv Detail & Related papers (2023-09-20T14:35:23Z)
- CityTrack: Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching [15.854610268846562]
Multi-Camera Multi-Target Tracking (MCMT) is a computer vision technique that involves tracking multiple targets simultaneously across multiple cameras.
We propose a systematic MCMT framework, called CityTrack, to overcome the challenges posed by urban traffic scenes.
We present a Location-Aware SCMT tracker which integrates various advanced techniques to improve its effectiveness in the MCMT task.
arXiv Detail & Related papers (2023-07-06T03:25:37Z)
- Vicinity Vision Transformer [53.43198716947792]
We present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
Our approach achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
arXiv Detail & Related papers (2022-06-21T17:33:53Z)
- Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer [119.72951028190586]
Crowd localization is a new computer vision task, evolved from crowd counting.
In this paper, we focus on how to achieve precise instance localization in high-density crowd scenes.
We propose a Dilated Convolutional Swin Transformer (DCST) for congested crowd scenes.
arXiv Detail & Related papers (2021-08-02T01:27:53Z)
- Safety-Oriented Pedestrian Motion and Scene Occupancy Forecasting [91.69900691029908]
We advocate for predicting both the individual motions and the scene occupancy map.
We propose a Scene-Actor Graph Neural Network (SA-GNN) which preserves the relative spatial information of pedestrians.
On two large-scale real-world datasets, we showcase that our scene-occupancy predictions are more accurate and better calibrated than those from state-of-the-art motion forecasting methods.
arXiv Detail & Related papers (2021-01-07T06:08:21Z)
- Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery [5.151973524974052]
Hi-UCD is a large-scale benchmark dataset for urban change detection.
It can be used for detecting and analyzing refined urban changes.
We benchmark our dataset using some classic methods in binary and multi-class change detection.
arXiv Detail & Related papers (2020-11-06T09:20:54Z)
- Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes [131.9067467127761]
We focus on improving action recognition by fully utilizing scene information and collecting new data.
Specifically, we adopt a strong human detector to detect the spatial location of each person in each frame.
We then apply action recognition models to learn the temporal information from video frames, on both the HIE dataset and new data with diverse scenes from the internet.
arXiv Detail & Related papers (2020-10-16T13:08:50Z)
- A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem under extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z)
- BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments [54.22535063244038]
We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments.
Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two- and three-wheelers, and pedestrians.
arXiv Detail & Related papers (2020-09-22T08:25:44Z)
- Unsupervised Vehicle Counting via Multiple Camera Domain Adaptation [9.730985797769764]
Monitoring vehicle flows in cities is crucial to improve the urban environment and quality of life of citizens.
Current technologies for vehicle counting in images hinge on large quantities of annotated data, preventing their scalability to city-scale as new cameras are added to the system.
We propose and discuss a new methodology to design image-based vehicle density estimators with few labeled data via multiple camera domain adaptations.
arXiv Detail & Related papers (2020-04-20T13:00:46Z)