Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for
Cross-City Semantic Segmentation using High-Resolution Domain Adaptation
Networks
- URL: http://arxiv.org/abs/2309.16499v2
- Date: Tue, 3 Oct 2023 08:49:58 GMT
- Title: Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for
Cross-City Semantic Segmentation using High-Resolution Domain Adaptation
Networks
- Authors: Danfeng Hong, Bing Zhang, Hao Li, Yuxuan Li, Jing Yao, Chenyu Li,
Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu
- Abstract summary: We build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, and SAR data) for the study of cross-city semantic segmentation.
Beyond the single-city setting, we propose a high-resolution domain adaptation network, HighDAN, to improve AI models' generalization ability across multi-city environments.
HighDAN retains the spatial topological structure of the studied urban scene through parallel high-to-low resolution fusion.
- Score: 82.82866901799565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) approaches have achieved remarkable
success in single-modality-dominated remote sensing (RS) applications,
especially those emphasizing individual urban environments (e.g., single
cities or regions). Yet these AI models tend to hit a performance bottleneck
in case studies across cities or regions, owing to the lack of diverse RS
information and of cutting-edge solutions with high generalization ability. To
this end, we build a new set of multimodal remote sensing benchmark datasets
(including hyperspectral, multispectral, and SAR data) for the study of
cross-city semantic segmentation (the C2Seg dataset), which consists of
two cross-city scenes, i.e., Berlin-Augsburg (in Germany) and Beijing-Wuhan (in
China). Going beyond the single city, we propose a high-resolution domain
adaptation network, HighDAN for short, to improve AI models' generalization
ability across multi-city environments. HighDAN not only retains the spatial
topological structure of the studied urban scene through parallel high-to-low
resolution fusion, but also closes the gap arising from the enormous
differences in RS image representations between cities by means of
adversarial learning. In addition, HighDAN adopts the Dice loss to alleviate
the class imbalance caused by cross-city factors. Extensive experiments
conducted on the C2Seg dataset show the superiority of our HighDAN in terms
of segmentation performance and generalization ability, compared to
state-of-the-art competitors. The C2Seg dataset and the semantic segmentation
toolbox (including the proposed HighDAN) will be made publicly available at
https://github.com/danfenghong.
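
The abstract names two generic training-level ingredients without detailing them: adversarial learning to align feature representations between cities, and a Dice loss to counter class imbalance. The following is a minimal, illustrative PyTorch sketch of both, not the authors' released toolbox; all names here (GradReverse, DomainDiscriminator, dice_loss) are our own assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips gradient sign in the backward
    pass, so the feature encoder is trained to fool the discriminator
    (i.e., to produce city-invariant features)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class DomainDiscriminator(nn.Module):
    """Predicts which city (source vs. target) a feature map comes from."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, feats, lambd=1.0):
        return self.net(grad_reverse(feats, lambd))

def dice_loss(logits, target, num_classes, eps=1e-6):
    """Soft multi-class Dice loss. Each class contributes via its overlap
    ratio rather than its raw pixel count, so rare classes are not
    swamped the way they are under plain cross-entropy."""
    probs = F.softmax(logits, dim=1)                        # (B, C, H, W)
    onehot = F.one_hot(target, num_classes)                 # (B, H, W, C)
    onehot = onehot.permute(0, 3, 1, 2).float()             # (B, C, H, W)
    inter = (probs * onehot).sum(dim=(0, 2, 3))             # per-class overlap
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()
```

In a typical setup of this kind, training would sum the supervised segmentation loss on the labeled source city (e.g., cross-entropy plus the Dice term above) with a binary domain-classification loss on encoder features from both cities; the gradient reversal then pushes the encoder toward representations that transfer across cities.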
Related papers
- Point Cloud Segmentation Using Transfer Learning with RandLA-Net: A Case
Study on Urban Areas [0.5242869847419834]
This paper presents the application of RandLA-Net, a state-of-the-art neural network architecture, for the 3D segmentation of large-scale point cloud data in urban areas.
The study focuses on three major Chinese cities, namely Chengdu, Jiaoda, and Shenzhen, leveraging their unique characteristics to enhance segmentation performance.
arXiv Detail & Related papers (2023-12-19T06:13:58Z) - ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and
Multispectral Data Fusion [54.668445421149364]
Deep learning-based hyperspectral image (HSI) super-resolution aims to generate a high-spatial-resolution HSI (HR-HSI) by fusing an HSI with a multispectral image (MSI) using deep neural networks (DNNs).
In this letter, we propose ADASR, a novel adversarial automatic data augmentation framework that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion.
arXiv Detail & Related papers (2023-10-11T07:30:37Z) - General-Purpose Multimodal Transformer meets Remote Sensing Semantic
Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z) - SARN: Structurally-Aware Recurrent Network for Spatio-Temporal Disaggregation [8.636014676778682]
Open data are frequently released in spatially aggregated form, usually to comply with privacy policies, but coarse, heterogeneous aggregations complicate coherent learning and integration for downstream AI/ML systems.
We propose an overarching model named Structurally-Aware Recurrent Network (SARN), which integrates structurally-aware spatial attention layers into the Gated Recurrent Unit (GRU) model.
For scenarios with limited historical training data, we show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples.
arXiv Detail & Related papers (2023-06-09T21:01:29Z) - Neural Embeddings of Urban Big Data Reveal Emergent Structures in Cities [7.148078723492643]
We propose using a neural embedding model, a graph neural network (GNN), that leverages the heterogeneous features of urban areas.
Using large-scale high-resolution mobility data sets from millions of aggregated and anonymized mobile phone users in 16 metropolitan counties in the United States, we demonstrate that our embeddings encode complex relationships among features related to urban components.
We show that embeddings generated by a model trained on a different county can capture 50% to 60% of the emergent spatial structure in another county.
arXiv Detail & Related papers (2021-10-24T07:13:14Z) - Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT
Benchmark for Crowd Counting [109.32927895352685]
We introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people.
To facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework.
Experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
arXiv Detail & Related papers (2020-12-08T16:18:29Z) - Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z) - Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral
Super-Resolution [79.97180849505294]
We propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet, to enhance the spatial resolution of HSI.
Experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models.
arXiv Detail & Related papers (2020-07-10T08:08:20Z) - X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for
Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features at the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z) - Multi-level Feature Fusion-based CNN for Local Climate Zone
Classification from Sentinel-2 Images: Benchmark Results on the So2Sat LCZ42
Dataset [14.588626423336024]
This study builds on the large So2Sat LCZ42 benchmark dataset dedicated to LCZ classification.
We propose Sen2LCZ-Net, a CNN for classifying LCZs from Sentinel-2 images, and extend it to Sen2LCZ-Net-MF by fusing multi-level features.
Our work provides important baselines for future CNN-based algorithm development, both for LCZ classification and for other urban land cover/land use classification tasks.
arXiv Detail & Related papers (2020-05-16T13:16:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.