Unsupervised Multimodal Change Detection Based on Structural
Relationship Graph Representation Learning
- URL: http://arxiv.org/abs/2210.00941v1
- Date: Mon, 3 Oct 2022 13:55:08 GMT
- Title: Unsupervised Multimodal Change Detection Based on Structural
Relationship Graph Representation Learning
- Authors: Hongruixuan Chen and Naoto Yokoya and Chen Wu and Bo Du
- Abstract summary: Unsupervised multimodal change detection is a practical and challenging topic that can play an important role in time-sensitive emergency applications.
We take advantage of two types of modality-independent structural relationships in multimodal images.
We present a structural relationship graph representation learning framework for measuring the similarity of the two structural relationships.
- Score: 40.631724905575034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised multimodal change detection is a practical and challenging topic
that can play an important role in time-sensitive emergency applications. To
address the challenge that multimodal remote sensing images cannot be directly
compared due to their modal heterogeneity, we take advantage of two types of
modality-independent structural relationships in multimodal images. In
particular, we present a structural relationship graph representation learning
framework for measuring the similarity of the two structural relationships.
Firstly, structural graphs are generated from preprocessed multimodal image
pairs by means of an object-based image analysis approach. Then, a structural
relationship graph convolutional autoencoder (SR-GCAE) is proposed to learn
robust and representative features from graphs. Two loss functions aiming at
reconstructing vertex information and edge information are presented to make
the learned representations applicable for structural relationship similarity
measurement. Subsequently, the similarity levels of two structural
relationships are calculated from learned graph representations and two
difference images are generated based on the similarity levels. After obtaining
the difference images, an adaptive fusion strategy is presented to fuse the two
difference images. Finally, a morphological filtering-based postprocessing
approach is employed to refine the detection results. Experimental results on
five datasets with different modal combinations demonstrate the effectiveness
of the proposed method.
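To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of a graph convolutional autoencoder trained with the two reconstruction objectives the abstract describes: one loss rebuilding vertex information and one rebuilding edge information via an inner-product decoder, as in standard graph autoencoders. The class name SRGCAE, the layer sizes, the loss weight lambda_edge, and the random toy graph are all illustrative assumptions; in the paper, the graphs come from object-based image analysis of the preprocessed image pair.

```python
# Minimal sketch of a graph convolutional autoencoder with vertex- and
# edge-reconstruction losses, in the spirit of SR-GCAE. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = A_hat @ H @ W (+ bias)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return self.lin(a_hat @ h)

class SRGCAE(nn.Module):
    """GCN encoder with a vertex decoder and an inner-product edge decoder."""
    def __init__(self, in_dim, hid_dim=64, lat_dim=16):
        super().__init__()
        self.enc1 = GCNLayer(in_dim, hid_dim)
        self.enc2 = GCNLayer(hid_dim, lat_dim)
        self.dec_vertex = GCNLayer(lat_dim, in_dim)

    def forward(self, a_hat, x):
        z = F.relu(self.enc1(a_hat, x))
        z = self.enc2(a_hat, z)
        x_rec = self.dec_vertex(a_hat, z)     # reconstruct vertex information
        a_rec = torch.sigmoid(z @ z.t())      # reconstruct edge information
        return z, x_rec, a_rec

def normalize_adj(a):
    """Symmetric normalization: A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = a + torch.eye(a.size(0))
    d_inv_sqrt = torch.diag(a.sum(1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt

# Toy usage: n graph vertices (image objects) with f features each.
n, f = 200, 8
x = torch.randn(n, f)                    # per-object features (e.g., mean band values)
a = (torch.rand(n, n) < 0.05).float()    # stand-in adjacency; real graphs come from OBIA
a = ((a + a.t()) > 0).float()            # make the toy graph undirected

a_hat = normalize_adj(a)
model = SRGCAE(in_dim=f)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_edge = 1.0                        # loss weight: an assumption, not from the paper

for step in range(200):
    z, x_rec, a_rec = model(a_hat, x)
    loss_vertex = F.mse_loss(x_rec, x)               # vertex-information loss
    loss_edge = F.binary_cross_entropy(a_rec, a)     # edge-information loss
    loss = loss_vertex + lambda_edge * loss_edge
    opt.zero_grad(); loss.backward(); opt.step()
```

The later stages can be prototyped just as briefly. The sketch below fuses two difference images and applies morphological filtering; the variance-based weights and the Otsu threshold are stand-ins, since the abstract does not spell out the adaptive fusion rule:

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_closing, binary_opening, disk

def fuse_and_refine(di1, di2):
    # Weight each difference image by a simple separability proxy (variance);
    # the paper's adaptive fusion strategy is more sophisticated.
    w1, w2 = di1.var(), di2.var()
    fused = (w1 * di1 + w2 * di2) / (w1 + w2)
    cm = fused > threshold_otsu(fused)    # binarize into a change map
    cm = binary_opening(cm, disk(2))      # remove isolated false alarms
    cm = binary_closing(cm, disk(2))      # fill small holes in changed regions
    return cm

# Toy call on random difference images of matching size.
change_map = fuse_and_refine(np.random.rand(64, 64), np.random.rand(64, 64))
```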
Related papers
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- Bayesian Unsupervised Disentanglement of Anatomy and Geometry for Deep Groupwise Image Registration [50.62725807357586]
This article presents a general Bayesian learning framework for multi-modal groupwise image registration.
We propose a novel hierarchical variational auto-encoding architecture to realise the inference procedure of the latent variables.
Experiments on four datasets spanning cardiac, brain, and abdominal medical images validate the proposed framework.
arXiv Detail & Related papers (2024-01-04T08:46:39Z)
- Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification [42.15709954199397]
A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper.
First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data.
A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
arXiv Detail & Related papers (2023-11-17T04:06:20Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- Inferring Local Structure from Pairwise Correlations [0.0]
We show that pairwise correlations provide enough information to recover local relations.
This proves successful even though higher-order interaction structures are present in our data.
arXiv Detail & Related papers (2023-05-07T22:38:29Z)
- Transformer-based Dual Relation Graph for Multi-label Image Recognition [56.12543717723385]
We propose a novel Transformer-based Dual Relation learning framework.
We explore two aspects of correlation, i.e., structural relation graph and semantic relation graph.
Our approach achieves new state-of-the-art on two popular multi-label recognition benchmarks.
arXiv Detail & Related papers (2021-10-10T07:14:52Z)
- Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE [74.29384873537587]
We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture.
Experimental results on CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images.
arXiv Detail & Related papers (2021-03-18T05:10:49Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.