Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification
- URL: http://arxiv.org/abs/2311.10320v2
- Date: Mon, 10 Jun 2024 08:31:35 GMT
- Title: Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification
- Authors: Jiaqi Yang, Bo Du, Liangpei Zhang
- Abstract summary: A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper.
First, a multimodal heterogeneous graph encoder is presented to encode distinctive non-Euclidean structural features from heterogeneous data.
A self-attention-free multi-convolutional modulator is designed for effective and efficient long-range dependency modeling.
- Score: 42.15709954199397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data collected by different modalities can provide a wealth of complementary information. For example, hyperspectral images (HSI) offer rich spectral-spatial properties, synthetic aperture radar (SAR) provides structural information about the Earth's surface, and light detection and ranging (LiDAR) captures ground elevation. A natural idea, therefore, is to combine multimodal images for refined and accurate land-cover interpretation. Although many methods have been proposed for multi-source remote sensing image classification, three issues remain: 1) indiscriminate feature representation that does not sufficiently account for modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting caused by sparsely labeled samples. To overcome these barriers, this paper proposes a transformer-based heterogeneously salient graph representation (THSGR) approach. First, a multimodal heterogeneous graph encoder is presented to encode distinctive non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-range dependency modeling. Finally, a mean forward is introduced to avoid overfitting. With these components, the proposed model bridges modal gaps and obtains differentiated graph representations at a competitive time cost, even when only a small fraction of samples is available for training. Experiments and analyses on three benchmark datasets, with comparisons against various state-of-the-art (SOTA) methods, demonstrate the effectiveness of the proposed approach.
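The abstract does not spell out how the self-attention-free multi-convolutional modulator works. As a rough illustration only, the NumPy sketch below assumes a Conv2Former-style convolutional modulation (an elementwise product of a depthwise-convolution "attention" map and a value projection), extended to sum several depthwise kernel sizes. The function names, the kernel sizes (3, 5, 7), and the summing of branches are hypothetical choices, not the THSGR authors' published design.

```python
import numpy as np

# Hypothetical sketch of a self-attention-free "multi-convolutional modulator".
# Assumption: Conv2Former-style modulation with several depthwise kernel sizes;
# this is NOT the THSGR authors' published implementation.

def pointwise(x, w):
    """1x1 convolution that mixes channels at every pixel. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def depthwise(x, k):
    """'Same'-padded depthwise cross-correlation: one odd-sized filter per channel.
    x: (C, H, W), k: (C, kh, kw)."""
    c, h, w = x.shape
    _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))  # zero padding keeps the H x W size
    out = np.zeros((c, h, w))
    for i in range(kh):
        for j in range(kw):
            # Add the shifted map weighted by tap (i, j) of each channel's filter.
            out += k[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out


def conv_modulator(x, w_a, w_v, dw_kernels, w_out):
    """out = W_out(A * V), where A sums multi-size depthwise convs of W_a x and V = W_v x."""
    a_in = pointwise(x, w_a)
    a = sum(depthwise(a_in, k) for k in dw_kernels)  # multi-scale spatial context
    v = pointwise(x, w_v)
    # The elementwise product stands in for softmax(QK^T)V, so no (HW x HW) map is formed.
    return pointwise(a * v, w_out)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
x = rng.standard_normal((C, H, W))
w_a, w_v, w_out = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
kernels = [rng.standard_normal((C, k, k)) / k for k in (3, 5, 7)]
y = conv_modulator(x, w_a, w_v, kernels, w_out)
print(y.shape)  # (8, 16, 16)
```

In this sketch each depthwise branch costs O(C·H·W·k²), which grows linearly with the number of pixels, whereas a full self-attention map over H·W tokens grows quadratically; that contrast is the usual motivation for replacing attention with convolutional modulation.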
Related papers
- MODEL&CO: Exoplanet detection in angular differential imaging by learning across multiple observations [37.845442465099396]
Most post-processing methods build a model of the nuisances from the target observations themselves.
We propose to build the nuisance model from an archive of multiple observations by leveraging supervised deep learning techniques.
We apply the proposed algorithm to several datasets from the VLT/SPHERE instrument, and demonstrate a superior precision-recall trade-off.
Date: 2024-09-23T09:22:45Z
- Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation [45.582869951581785]
Implicit Gaussian Splatting (IGS) is an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings.
We introduce a level-based progressive training scheme, which incorporates explicit spatial regularization.
Our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity.
Date: 2024-08-19T14:34:17Z
- A Generative Machine Learning Model for Material Microstructure 3D Reconstruction and Performance Evaluation [4.169915659794567]
The dimensional extension from 2D to 3D is viewed as a highly challenging inverse problem from the current technological perspective.
A novel generative model that integrates the multiscale properties of U-Net with the generative capabilities of GANs has been proposed.
The model's accuracy is further improved by combining the image regularization loss with the Wasserstein distance loss.
Date: 2024-02-24T13:42:34Z
- ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
Date: 2023-07-26T07:45:14Z
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
Date: 2023-05-24T03:32:03Z
- Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound.
We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements.
To demonstrate the superiority of the proposed Re³-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
Date: 2023-01-15T16:19:18Z
- Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed the Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailored Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
Date: 2021-11-30T04:30:10Z
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
First, local binary patterns are adopted to extract more descriptive features from each selected band, preserving local structures and subtle changes within a region.
Second, adaptive neighbor assignment is introduced into the anchor-graph construction to reduce computational complexity.
Date: 2021-03-25T09:31:41Z
- A Multiscale Graph Convolutional Network for Change Detection in Homogeneous and Heterogeneous Remote Sensing Images [12.823633963080281]
Change detection (CD) in remote sensing images has been an ever-expanding area of research.
In this paper, a novel CD method based on the graph convolutional network (GCN) and multiscale object-based technique is proposed for both homogeneous and heterogeneous images.
Date: 2021-02-16T09:26:31Z
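The multiscale GCN change-detection entry above builds on graph convolution. The NumPy sketch below shows the standard symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), on a hypothetical four-node graph (for instance, four adjacent image objects). The function name, graph, and features are illustrative assumptions; the paper's multiscale, object-based model is not reproduced here.

```python
import numpy as np

# Standard symmetric-normalized graph convolution (generic GCN rule) on a
# hypothetical toy graph; not the change-detection paper's multiscale model.

def gcn_layer(adj, h, w):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W) for a symmetric adjacency A without self-loops.
    adj: (N, N), h: (N, F_in), w: (F_in, F_out)."""
    a_hat = adj + np.eye(adj.shape[0])             # self-loops keep each node's own feature
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees are >= 1 because of the self-loops
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)           # aggregate neighbors, project, apply ReLU


# Hypothetical 4-node path graph, e.g. four adjacent image objects (superpixels).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
# Two groups of nodes with one-hot features.
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
out = gcn_layer(adj, h, np.eye(2))
print(np.round(out, 3))
```

After one layer, the boundary nodes (1 and 2) mix features from both groups while the end nodes keep mostly their own, which is the neighborhood smoothing that graph-based remote sensing methods exploit.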