Related papers: Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification

URL: http://arxiv.org/abs/2311.10320v2
Date: Mon, 10 Jun 2024 08:31:35 GMT
Title: Learning transformer-based heterogeneously salient graph representation for multimodal remote sensing image classification
Authors: Jiaqi Yang, Bo Du, Liangpei Zhang
Abstract summary: A transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. A self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling.
Score: 42.15709954199397
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data collected by different modalities can provide a wealth of complementary information: hyperspectral images (HSI) offer rich spectral-spatial properties, synthetic aperture radar (SAR) provides structural information about the Earth's surface, and light detection and ranging (LiDAR) captures altitude information about ground elevation. A natural idea is therefore to combine multimodal images for refined and accurate land-cover interpretation. Although many efforts have been made toward multi-source remote sensing image classification, three issues remain: 1) indiscriminate feature representation that does not sufficiently consider modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting caused by sparsely labeled samples. To overcome these barriers, a transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling. Finally, a mean forward is put forward in order to avoid overfitting. Based on these structures, the proposed model is able to bridge modal gaps and obtain differentiated graph representations at a competitive time cost, even with a small fraction of training samples. Experiments and analyses on three benchmark datasets against various state-of-the-art (SOTA) methods demonstrate the performance of the proposed approach.

Related papers

A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images. Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
MODEL&CO: Exoplanet detection in angular differential imaging by learning across multiple observations [37.845442465099396]
Most post-processing methods build a model of the nuisances from the target observations themselves. We propose to build the nuisance model from an archive of multiple observations by leveraging supervised deep learning techniques. We apply the proposed algorithm to several datasets from the VLT/SPHERE instrument, and demonstrate a superior precision-recall trade-off.
arXiv Detail & Related papers (2024-09-23T09:22:45Z)
Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation [45.582869951581785]
Implicit Gaussian Splatting (IGS) is an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings. We introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. Our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity.
arXiv Detail & Related papers (2024-08-19T14:34:17Z)
A Generative Machine Learning Model for Material Microstructure 3D Reconstruction and Performance Evaluation [4.169915659794567]
The dimensional extension from 2D to 3D is viewed as a highly challenging inverse problem from the current technological perspective. A novel generative model that integrates the multiscale properties of U-net with the generative capabilities of GAN has been proposed. The model's accuracy is further improved by combining the image regularization loss with the Wasserstein distance loss.
arXiv Detail & Related papers (2024-02-24T13:42:34Z)
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation. We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z)
T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning. The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
Deep Diversity-Enhanced Feature Representation of Hyperspectral Images [87.47202258194719]
We rectify 3D convolution by modifying its topology to enhance the rank upper-bound. We also propose a novel diversity-aware regularization (DA-Reg) term that acts on the feature maps to maximize independence among elements. To demonstrate the superiority of the proposed Re$3$-ConvSet and DA-Reg, we apply them to various HS image processing and analysis tasks.
arXiv Detail & Related papers (2023-01-15T16:19:18Z)
Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE). Firstly, the local binary pattern is adopted to extract more descriptive features on each selected band, which preserves local structures and subtle changes of a region. Secondly, adaptive neighbors assignment is introduced in the construction of the anchor graph to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
A Multiscale Graph Convolutional Network for Change Detection in Homogeneous and Heterogeneous Remote Sensing Images [12.823633963080281]
Change detection (CD) in remote sensing images has been an ever-expanding area of research. In this paper, a novel CD method based on the graph convolutional network (GCN) and multiscale object-based technique is proposed for both homogeneous and heterogeneous images.
arXiv Detail & Related papers (2021-02-16T09:26:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.