Multi-Spectral Image Stitching via Spatial Graph Reasoning
- URL: http://arxiv.org/abs/2307.16741v1
- Date: Mon, 31 Jul 2023 15:04:52 GMT
- Title: Multi-Spectral Image Stitching via Spatial Graph Reasoning
- Authors: Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu
- Abstract summary: We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
- Score: 52.27796682972484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-spectral image stitching leverages the complementarity between infrared
and visible images to generate a robust and reliable wide field-of-view (FOV)
scene. The primary challenge of this task is to explore the relations between
multi-spectral images for aligning and integrating multi-view scenes.
Capitalizing on the strengths of Graph Convolutional Networks (GCNs) in
modeling feature relationships, we propose a spatial graph reasoning based
multi-spectral image stitching method that effectively distills the deformation
and integration of multi-spectral images across different viewpoints. To
accomplish this, we embed multi-scale complementary features from the same view
position into a set of nodes. The correspondence across different views is
learned through powerful dense feature embeddings, where both inter- and
intra-correlations are developed to exploit cross-view matching and enhance
inner feature disparity. By introducing long-range coherence along spatial and
channel dimensions, the complementarity of pixel relations and channel
interdependencies aids in the reconstruction of aligned multi-view features,
generating informative and reliable wide FOV scenes. Moreover, we release a
challenging dataset named ChaMS, comprising both real-world and synthetic sets
with significant parallax, providing a new option for comprehensive evaluation.
Extensive experiments demonstrate that our method surpasses
state-of-the-art approaches.
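As a rough illustration of the graph reasoning idea described in the abstract (features embedded as nodes, with relations between nodes used to aggregate information), the following is a minimal sketch of one generic graph-convolution step. It is not the authors' implementation: the dot-product affinity, the softmax normalization, the ReLU, and the names `graph_reasoning_layer`, `nodes`, and `weight` are all assumptions chosen for illustration, and the sizes are arbitrary.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def graph_reasoning_layer(nodes, weight):
    """One generic graph-convolution step (illustrative assumption only).

    nodes:  N rows of C features, one row per graph node.
    weight: C rows of C_out values, a projection matrix.
    Returns the updated node features and the adjacency matrix used.
    """
    # Pairwise dot-product affinity between node features (N x N).
    transposed = [list(col) for col in zip(*nodes)]
    affinity = matmul(nodes, transposed)
    # Row-wise softmax turns affinities into a normalized adjacency matrix.
    adjacency = [softmax(row) for row in affinity]
    # Aggregate features from related nodes, then project and apply ReLU.
    aggregated = matmul(adjacency, nodes)
    projected = matmul(aggregated, weight)
    out = [[max(v, 0.0) for v in row] for row in projected]
    return out, adjacency


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sizes: 6 nodes with 8 channels, projected to 4 channels.
    nodes = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
    weight = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(8)]
    out, adjacency = graph_reasoning_layer(nodes, weight)
    print(len(out), len(out[0]))
```

In this sketch every node attends to every other node, which is one simple way to model long-range relations; the paper's actual node construction and its spatial and channel reasoning are not specified in the abstract and may differ.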
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation [64.07560335451723]
CoSER is a novel consistent dense Multiview Text-to-Image Generator for Text-to-3D.
It achieves both efficiency and quality by meticulously learning neighbor-view coherence.
It aggregates information along motion paths explicitly defined by physical principles to refine details.
arXiv Detail & Related papers (2024-08-23T15:16:01Z)
- Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task of high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by it, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- Learning multi-domain feature relation for visible and Long-wave Infrared image patch matching [39.88037892637296]
We present the largest visible and Long-wave Infrared (LWIR) image patch matching dataset, termed VL-CMIM.
In addition, a multi-domain feature relation learning network (MD-FRN) is proposed.
arXiv Detail & Related papers (2023-08-09T11:23:32Z)
- Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method that learns to fuse the multi-view and monocular cues encoded as volumes without needing heuristically crafted masks.
Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z)
- Cross-View Hierarchy Network for Stereo Image Super-Resolution [14.574538513341277]
Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views.
We propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR).
CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters.
arXiv Detail & Related papers (2023-04-13T03:11:30Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)