Accurate Cross-modal Reconstruction of Vehicle Target from Sparse-aspect Multi-baseline SAR data
- URL: http://arxiv.org/abs/2406.04158v5
- Date: Fri, 01 Aug 2025 07:17:40 GMT
- Title: Accurate Cross-modal Reconstruction of Vehicle Target from Sparse-aspect Multi-baseline SAR data
- Authors: Da Li, Guoqiang Zhao, Chen Yao, Kaiqiang Zhu, Houjun Sun, Jiacheng Bao, Maokun Li
- Abstract summary: Multi-aspect multi-baseline SAR 3D imaging is a critical remote sensing technique with promising applications in urban mapping and monitoring. In the past, compressive sensing (CS) was the mainstream approach for sparse 3D SAR reconstruction. Deep learning (DL) has emerged as a powerful alternative, markedly boosting reconstruction quality and efficiency.
- Score: 5.757535707973869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-aspect multi-baseline SAR 3D imaging is a critical remote sensing technique with promising applications in urban mapping and monitoring. However, sparse observations due to constrained flight trajectories degrade imaging quality, particularly for anisotropic small targets like vehicles and aircraft. In the past, compressive sensing (CS) was the mainstream approach for sparse 3D SAR reconstruction. More recently, deep learning (DL) has emerged as a powerful alternative, markedly boosting reconstruction quality and efficiency through strong data-driven representation capabilities and fast inference. However, existing DL methods typically train deep neural networks (DNNs) using only high-resolution radar images. This unimodal learning paradigm precludes the incorporation of complementary information from other data sources, thereby limiting potential improvements in reconstruction performance. In this paper, we introduce cross-modal learning and propose a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) that enhances sparse 3D SAR reconstruction by fusing heterogeneous information. Leveraging cross-modal supervision from 2D optical images and error propagation guaranteed by differentiable rendering, CMAR-Net achieves efficient training and reconstructs highly sparse-aspect multi-baseline SAR images into visually structured and accurate 3D images, particularly for vehicle targets. Trained solely on simulated data, CMAR-Net exhibits strong generalization across extensive real-world evaluations on parking lot measurements containing numerous civilian vehicles, outperforming state-of-the-art CS and DL methods in structural accuracy. Our work highlights the potential of cross-modal learning for 3D SAR reconstruction and introduces a novel framework for radar imaging research.
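The abstract outlines the training idea without implementation details. Below is a minimal PyTorch sketch of cross-modal supervision through differentiable rendering, assuming a voxel-occupancy output and a toy orthographic silhouette projector; `VoxelReconstructor`, `orthographic_silhouette`, and all shapes are hypothetical stand-ins, not CMAR-Net's actual architecture or renderer.

```python
import torch
import torch.nn as nn

class VoxelReconstructor(nn.Module):
    """Toy stand-in for the 3D reconstruction network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 1, 3, padding=1), nn.Sigmoid(),  # voxel occupancy in [0, 1]
        )

    def forward(self, sar_volume):            # (B, 1, D, H, W)
        return self.net(sar_volume)

def orthographic_silhouette(occupancy, axis=2):
    """Differentiable projection: probability that a ray along `axis` hits any voxel."""
    transmittance = torch.prod(1.0 - occupancy, dim=axis)  # (B, 1, H, W)
    return 1.0 - transmittance

model = VoxelReconstructor()
sar_volume = torch.rand(2, 1, 32, 32, 32)      # sparse-aspect SAR image stack
optical_silhouette = torch.rand(2, 1, 32, 32)  # 2D optical supervision (placeholder)

rendered = orthographic_silhouette(model(sar_volume))
loss = nn.functional.binary_cross_entropy(rendered, optical_silhouette)
loss.backward()   # errors propagate through the renderer into the 3D network
```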
Related papers
- BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images [21.811586185200706]
This paper addresses the challenge of reconstructing vehicles from sparse-view inputs. We leverage depth maps and a robust pose estimation architecture to synthesize novel views. We present a novel dataset featuring both synthetic and real-world public transportation vehicles.
arXiv Detail & Related papers (2025-07-16T10:04:35Z) - MoNetV2: Enhanced Motion Network for Freehand 3D Ultrasound Reconstruction [11.531888235029445]
We propose an enhanced motion network (MoNetV2) to improve the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. MoNetV2 surpasses existing methods in both reconstruction quality and generalizability across three large datasets.
arXiv Detail & Related papers (2025-06-16T04:57:34Z) - Multi-view 3D surface reconstruction from SAR images by inverse rendering [4.964816143841665]
We propose a new inverse rendering method for 3D reconstruction from unconstrained Synthetic Aperture Radar (SAR) images. Our method showcases the potential of exploiting geometric disparities in SAR images and paves the way for multi-sensor data fusion.
arXiv Detail & Related papers (2025-02-14T13:19:32Z) - Multi-Resolution SAR and Optical Remote Sensing Image Registration Methods: A Review, Datasets, and Future Perspectives [13.749888089968373]
Synthetic Aperture Radar (SAR) and optical image registration is essential for remote sensing data fusion.
As image resolution increases, fine SAR textures become more significant, leading to alignment issues and 3D spatial discrepancies.
The MultiResSAR dataset was created, containing over 10k pairs of multi-source, multi-resolution, and multi-scene SAR and optical images.
arXiv Detail & Related papers (2025-02-03T02:51:30Z) - LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning, for both LiDAR-based segmentation and object detection.
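As a rough illustration of how matched 2D/3D features can serve as contrastive samples, here is a generic symmetric InfoNCE sketch; LargeAD's actual feature extraction, superpixel matching, and loss are not specified here, so every name below is a stand-in.

```python
import torch
import torch.nn.functional as F

def infonce(img_feats, point_feats, temperature=0.07):
    """Symmetric InfoNCE: row i of each input is a matched superpixel/point pair."""
    img = F.normalize(img_feats, dim=1)
    pts = F.normalize(point_feats, dim=1)
    logits = img @ pts.t() / temperature      # (N, N) pairwise similarities
    targets = torch.arange(len(img))          # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Placeholder features: superpixel embeddings from a VFM vs. pooled LiDAR features.
loss = infonce(torch.randn(128, 256), torch.randn(128, 256))
```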
arXiv Detail & Related papers (2025-01-07T18:59:59Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments consistently demonstrates our method's superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered images are polluted by artifacts or contain only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
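The second point amounts to multi-task training with a shared point backbone: one backward pass over the summed objectives sends image-branch gradients into the backbone. A toy sketch follows; all modules and losses are placeholders, not UPIDet's actual heads.

```python
import torch
import torch.nn as nn

# Toy stand-ins; UPIDet's real backbone and heads are far more elaborate.
point_backbone = nn.Linear(3, 64)    # per-point feature extractor
det_head = nn.Linear(64, 7)          # 3D box parameters
img_branch = nn.Linear(64, 2)        # 2D auxiliary task (e.g. a local-coordinate map)

points = torch.randn(1024, 3)
feats = point_backbone(points)

det_loss = det_head(feats).pow(2).mean()    # placeholder detection objective
aux_loss = img_branch(feats).pow(2).mean()  # placeholder image-branch objective

# A single backward pass sends the image branch's gradients into
# point_backbone as well: the enhancement the abstract describes.
(det_loss + aux_loss).backward()
```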
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - A Deep Learning Approach for SAR Tomographic Imaging of Forested Areas [10.477070348391079]
We show that lightweight neural networks can be trained to perform the tomographic inversion in a single feed-forward pass.
We train our encoder-decoder network using simulated data and validate our technique on real L-band and P-band data.
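A minimal sketch of such a single-pass inversion network follows, assuming per-pixel multi-baseline measurements split into real/imaginary parts and a discretized height axis; the `TomoNet` name and layer sizes are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TomoNet(nn.Module):
    """Lightweight encoder-decoder: multi-baseline measurements mapped to a
    vertical reflectivity profile in one feed-forward pass (shapes illustrative)."""
    def __init__(self, n_baselines=6, n_heights=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 * n_baselines, 128), nn.ReLU(),   # real + imaginary parts
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, n_heights), nn.Softplus(),     # non-negative reflectivity
        )

    def forward(self, x):                 # (B, 2 * n_baselines)
        return self.decoder(self.encoder(x))

net = TomoNet()
measurements = torch.randn(32, 12)        # per-pixel complex data, split re/im
profile = net(measurements)               # (32, 64) reflectivity vs. height
```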
arXiv Detail & Related papers (2023-01-20T14:34:03Z) - Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z) - DH-GAN: A Physics-driven Untrained Generative Adversarial Network for 3D Microscopic Imaging using Digital Holography [3.4635026053111484]
Digital holography is a 3D imaging technique in which a laser beam with a plane wavefront is emitted toward an object and the intensity of the diffracted waveform, called a hologram, is measured.
Recently, deep learning (DL) methods have been used for more accurate holographic processing.
We propose a new DL architecture based on generative adversarial networks that uses a discriminative network to realize a semantic measure of reconstruction quality.
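A heavily simplified sketch of the idea: an untrained generator fitted to a single hologram under a physics forward model, with a discriminator score serving as the semantic quality term. The propagation operator and both networks below are placeholders, not the paper's designs.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                              nn.Flatten(), nn.LazyLinear(1))

def forward_model(obj):
    """Placeholder for the physics-based hologram-formation operator
    (free-space propagation in the actual method)."""
    return obj ** 2

hologram = torch.rand(1, 1, 64, 64)     # measured intensity (placeholder data)
noise = torch.randn(1, 1, 64, 64)       # fixed input to the untrained generator

obj = generator(noise)                  # fit to a single measurement, no training set
data_fidelity = (forward_model(obj) - hologram).pow(2).mean()
adv = -discriminator(obj).mean()        # discriminator score as semantic quality term
loss = data_fidelity + 0.01 * adv
loss.backward()
```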
arXiv Detail & Related papers (2022-05-25T17:13:45Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
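A compact sketch of the two-encoder, cross-enhancement pattern follows; the `dual_enhance` gating stands in for the Dual Enhancement Module, and all shapes are illustrative.

```python
import torch
import torch.nn as nn

enc_img = nn.Conv2d(3, 16, 3, padding=1)   # aerial-image encoder (toy)
enc_traj = nn.Conv2d(1, 16, 3, padding=1)  # trajectory-heatmap encoder (toy)
dec = nn.Conv2d(16, 1, 3, padding=1)       # road-mask decoder (toy)

def dual_enhance(fa, fb):
    """Placeholder cross-modal refinement: each modality's features are
    gated by the other's (the paper's module is considerably richer)."""
    return fa * torch.sigmoid(fb), fb * torch.sigmoid(fa)

img = torch.randn(1, 3, 64, 64)
traj = torch.randn(1, 1, 64, 64)
fi, ft = dual_enhance(enc_img(img), enc_traj(traj))
road_mask = torch.sigmoid(dec(fi + ft))    # fused road prediction
```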
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - Deep-Learning-Based Single-Image Height Reconstruction from Very-High-Resolution SAR Intensity Data [1.7894377200944511]
We present the first-ever demonstration of deep learning-based single image height prediction for the other important sensor modality in remote sensing: synthetic aperture radar (SAR) data.
Besides the adaptation of a convolutional neural network (CNN) architecture for SAR intensity images, we present a workflow for the generation of training data.
Since we put a particular emphasis on transferability, we are able to confirm that deep learning-based single-image height estimation is not only possible, but also transfers quite well to unseen data.
arXiv Detail & Related papers (2021-11-03T08:20:03Z) - Homography augmented momentum contrastive learning for SAR image retrieval [3.9743795764085545]
We propose a deep learning-based image retrieval approach using homography transformation augmented contrastive learning.
We also propose a contrastive-learning-based training method for the DNNs that does not require any labeling procedure.
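As a rough sketch of label-free contrastive training with homography-style augmentation, here is a SimCLR-like simplification (the paper uses momentum contrast); `RandomPerspective` stands in for the homography transformation, and the encoder is a toy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import RandomPerspective

encoder = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 32))
warp = RandomPerspective(distortion_scale=0.5, p=1.0)  # homography-style augmentation

x = torch.rand(16, 1, 64, 64)              # unlabeled SAR patches
z1 = F.normalize(encoder(warp(x)), dim=1)  # two warped views (one warp per call,
z2 = F.normalize(encoder(warp(x)), dim=1)  # shared across the batch for simplicity)

logits = z1 @ z2.t() / 0.07                # similarities; positives on the diagonal
loss = F.cross_entropy(logits, torch.arange(16))   # no labels required
```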
arXiv Detail & Related papers (2021-09-21T17:27:07Z) - Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
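The core of such self-supervision is differentiable view synthesis. Below is a generic photometric-loss sketch using disparity-based warping, not the paper's exact cross photometric, mutual-supervision, or smoothness losses.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """Differentiably sample the right image at x - d to synthesize the left view."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    x_src = xs.float().expand(b, h, w) - disparity.squeeze(1)     # shifted columns
    grid = torch.stack([2 * x_src / (w - 1) - 1,                  # normalize to [-1, 1]
                        2 * ys.float().expand(b, h, w) / (h - 1) - 1], dim=-1)
    return F.grid_sample(right, grid, align_corners=True)

left, right = torch.rand(2, 3, 32, 64), torch.rand(2, 3, 32, 64)
disparity = (torch.rand(2, 1, 32, 64) * 4).requires_grad_()       # predicted disparity

photometric_loss = (warp_right_to_left(right, disparity) - left).abs().mean()
photometric_loss.backward()   # gradients reach the disparity estimate
```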
arXiv Detail & Related papers (2021-04-09T02:58:59Z) - Compressive spectral image classification using 3D coded convolutional neural network [12.67293744927537]
This paper develops a novel deep learning approach to hyperspectral image classification (HIC) based on measurements from coded-aperture snapshot spectral imagers (CASSI).
A new kind of deep learning strategy, namely 3D coded convolutional neural network (3D-CCNN), is proposed to efficiently solve for the classification problem.
The accuracy of classification is effectively improved by exploiting the synergy between the deep learning network and coded apertures.
arXiv Detail & Related papers (2020-09-23T15:05:57Z) - X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to label propagation on an updatable graph constructed from high-level features at the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
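One plausible minimal reading of this design is a block that keeps a full-resolution stream while exchanging information with a downsampled one; the sketch below is illustrative, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleBlock(nn.Module):
    """Keep a full-resolution stream for spatial precision while folding in
    context from a downsampled stream (a simplified reading of the design)."""
    def __init__(self, c=16):
        super().__init__()
        self.full = nn.Conv2d(c, c, 3, padding=1)
        self.half = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        hi = self.full(x)                              # spatially precise features
        lo = self.half(F.avg_pool2d(x, 2))             # contextual features
        lo_up = F.interpolate(lo, size=hi.shape[-2:],
                              mode="bilinear", align_corners=False)
        return hi + lo_up                              # multi-scale fusion

feats = MultiScaleBlock()(torch.randn(1, 16, 64, 64))
```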
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-view Stereo Reconstruction from An Open Aerial Dataset [6.319667056655425]
We present a synthetic aerial dataset, called the WHU dataset, which is the first large-scale multi-view aerial dataset.
We also introduce in this paper a novel network, called RED-Net, for wide-range depth inference.
Our experiments confirmed that our method not only exceeds current state-of-the-art MVS methods, reducing mean absolute error (MAE) by more than 50% with less memory and computational cost, but is also more efficient.
arXiv Detail & Related papers (2020-03-02T03:04:13Z) - Deep Non-Line-of-Sight Reconstruction [18.38481917675749]
In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently.
We devise a tailored autoencoder architecture, trained end-to-end, that maps transient images directly to a depth map representation.
We demonstrate that our feed-forward network, even though it is trained solely on synthetic data, generalizes to measured data from SPAD sensors and is able to obtain results that are competitive with model-based reconstruction methods.
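A minimal sketch of such a feed-forward mapping follows, assuming a transient volume of shape (time, scan height, scan width); the `TransientToDepth` name and all layer choices are illustrative.

```python
import torch
import torch.nn as nn

class TransientToDepth(nn.Module):
    """Feed-forward autoencoder: transient measurements (time x scan grid)
    mapped directly to a depth map (shapes illustrative)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                 # progressively collapse time
            nn.Conv3d(1, 8, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(16 * 16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, transients):                    # (B, 1, T, H, W)
        f = self.encoder(transients)                  # (B, 16, T/4, H, W)
        b, c, t, h, w = f.shape
        return self.decoder(f.reshape(b, c * t, h, w))  # (B, 1, H, W) depth map

depth = TransientToDepth()(torch.rand(1, 1, 64, 32, 32))
```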
arXiv Detail & Related papers (2020-01-24T16:05:50Z) - Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolutions to extract spatial and spectral information, which not only reduces memory usage and computational cost, but also makes the network easier to train.
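For reference, a separable 3D convolution of this kind can be sketched as a spatial (1x3x3) convolution followed by a spectral (3x1x1) convolution; the block below is a generic factorization, not SSRNet's exact unit.

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Factorize a dense 3x3x3 convolution into a spatial (1x3x3) and a
    spectral (3x1x1) convolution: roughly 27*C^2 weights become 12*C^2."""
    def __init__(self, c=16):
        super().__init__()
        self.spatial = nn.Conv3d(c, c, (1, 3, 3), padding=(0, 1, 1))
        self.spectral = nn.Conv3d(c, c, (3, 1, 1), padding=(1, 0, 0))

    def forward(self, x):                 # (B, C, bands, H, W)
        return self.spectral(self.spatial(x))

out = SeparableConv3d()(torch.randn(1, 16, 31, 32, 32))
```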
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.