Deep Learning-based Cross-modal Reconstruction of Vehicle Target from Sparse 3D SAR Image
- URL: http://arxiv.org/abs/2406.04158v7
- Date: Thu, 11 Sep 2025 16:13:03 GMT
- Title: Deep Learning-based Cross-modal Reconstruction of Vehicle Target from Sparse 3D SAR Image
- Authors: Da Li, Guoqiang Zhao, Chen Yao, Kaiqiang Zhu, Houjun Sun, Jiacheng Bao, Maokun Li
- Abstract summary: We introduce cross-modal learning and propose a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) for enhancing sparse 3D SAR images of vehicle targets by fusing optical information. CMAR-Net achieves efficient training and reconstructs sparse 3D SAR images, which are derived from highly sparse-aspect observations, into visually structured 3D vehicle images.
- Score: 6.499547636078961
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three-dimensional synthetic aperture radar (3D SAR) is an advanced active microwave imaging technology widely used in remote sensing. To achieve high-resolution 3D imaging, 3D SAR requires observations from multiple aspects and altitude baselines surrounding the target. However, constrained flight trajectories often lead to sparse observations, which degrade imaging quality, particularly for anisotropic man-made small targets, such as vehicles and aircraft. In the past, compressive sensing (CS) was the mainstream approach for sparse 3D SAR image reconstruction. More recently, deep learning (DL) has emerged as a powerful alternative, markedly boosting reconstruction quality and efficiency. However, existing DL-based methods typically rely solely on high-quality 3D SAR images as supervisory signals to train deep neural networks (DNNs). This unimodal learning paradigm prevents the integration of complementary information from other data modalities, which limits reconstruction performance and reduces target discriminability due to the inherent constraints of electromagnetic scattering. In this paper, we introduce cross-modal learning and propose a Cross-Modal 3D-SAR Reconstruction Network (CMAR-Net) for enhancing sparse 3D SAR images of vehicle targets by fusing optical information. Leveraging cross-modal supervision from 2D optical images and error propagation guaranteed by differentiable rendering, CMAR-Net achieves efficient training and reconstructs sparse 3D SAR images, which are derived from highly sparse-aspect observations, into visually structured 3D vehicle images. Trained exclusively on simulated data, CMAR-Net exhibits robust generalization to real-world data, outperforming state-of-the-art CS and DL methods in structural accuracy within a large-scale parking lot experiment involving numerous civilian vehicles, thereby demonstrating its strong practical applicability.
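The abstract's idea of supervising a 3D estimate with a 2D image through a differentiable renderer can be illustrated with a toy sketch. This is not CMAR-Net: the occupancy-grid representation, the soft silhouette projection, the mean-squared-error loss, and the names `render_silhouette`, `loss_and_grad`, and `fit` are all assumptions chosen for illustration. The point is only that, because every rendered pixel is a smooth function of the voxels on its ray, a 2D error can be propagated back to the 3D volume.

```python
from math import prod


def render_silhouette(volume):
    """Project an occupancy volume [depth][height][width] (values in [0, 1]).

    A pixel is covered unless every voxel on its ray is empty:
    s[h][w] = 1 - prod_d (1 - v[d][h][w]), which is smooth in every voxel.
    """
    depth = len(volume)
    height, width = len(volume[0]), len(volume[0][0])
    return [
        [1.0 - prod(1.0 - volume[d][h][w] for d in range(depth)) for w in range(width)]
        for h in range(height)
    ]


def loss_and_grad(volume, target):
    """Return the MSE between the rendered and target 2D images, and d(loss)/d(voxel)."""
    depth = len(volume)
    height, width = len(target), len(target[0])
    sil = render_silhouette(volume)
    n = height * width
    loss = sum(
        (sil[h][w] - target[h][w]) ** 2 for h in range(height) for w in range(width)
    ) / n
    grad = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    for h in range(height):
        for w in range(width):
            dl_ds = 2.0 * (sil[h][w] - target[h][w]) / n
            for d in range(depth):
                # Chain rule: ds/dv_d is the product of (1 - v_k) over the other voxels.
                others = prod(1.0 - volume[k][h][w] for k in range(depth) if k != d)
                grad[d][h][w] = dl_ds * others
    return loss, grad


def fit(volume, target, steps=200, lr=0.5):
    """Plain gradient descent on the voxels, clamped to the valid range [0, 1]."""
    for _ in range(steps):
        _, grad = loss_and_grad(volume, target)
        volume = [
            [
                [min(1.0, max(0.0, v - lr * g)) for v, g in zip(v_row, g_row)]
                for v_row, g_row in zip(v_plane, g_plane)
            ]
            for v_plane, g_plane in zip(volume, grad)
        ]
    return volume


if __name__ == "__main__":
    # Hypothetical 2x2x2 volume, uniformly uncertain, and a 2x2 target mask.
    initial = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(2)]
    mask = [[1.0, 0.0], [0.0, 1.0]]
    before, _ = loss_and_grad(initial, mask)
    after, _ = loss_and_grad(fit(initial, mask), mask)
    print(before, after)
```

In a learning-based setting the voxels would be the output of a network rather than free parameters, and the same gradient would flow on into the network weights; an automatic differentiation framework would normally replace the hand-written gradient used here.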
Related papers
- RadioGen3D: 3D Radio Map Generation via Adversarial Learning on Large-Scale Synthetic Data [62.63849426834315]
Radio maps are essential for efficient radio resource management in future 6G and low-altitude networks. Deep learning (DL) techniques have emerged as an efficient alternative to conventional ray-tracing for radio map estimation. We present the RadioGen3D framework to capture essential 3D signal propagation characteristics and antenna polarization effects.
arXiv Detail & Related papers (2026-02-21T07:50:05Z) - Urban Neural Surface Reconstruction from Constrained Sparse Aerial Imagery with 3D SAR Fusion [5.462159447632879]
We present the first framework that fuses 3D synthetic aperture radar point clouds with aerial imagery for high-fidelity reconstruction under constrained, sparse-view conditions. Our framework integrates radar-derived spatial constraints into an SDF-based NSR backbone, guiding structure-aware ray selection and adaptive sampling for stable and efficient optimization.
arXiv Detail & Related papers (2026-01-29T17:47:07Z) - Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting [64.64738535860351]
We present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations. Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data collection costs and open new avenues for advancing spatial intelligence.
arXiv Detail & Related papers (2025-07-24T14:53:26Z) - BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images [21.811586185200706]
This paper addresses the challenge of reconstructing vehicles from sparse-view inputs. We leverage depth maps and a robust pose estimation architecture to synthesize novel views. We present a novel dataset featuring both synthetic and real-world public transportation vehicles.
arXiv Detail & Related papers (2025-07-16T10:04:35Z) - MoNetV2: Enhanced Motion Network for Freehand 3D Ultrasound Reconstruction [11.531888235029445]
We propose an enhanced motion network (MoNetV2) to enhance the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. MoNetV2 surpasses existing methods in both reconstruction quality and generalizability performance across three large datasets.
arXiv Detail & Related papers (2025-06-16T04:57:34Z) - Multi-view 3D surface reconstruction from SAR images by inverse rendering [4.964816143841665]
We propose a new inverse rendering method for 3D reconstruction from unconstrained Synthetic Aperture Radar (SAR) images. Our method showcases the potential of exploiting geometric disparities in SAR images and paves the way for multi-sensor data fusion.
arXiv Detail & Related papers (2025-02-14T13:19:32Z) - Multi-Resolution SAR and Optical Remote Sensing Image Registration Methods: A Review, Datasets, and Future Perspectives [13.749888089968373]
Synthetic Aperture Radar (SAR) and optical image registration is essential for remote sensing data fusion.
As image resolution increases, fine SAR textures become more significant, leading to alignment issues and 3D spatial discrepancies.
The MultiResSAR dataset was created, containing over 10k pairs of multi-source, multi-resolution, and multi-scene SAR and optical images.
arXiv Detail & Related papers (2025-02-03T02:51:30Z) - LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning tasks for both LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data. We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - Multi-Modal Dataset Acquisition for Photometrically Challenging Object [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z) - NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions [97.27105725738016]
The integration of Neural Radiance Fields (NeRFs) and generative models, such as Generative Adversarial Networks (GANs), has transformed 3D-aware generation from single-view images.
We propose a simple and effective method, based on re-using the well-disentangled latent space of a pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly generate 3D-consistent images corresponding to the underlying 3D representations.
arXiv Detail & Related papers (2023-03-22T18:59:48Z) - Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [67.94357336206136]
We present a new cross-modal 3D object detector, namely UPIDet, which aims to unleash the potential of the image branch from two aspects.
First, UPIDet introduces a new 2D auxiliary task called normalized local coordinate map estimation.
Second, we discover that the representational capability of the point cloud backbone can be enhanced through the gradients backpropagated from the training objectives of the image branch.
arXiv Detail & Related papers (2023-01-22T08:26:58Z) - 3D Reconstruction of Non-cooperative Resident Space Objects using Instant NGP-accelerated NeRF and D-NeRF [0.0]
This work adapts Instant NeRF and D-NeRF, variations of the neural radiance field (NeRF) algorithm, to the problem of mapping RSOs in orbit.
The algorithms are evaluated for 3D reconstruction quality and hardware requirements using datasets of images of a spacecraft mock-up.
arXiv Detail & Related papers (2023-01-22T05:26:08Z) - A Deep Learning Approach for SAR Tomographic Imaging of Forested Areas [10.477070348391079]
We show that light-weight neural networks can be trained to perform the tomographic inversion with a single feed-forward pass.
We train our encoder-decoder network using simulated data and validate our technique on real L-band and P-band data.
arXiv Detail & Related papers (2023-01-20T14:34:03Z) - Neural 3D Reconstruction in the Wild [86.6264706256377]
We introduce a new method that enables efficient and accurate surface reconstruction from Internet photo collections.
We present a new benchmark and protocol for evaluating reconstruction performance on such in-the-wild scenes.
arXiv Detail & Related papers (2022-05-25T17:59:53Z) - DH-GAN: A Physics-driven Untrained Generative Adversarial Network for 3D Microscopic Imaging using Digital Holography [3.4635026053111484]
Digital holography is a 3D imaging technique that emits a laser beam with a plane wavefront at an object and measures the intensity of the diffracted waveform, called a hologram.
Recently, deep learning (DL) methods have been used for more accurate holographic processing.
We propose a new DL architecture based on generative adversarial networks that uses a discriminative network for realizing a semantic measure for reconstruction quality.
arXiv Detail & Related papers (2022-05-25T17:13:45Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - Deep-Learning-Based Single-Image Height Reconstruction from Very-High-Resolution SAR Intensity Data [1.7894377200944511]
We present the first-ever demonstration of deep learning-based single image height prediction for the other important sensor modality in remote sensing: synthetic aperture radar (SAR) data.
Besides the adaptation of a convolutional neural network (CNN) architecture for SAR intensity images, we present a workflow for the generation of training data.
Since we put a particular emphasis on transferability, we are able to confirm that deep learning-based single-image height estimation is not only possible, but also transfers quite well to unseen data.
arXiv Detail & Related papers (2021-11-03T08:20:03Z) - Homography augumented momentum constrastive learning for SAR image retrieval [3.9743795764085545]
We propose a deep learning-based image retrieval approach using homography transformation augmented contrastive learning.
We also propose a training method for the DNNs induced by contrastive learning that does not require any labeling procedure.
arXiv Detail & Related papers (2021-09-21T17:27:07Z) - 3DRIMR: 3D Reconstruction and Imaging via mmWave Radar based on Deep Learning [9.26903816093995]
mmWave radar has been shown to be an effective sensing technique in low-visibility, smoky, dusty, and densely foggy environments.
We propose 3D Reconstruction and Imaging via mmWave Radar (3DRIMR), a deep learning-based architecture that reconstructs the 3D shape of an object in a dense, detailed point cloud format.
Our experiments have demonstrated 3DRIMR's effectiveness in reconstructing 3D objects, and its performance improvement over standard techniques.
arXiv Detail & Related papers (2021-08-05T21:24:57Z) - Stereo Matching by Self-supervision of Multiscopic Vision [65.38359887232025]
We propose a new self-supervised framework for stereo matching utilizing multiple images captured at aligned camera positions.
A cross photometric loss, an uncertainty-aware mutual-supervision loss, and a new smoothness loss are introduced to optimize the network.
Our model obtains better disparity maps than previous unsupervised methods on the KITTI dataset.
arXiv Detail & Related papers (2021-04-09T02:58:59Z) - Compressive spectral image classification using 3D coded convolutional neural network [12.67293744927537]
This paper develops a novel deep learning approach to hyperspectral image classification (HIC) based on measurements of coded-aperture snapshot spectral imagers (CASSI).
A new kind of deep learning strategy, namely 3D coded convolutional neural network (3D-CCNN), is proposed to efficiently solve the classification problem.
The accuracy of classification is effectively improved by exploiting the synergy between the deep learning network and coded apertures.
arXiv Detail & Related papers (2020-09-23T15:05:57Z) - X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data [69.37597254841052]
We propose a novel cross-modal deep-learning framework called X-ModalNet.
X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed by high-level features on the top of the network.
We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.
arXiv Detail & Related papers (2020-06-24T15:29:41Z) - Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z) - A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-view Stereo Reconstruction from An Open Aerial Dataset [6.319667056655425]
We present a synthetic aerial dataset, called the WHU dataset, which is the first large-scale multi-view aerial dataset.
We also introduce in this paper a novel network, called RED-Net, for wide-range depth inference.
Our experiments confirmed that our method outperforms the current state-of-the-art MVS methods by more than 50% in mean absolute error (MAE), with lower memory and computational cost.
arXiv Detail & Related papers (2020-03-02T03:04:13Z) - Deep Non-Line-of-Sight Reconstruction [18.38481917675749]
In this paper, we employ convolutional feed-forward networks for solving the reconstruction problem efficiently.
We devise a tailored autoencoder architecture, trained end-to-end, that maps transient images directly to a depth map representation.
We demonstrate that our feed-forward network, even though it is trained solely on synthetic data, generalizes to measured data from SPAD sensors and is able to obtain results that are competitive with model-based reconstruction methods.
arXiv Detail & Related papers (2020-01-24T16:05:50Z) - Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.