ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in
All Weather Conditions
- URL: http://arxiv.org/abs/2210.01346v3
- Date: Wed, 20 Sep 2023 05:01:45 GMT
- Title: ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in
All Weather Conditions
- Authors: Anjun Chen, Xiangyu Wang, Kun Shi, Shaohao Zhu, Bin Fang, Yingfeng
Chen, Jiming Chen, Yuchi Huo, Qi Ye
- Abstract summary: We present ImmFusion, the first mmWave-RGB fusion solution to reconstruct 3D human bodies robustly.
Our method's accuracy is significantly superior to that of state-of-the-art Transformer-based LiDAR-camera fusion methods.
- Score: 23.146325482439988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D human reconstruction from RGB images achieves decent results in
good weather conditions but degrades dramatically in rough weather.
Complementarily, mmWave radars have been employed to reconstruct 3D human
joints and meshes in rough weather. However, combining RGB and mmWave signals
for robust all-weather 3D human reconstruction remains an open challenge, given
the sparse nature of mmWave signals and the vulnerability of RGB images to
adverse conditions. In this paper, we present ImmFusion, the first mmWave-RGB
fusion solution to robustly reconstruct 3D human bodies in all weather
conditions. Specifically, ImmFusion consists of image and point backbones for
token feature extraction and a Transformer module for token fusion. The image
and point backbones refine global and local features from the raw data, and the
Fusion Transformer module aims at effective information fusion of the two
modalities by dynamically selecting informative tokens. Extensive experiments
on mmBody, a large-scale dataset captured in various environments, demonstrate
that ImmFusion efficiently exploits the information of both modalities to
achieve robust 3D human body reconstruction in all weather conditions. In
addition, our method's accuracy is significantly superior to that of
state-of-the-art Transformer-based LiDAR-camera fusion methods.
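To make the fusion stage described in the abstract concrete, below is a minimal
PyTorch sketch of a token-fusion Transformer with dynamic token selection.
Everything in it (the linear scoring head, the token budget keep_tokens, the
feature dimensions, and the dummy backbone outputs) is an illustrative
assumption rather than the paper's actual implementation; the real image/point
backbones and the mesh-regression head are omitted.

import torch
import torch.nn as nn

class FusionTransformer(nn.Module):
    """Fuse image and point tokens, keeping only the top-k most informative."""

    def __init__(self, dim=256, num_heads=4, num_layers=2, keep_tokens=64):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.score = nn.Linear(dim, 1)  # assumed per-token informativeness head
        self.keep_tokens = keep_tokens

    def forward(self, img_tokens, pt_tokens):
        # Concatenate tokens from both modalities: (B, N_img + N_pt, dim).
        tokens = torch.cat([img_tokens, pt_tokens], dim=1)
        # Score every token and dynamically keep the k highest-scoring ones.
        scores = self.score(tokens).squeeze(-1)               # (B, N)
        idx = scores.topk(self.keep_tokens, dim=1).indices    # (B, k)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        selected = tokens.gather(1, idx)                      # (B, k, dim)
        # Let the surviving tokens attend to each other across modalities.
        return self.encoder(selected)

if __name__ == "__main__":
    img_tokens = torch.randn(2, 196, 256)  # stand-in for image backbone output
    pt_tokens = torch.randn(2, 128, 256)   # stand-in for point backbone output
    fused = FusionTransformer()(img_tokens, pt_tokens)
    print(fused.shape)  # torch.Size([2, 64, 256])

The top-k selection is what makes the fusion robust in this sketch: when one
modality degrades (e.g., RGB in fog), its tokens can simply score low and be
dropped before attention is applied.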
Related papers
- MoCTEFuse: Illumination-Gated Mixture of Chiral Transformer Experts for Multi-Level Infrared and Visible Image Fusion [0.0]
We propose a dynamic multi-level image fusion network called MoCTEFuse. MoCTEFuse adaptively preserves texture details and object contrasts in a balanced manner. Experiments conducted on the DroneVehicle, MSRS, TNO and RoadScene datasets show MoCTEFuse's superior fusion performance.
arXiv Detail & Related papers (2025-07-27T08:54:16Z) - DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once [57.15043822199561]
A Darkness-Free network is proposed to handle Visible and infrared image disentanglement and fusion all at Once (DFVO). DFVO employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion). Our proposed approach outperforms state-of-the-art alternatives in terms of qualitative and quantitative evaluations.
arXiv Detail & Related papers (2025-05-07T15:59:45Z) - mmDEAR: mmWave Point Cloud Density Enhancement for Accurate Human Body Reconstruction [14.480271406960467]
We propose a two-stage deep learning framework that enhances mmWave point clouds and improves body reconstruction accuracy.
Our approach outperforms state-of-the-art methods, with the enhanced point clouds further improving performance when integrated into existing models.
arXiv Detail & Related papers (2025-03-04T08:03:53Z) - FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image [68.84221452621674]
We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image.
FOF-X avoids the performance degradation caused by texture and lighting.
We enhance the inter-conversion algorithms between FOF and mesh representations with a Laplacian constraint and an automaton-based discontinuity matcher.
arXiv Detail & Related papers (2024-12-08T14:46:29Z) - FlatFusion: Delving into Details of Sparse Transformer-based Camera-LiDAR Fusion for Autonomous Driving [63.96049803915402]
Integrating data from diverse sensor modalities is a prevalent approach in autonomous driving scenarios.
Recent advancements in efficient point cloud transformers have underscored the efficacy of integrating information in sparse formats.
In this paper, we conduct a comprehensive exploration of design choices for Transformer-based sparse camera-LiDAR fusion.
arXiv Detail & Related papers (2024-08-13T11:46:32Z) - Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z) - Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from the RGB image and the point cloud, respectively.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z) - mmBody Benchmark: 3D Body Reconstruction Dataset and Analysis for
Millimeter Wave Radar [10.610455816814985]
Millimeter Wave (mmWave) Radar is gaining popularity as it can work in adverse environments like smoke, rain, snow, poor lighting, etc.
Prior work has explored the possibility of reconstructing 3D skeletons or meshes from the noisy and sparse mmWave Radar signals.
The mmBody dataset consists of synchronized and calibrated mmWave radar point clouds and RGB(D) images captured in different scenes, together with skeleton/mesh annotations for the humans in the scenes.
arXiv Detail & Related papers (2022-09-12T08:00:31Z) - Mirror Complementary Transformer Network for RGB-thermal Salient Object
Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark datasets and the VT723 dataset show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z) - TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with
Transformers [49.689566246504356]
We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions.
TransFusion achieves state-of-the-art performance on large-scale datasets.
We extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard.
arXiv Detail & Related papers (2022-03-22T07:15:13Z) - Total Scale: Face-to-Body Detail Reconstruction from Sparse RGBD Sensors [52.38220261632204]
Flat facial surfaces frequently occur in PIFu-based reconstruction results.
We propose a two-scale PIFu representation to enhance the quality of the reconstructed facial details.
Experiments demonstrate the effectiveness of our approach in vivid facial details and deforming body shapes.
arXiv Detail & Related papers (2021-12-03T18:46:49Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View
Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z) - A Single Stream Network for Robust and Real-time RGB-D Salient Object
Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384\times384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)