What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging
- URL: http://arxiv.org/abs/2304.13651v1
- Date: Wed, 26 Apr 2023 16:23:10 GMT
- Title: What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging
- Authors: Zitian Tang, Wenjie Ye, Wei-Chiu Ma, Hang Zhao
- Abstract summary: We collect the first RGB-Thermal dataset for human motion analysis, dubbed Thermal-IM.
We develop a three-stage neural network model for accurate past human pose estimation.
- Score: 22.923237551192834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inferring past human motion from RGB images is challenging due to the
inherent uncertainty of the prediction problem. Thermal images, on the other
hand, encode traces of past human-object interactions left in the environment
via thermal radiation measurement. Based on this observation, we collect the
first RGB-Thermal dataset for human motion analysis, dubbed Thermal-IM. Then we
develop a three-stage neural network model for accurate past human pose
estimation. Comprehensive experiments show that thermal cues significantly
reduce the ambiguities of this task, and the proposed model achieves remarkable
performance. The dataset is available at
https://github.com/ZitianTang/Thermal-IM.
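The key cue here is physical: surfaces a person has touched or sat on stay slightly warmer than the ambient background and cool back toward it over seconds, so the strength of the residual thermal trace hints at how recently the contact happened. The following is a minimal illustrative sketch of that intuition using Newton's law of cooling; it is not the paper's three-stage model, and the decay constant and temperatures are made-up example values.

import math

def time_since_contact(t_observed, t_ambient, t_contact, k=0.2):
    # Newton's law of cooling: T(t) = T_amb + (T_0 - T_amb) * exp(-k * t),
    # so t = ln((T_0 - T_amb) / (T_obs - T_amb)) / k.
    # k (1/s) is a hypothetical decay constant; in practice it depends on the
    # material and the camera, which is why Thermal-IM learns the mapping from
    # data instead of relying on an explicit cooling model.
    if t_observed <= t_ambient:
        return float("inf")  # trace fully decayed; no timing cue remains
    return math.log((t_contact - t_ambient) / (t_observed - t_ambient)) / k

# Example: a chair seat reading 26.5 C in a 24 C room, assuming contact
# warmed it to about 31 C, suggests the person left roughly 5 seconds ago.
print(round(time_since_contact(26.5, 24.0, 31.0), 1))  # ~5.1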
Related papers
- Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis [11.793425521298488]
This paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS.
The first large-scale benchmark dataset for this field named Thermal Infrared Novel-view Synthesis dataset (TI-NSD) is created.
The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR.
arXiv Detail & Related papers (2024-09-12T13:46:53Z) - ThermalGaussian: Thermal 3D Gaussian Splatting [25.536611434289647]
We propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in RGB and thermal modalities.
We conduct comprehensive experiments to show that ThermalGaussian achieves photorealistic rendering of thermal images and improves the rendering quality of RGB images.
arXiv Detail & Related papers (2024-09-11T11:45:57Z) - T-FAKE: Synthesizing Thermal Images for Facial Landmarking [8.20594611891252]
We introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks.
Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations.
arXiv Detail & Related papers (2024-08-27T15:07:58Z) - CattleFace-RGBT: RGB-T Cattle Facial Landmark Benchmark [4.463254896517738]
CattleFace-RGBT is an RGB-T Cattle Facial Landmark dataset consisting of 2,300 RGB-T image pairs, a total of 4,600 images.
Applying AI to thermal images is challenging because training directly on thermal data yields suboptimal results and RGB-thermal alignment is infeasible.
We transfer models trained on RGB to thermal images and refine them using our AI-assisted annotation tool.
arXiv Detail & Related papers (2024-06-05T16:29:13Z) - ThermoNeRF: Multimodal Neural Radiance Fields for Thermal Novel View Synthesis [5.66229031510643]
We propose ThermoNeRF, a novel approach to rendering new RGB and thermal views of a scene jointly.
To overcome the lack of texture in thermal images, we use paired RGB and thermal images to learn scene density.
We also introduce ThermoScenes, a new dataset to mitigate the lack of available RGB+thermal datasets for scene reconstruction.
arXiv Detail & Related papers (2024-03-18T18:10:34Z) - Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images.
We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body.
We extend two real-world datasets tailored for this task, both containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z) - Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
We introduce a global illumination estimation module to predict the global illuminance score of the image.
We also introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cues and internal integrity cues from the thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z) - Real-time RGBD-based Extended Body Pose Estimation [57.61868412206493]
We present a system for real-time RGBD-based estimation of 3D human pose.
We use a parametric 3D deformable human mesh model (SMPL-X) as the representation.
We train estimators of body pose and facial expression parameters.
arXiv Detail & Related papers (2021-03-05T13:37:50Z) - A Large-Scale, Time-Synchronized Visible and Thermal Face Dataset [62.193924313292875]
We present the DEVCOM Army Research Laboratory Visible-Thermal Face dataset (ARL-VTF)
With over 500,000 images from 395 subjects, the ARL-VTF dataset represents, to the best of our knowledge, the largest collection of paired visible and thermal face images to date.
This paper presents benchmark results and analysis on thermal face landmark detection and thermal-to-visible face verification by evaluating state-of-the-art models on the ARL-VTF dataset.
arXiv Detail & Related papers (2021-01-07T17:17:12Z) - I2L-MeshNet: Image-to-Lixel Prediction Network for Accurate 3D Human Pose and Mesh Estimation from a Single RGB Image [79.040930290399]
We propose I2L-MeshNet, an image-to-lixel (line+pixel) prediction network.
The proposed I2L-MeshNet predicts the per-lixel likelihood on 1D heatmaps for each mesh coordinate instead of directly regressing the parameters.
Our lixel-based 1D heatmap preserves the spatial relationship in the input image and models the prediction uncertainty.
arXiv Detail & Related papers (2020-08-09T12:13:31Z)
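Since the I2L-MeshNet entry above contrasts per-lixel 1D heatmaps with direct parameter regression, the snippet below sketches the generic soft-argmax readout that such 1D heatmaps are typically decoded with; the array shapes and names are illustrative and not taken from the paper's released code.

import numpy as np

def soft_argmax_1d(logits):
    # logits: (num_coordinates, num_bins), one 1D heatmap per mesh coordinate.
    # Softmax turns each row into a per-bin likelihood; the expected bin index
    # gives a continuous position, and the spread of the distribution reflects
    # the prediction uncertainty mentioned in the summary above.
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    bins = np.arange(logits.shape[-1])
    return (probs * bins).sum(axis=-1)

# Toy example: two heatmaps over 64 bins, peaked near bins 10 and 40.
rng = np.random.default_rng(0)
toy = rng.normal(0.0, 0.1, size=(2, 64))
toy[0, 10] += 5.0
toy[1, 40] += 5.0
print(soft_argmax_1d(toy))  # approximately [10. 40.]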
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.