Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing
Trimodal Data
- URL: http://arxiv.org/abs/2402.01537v1
- Date: Fri, 2 Feb 2024 16:27:45 GMT
- Title: Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing
Trimodal Data
- Authors: Christian Stippel, Thomas Heitzinger, Rafael Sterzinger, Martin Kampel
- Abstract summary: We introduce a novel generative technique for creating trimodal, i.e., RGB, thermal, and depth, human-focused datasets.
This technique capitalizes on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically.
By employing this approach, we generate trimodal data that can be leveraged to train models for settings with limited data, poor lighting conditions, or privacy-sensitive areas.
- Score: 1.8024397171920885
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In pervasive machine learning, especially in Human Behavior Analysis (HBA),
RGB has been the primary modality due to its accessibility and richness of
information. However, linked with its benefits are challenges, including
sensitivity to lighting conditions and privacy concerns. One possibility to
overcome these vulnerabilities is to resort to different modalities. For
instance, thermal is particularly adept at accentuating human forms, while
depth adds crucial contextual layers. Despite their known benefits, only a few
HBA-specific datasets that integrate these modalities exist. To address this
shortage, our research introduces a novel generative technique for creating
trimodal, i.e., RGB, thermal, and depth, human-focused datasets. This technique
capitalizes on human segmentation masks derived from RGB images, combined with
thermal and depth backgrounds that are sourced automatically. With these two
ingredients, we synthesize depth and thermal counterparts from existing RGB
data utilizing conditional image-to-image translation. By employing this
approach, we generate trimodal data that can be leveraged to train models for
settings with limited data, poor lighting conditions, or privacy-sensitive
areas.
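The compositing step the abstract describes (cutting a person out of an RGB frame with its segmentation mask, translating that region into the target modality, and pasting it onto an automatically sourced thermal or depth background) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the luminance stand-in for the conditional image-to-image translation network are assumptions.

```python
import numpy as np

def synthesize_modality(rgb, mask, background, translate):
    """Composite a translated human region onto a sourced background.

    rgb:        (H, W, 3) float array, the source RGB frame
    mask:       (H, W) bool human-segmentation mask derived from rgb
    background: (H, W) background in the target modality (thermal or depth)
    translate:  callable mapping RGB to the target modality; stands in for
                the paper's conditional image-to-image translation network
    """
    person = translate(rgb)                    # per-pixel target-modality estimate
    return np.where(mask, person, background)  # person where masked, else background

# Toy usage: mean luminance as a placeholder "translation" network.
rgb = np.random.rand(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                          # a small "person" region
bg = np.full((4, 4), 0.2)                      # sourced thermal/depth background
out = synthesize_modality(rgb, mask, bg, lambda img: img.mean(axis=-1))
```

In the actual pipeline a trained translation model would replace the luminance placeholder; the compositing logic itself is this simple masked blend.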
Related papers
- Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB [48.31210455404533]
A heatmap-based 3D pose estimator hallucinates depth information from the RGB frames given at inference time.
Depth information is used exclusively during training by enforcing the RGB-based hallucination network to learn features similar to a backbone pre-trained only on depth data.
arXiv Detail & Related papers (2024-09-17T11:59:34Z)
- T-FAKE: Synthesizing Thermal Images for Facial Landmarking [8.20594611891252]
We introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks.
Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations.
arXiv Detail & Related papers (2024-08-27T15:07:58Z)
- Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM [58.736472371951955]
We introduce a ternary-type opacity (TT) model, which categorizes points on a ray intersecting a surface into three regions: before, on, and behind the surface.
This enables a more accurate rendering of depth, subsequently improving the performance of image warping techniques.
Our integrated approach of TT and HO achieves state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-20T18:03:17Z)
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging [22.923237551192834]
We collect the first RGB-Thermal dataset for human motion analysis, dubbed Thermal-IM.
We develop a three-stage neural network model for accurate past human pose estimation.
arXiv Detail & Related papers (2023-04-26T16:23:10Z)
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use 3D indoor scenes from the internet and manually tune their illumination to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations.
Our method is evaluated by comparing it with current state-of-the-art methods on Vari dataset and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
- Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors [67.88097893304274]
We propose a human volumetric capture method that combines temporal fusion and deep implicit functions.
We propose dynamic sliding to fuse depth observations together with topology consistency.
arXiv Detail & Related papers (2021-05-05T04:12:38Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.