Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing
Trimodal Data
- URL: http://arxiv.org/abs/2402.01537v1
- Date: Fri, 2 Feb 2024 16:27:45 GMT
- Title: Closing the Gap in Human Behavior Analysis: A Pipeline for Synthesizing
Trimodal Data
- Authors: Christian Stippel, Thomas Heitzinger, Rafael Sterzinger, Martin Kampel
- Abstract summary: We introduce a novel generative technique for creating trimodal, i.e., RGB, thermal, and depth, human-focused datasets.
This technique capitalizes on human segmentation masks derived from RGB images, combined with thermal and depth backgrounds that are sourced automatically.
By employing this approach, we generate trimodal data that can be leveraged to train models for settings with limited data, poor lighting conditions, or privacy-sensitive areas.
- Score: 1.8024397171920885
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In pervasive machine learning, especially in Human Behavior Analysis (HBA),
RGB has been the primary modality due to its accessibility and richness of
information. However, linked with its benefits are challenges, including
sensitivity to lighting conditions and privacy concerns. One possibility to
overcome these vulnerabilities is to resort to different modalities. For
instance, thermal is particularly adept at accentuating human forms, while
depth adds crucial contextual layers. Despite their known benefits, only a few
HBA-specific datasets that integrate these modalities exist. To address this
shortage, our research introduces a novel generative technique for creating
trimodal, i.e., RGB, thermal, and depth, human-focused datasets. This technique
capitalizes on human segmentation masks derived from RGB images, combined with
thermal and depth backgrounds that are sourced automatically. With these two
ingredients, we synthesize depth and thermal counterparts from existing RGB
data utilizing conditional image-to-image translation. By employing this
approach, we generate trimodal data that can be leveraged to train models for
settings with limited data, poor lighting conditions, or privacy-sensitive
areas.
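The compositing step the abstract describes (cutting a person out of an RGB frame with its segmentation mask, translating that region into the target modality, and pasting it onto an automatically sourced thermal or depth background) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the luminance stand-in for the conditional image-to-image translation network are assumptions.

```python
import numpy as np

def synthesize_modality(rgb, mask, background, translate):
    """Composite a translated human region onto a sourced background.

    rgb:        (H, W, 3) float array, the source RGB frame
    mask:       (H, W) bool human-segmentation mask derived from rgb
    background: (H, W) background in the target modality (thermal or depth)
    translate:  callable mapping RGB to the target modality; stands in for
                the paper's conditional image-to-image translation network
    """
    person = translate(rgb)                    # per-pixel target-modality estimate
    return np.where(mask, person, background)  # person where masked, else background

# Toy usage: mean luminance as a placeholder "translation" network.
rgb = np.random.rand(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                          # a small "person" region
bg = np.full((4, 4), 0.2)                      # sourced thermal/depth background
out = synthesize_modality(rgb, mask, bg, lambda img: img.mean(axis=-1))
```

In the actual pipeline a trained translation model would replace the luminance placeholder; the compositing logic itself is this simple masked blend.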
Related papers
- Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB [48.31210455404533]
A heatmap-based 3D pose estimator hallucinates depth information from the RGB frames given at inference time.
Depth information is used exclusively during training by enforcing the RGB-based hallucination network to learn features similar to a backbone pre-trained only on depth data.
arXiv Detail & Related papers (2024-09-17T11:59:34Z)
- T-FAKE: Synthesizing Thermal Images for Facial Landmarking [8.20594611891252]
We introduce the T-FAKE dataset, a new large-scale synthetic thermal dataset with sparse and dense landmarks.
Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations.
arXiv Detail & Related papers (2024-08-27T15:07:58Z)
- Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM [58.736472371951955]
We introduce a ternary-type opacity (TT) model, which categorizes points on a ray intersecting a surface into three regions: before, on, and behind the surface.
This enables a more accurate rendering of depth, subsequently improving the performance of image warping techniques.
Our integrated approach of TT and HO achieves state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-20T18:03:17Z)
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
arXiv Detail & Related papers (2023-07-28T04:36:07Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- What Happened 3 Seconds Ago? Inferring the Past with Thermal Imaging [22.923237551192834]
We collect the first RGB-Thermal dataset for human motion analysis, dubbed Thermal-IM.
We develop a three-stage neural network model for accurate past human pose estimation.
arXiv Detail & Related papers (2023-04-26T16:23:10Z)
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use 3D indoor scenes from the internet and manually tune their illumination to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations.
Our method is evaluated by comparing it with current state-of-the-art methods on Vari dataset and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
- Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors [67.88097893304274]
We propose a human volumetric capture method that combines temporal fusion and deep implicit functions.
We propose dynamic sliding to fuse depth observations together with topology consistency.
arXiv Detail & Related papers (2021-05-05T04:12:38Z)
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt a feature fusion strategy but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
- 3D Dense Geometry-Guided Facial Expression Synthesis by Adversarial Learning [54.24887282693925]
We propose a novel framework to exploit 3D dense (depth and surface normals) information for expression manipulation.
We use an off-the-shelf state-of-the-art 3D reconstruction model to estimate the depth and create a large-scale RGB-Depth dataset.
Our experiments demonstrate that the proposed method outperforms the competitive baseline and existing arts by a large margin.
arXiv Detail & Related papers (2020-09-30T17:12:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.