DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
- URL: http://arxiv.org/abs/2506.13897v3
- Date: Wed, 25 Jun 2025 19:20:54 GMT
- Title: DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
- Authors: Thomas Kreutz, Max Mühlhäuser, Alejandro Sanchez Guinea
- Abstract summary: DeSPITE is a Deep Skeleton-Pointcloud-IMU-Text Embedding model. We show that DeSPITE is an effective pre-training strategy for point cloud HAR through experiments on MSR-Action3D and HMPEAR.
- Score: 65.72663487116439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite LiDAR (Light Detection and Ranging) being an effective privacy-preserving alternative to RGB cameras to perceive human activities, it remains largely underexplored in the context of multi-modal contrastive pre-training for human activity understanding (e.g., human activity recognition (HAR), retrieval, or person re-identification (RE-ID)). To close this gap, our work explores learning the correspondence between LiDAR point clouds, human skeleton poses, IMU data, and text in a joint embedding space. More specifically, we present DeSPITE, a Deep Skeleton-Pointcloud-IMU-Text Embedding model, which effectively learns a joint embedding space across these four modalities. At the heart of our empirical exploration, we have combined the existing LIPD and Babel datasets, which enabled us to synchronize data of all four modalities, allowing us to explore the learning of a new joint embedding space. Our experiments demonstrate novel human activity understanding tasks for point cloud sequences enabled through DeSPITE, including Skeleton<->Pointcloud<->IMU matching, retrieval, and temporal moment retrieval. Furthermore, we show that DeSPITE is an effective pre-training strategy for point cloud HAR through experiments on MSR-Action3D and HMPEAR.
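As background on how such a four-way joint embedding space can be trained, here is a minimal PyTorch sketch of a pairwise InfoNCE (CLIP-style) objective over synchronized skeleton, point cloud, IMU, and text batches. The encoder interfaces, embedding dimension, and temperature handling are illustrative assumptions, not DeSPITE's actual implementation.

```python
# Hypothetical sketch of the pairwise contrastive (InfoNCE/CLIP-style) objective
# that joint multi-modal embedding models typically optimize. Encoder
# architectures, dimensions, and names are assumptions, not the paper's code.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 256  # assumed shared embedding dimension

class JointEmbedding(nn.Module):
    def __init__(self, encoders: dict[str, nn.Module]):
        super().__init__()
        # One encoder per modality, each mapping its input to EMBED_DIM.
        self.encoders = nn.ModuleDict(encoders)
        self.logit_scale = nn.Parameter(torch.tensor(1 / 0.07).log())

    def forward(self, batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        # L2-normalize so dot products are cosine similarities.
        return {m: F.normalize(self.encoders[m](x), dim=-1) for m, x in batch.items()}

def pairwise_infonce(embs: dict[str, torch.Tensor], logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric cross-entropy over all modality pairs; matched samples
    share the same index within the batch."""
    loss, n_pairs = 0.0, 0
    for a, b in itertools.combinations(embs.keys(), 2):
        logits = logit_scale.exp() * embs[a] @ embs[b].T  # (B, B) similarity matrix
        target = torch.arange(logits.size(0), device=logits.device)
        loss += 0.5 * (F.cross_entropy(logits, target) + F.cross_entropy(logits.T, target))
        n_pairs += 1
    return loss / n_pairs
```

With four modalities this averages over six pairwise terms; training all pairs jointly is one plausible design, and per-pair weighting is a choice the sketch leaves out.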
Related papers
- Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs [9.570759294459629]
We propose Multi³Net, a novel multi-modal, multi-task, contrastive framework that addresses the issue of limited data.
Our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities.
arXiv Detail & Related papers (2024-06-03T13:28:42Z)
- Language-Assisted 3D Scene Understanding [17.663583203177197]
We propose LAST-PCL, a language-assisted approach to point cloud feature learning.
It reduces redundancy and feature dimensionality without compromising textual priors.
The proposed method learns semantically meaningful point cloud features and achieves state-of-the-art or comparable performance in 3D semantic segmentation, 3D object detection, and 3D scene classification tasks.
arXiv Detail & Related papers (2023-12-18T18:54:56Z)
- Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds [88.87985219999764]
We develop a method for video violence recognition from a new perspective of skeleton points.
We first formulate 3D skeleton point clouds from human sequences extracted from videos.
We then perform interaction learning on these 3D skeleton point clouds.
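As a hedged illustration of what "formulating 3D skeleton point clouds from human sequences" can look like in practice, the snippet below pools the joints of all frames into one point set with a normalized time channel. This is a generic construction, not necessarily the paper's exact formulation.

```python
# Generic skeleton-sequence-to-point-cloud construction (assumed, not the
# paper's exact method): pool joints across frames and tag each point with
# its normalized frame index so temporal order is preserved as a feature.
import numpy as np

def skeletons_to_point_cloud(poses: np.ndarray) -> np.ndarray:
    """poses: (T, J, 3) array of T frames with J joints in 3D.
    Returns a (T*J, 4) point cloud: xyz plus a normalized time channel."""
    T, J, _ = poses.shape
    t = np.repeat(np.linspace(0.0, 1.0, T), J)[:, None]  # (T*J, 1) time tags
    return np.concatenate([poses.reshape(T * J, 3), t], axis=1)
```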
arXiv Detail & Related papers (2023-08-26T12:55:18Z)
- Explore In-Context Learning for 3D Point Cloud Understanding [71.20912026561484]
We introduce a novel framework, named Point-In-Context, designed especially for in-context learning in 3D point clouds.
We propose the Joint Sampling module, carefully designed to work in tandem with the general point sampling operator.
We conduct extensive experiments to validate the versatility and adaptability of our proposed methods in handling a wide range of tasks.
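For context, the "general point sampling operator" that point cloud architectures build on is usually farthest point sampling (FPS); below is a plain NumPy reference implementation. The paper's Joint Sampling module is additional machinery not reproduced here.

```python
# Farthest point sampling (FPS): greedily pick points that maximize the
# minimum distance to the already-selected set. A textbook reference
# implementation, independent of any particular paper.
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """points: (N, 3); returns indices of k well-spread points."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)
    dist = np.full(n, np.inf)
    selected[0] = 0  # deterministic start; a random start is also common
    for i in range(1, k):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)  # distance to nearest selected point
        selected[i] = int(dist.argmax())
    return selected
```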
arXiv Detail & Related papers (2023-06-14T17:53:21Z)
- CLR-GAM: Contrastive Point Cloud Learning with Guided Augmentation and Feature Mapping [12.679625717350113]
We present CLR-GAM, a contrastive learning-based framework with Guided Augmentation (GA) for an efficient, dynamic exploration strategy.
We empirically demonstrate that the proposed approach achieves state-of-the-art performance on both simulated and real-world 3D point cloud datasets.
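For orientation, contrastive point cloud frameworks generate positive pairs by perturbing the input; a minimal sketch of two standard transforms (random z-axis rotation and Gaussian jitter) follows. CLR-GAM's Guided Augmentation steers such transforms, which this sketch does not attempt.

```python
# Generic point cloud augmentations of the kind contrastive frameworks use
# to build positive pairs (assumed baseline, not CLR-GAM's guided version).
import numpy as np

def augment(points: np.ndarray, jitter_std: float = 0.01) -> np.ndarray:
    """points: (N, 3). Returns a randomly rotated and jittered copy."""
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # z-axis rotation
    return points @ rot.T + np.random.normal(0.0, jitter_std, points.shape)
```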
arXiv Detail & Related papers (2023-02-28T04:38:52Z)
- Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning [62.18197846270103]
Current point-cloud detection methods have difficulty detecting open-vocabulary objects in the real world.
We propose OV-3DETIC, an Open-Vocabulary 3D DETector using Image-level Class supervision.
arXiv Detail & Related papers (2022-07-05T12:13:52Z)
- Learning-based Point Cloud Registration for 6D Object Pose Estimation in the Real World [55.7340077183072]
We tackle the task of estimating the 6D pose of an object from point cloud data.
Recent learning-based approaches to this task have shown great success on synthetic datasets but fail to generalize to real-world data.
We analyze the causes of these failures, which we trace back to the difference between the feature distributions of the source and target point clouds.
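As background, learning-based registration pipelines typically predict correspondences and then recover the 6D pose with a closed-form Kabsch/SVD alignment. The textbook helper below shows that final step under the assumption that correspondences are given; it is not the paper's method.

```python
# Classic Kabsch/SVD alignment: given corresponding source and target points,
# recover the rigid transform (R, t). A standard reference implementation.
import numpy as np

def kabsch(src: np.ndarray, tgt: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """src, tgt: (N, 3) corresponding points. Returns (R, t) with tgt ≈ src @ R.T + t."""
    src_c, tgt_c = src.mean(0), tgt.mean(0)
    H = (src - src_c).T @ (tgt - tgt_c)     # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```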
arXiv Detail & Related papers (2022-03-29T07:55:04Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- A Deep Learning Method for Complex Human Activity Recognition Using Virtual Wearable Sensors [22.923108537119685]
Sensor-based human activity recognition (HAR) is now a research hotspot in multiple application areas.
We propose a novel deep learning-based method for complex HAR in real-world scenes.
The proposed method converges in surprisingly few iterations and achieves an accuracy of 91.15% on a real IMU dataset.
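For a concrete sense of the model family, here is a minimal 1D-CNN baseline of the kind commonly trained on windowed IMU signals for HAR. The layer sizes, channel count, and class count are illustrative assumptions and do not reproduce the paper's network or its 91.15% result.

```python
# Minimal 1D-CNN baseline for windowed IMU classification (assumed sketch,
# not the paper's architecture). Input: (batch, channels, time) windows,
# e.g. 3-axis accelerometer + 3-axis gyroscope.
import torch
import torch.nn as nn

class ImuCnn(nn.Module):
    def __init__(self, in_channels: int = 6, num_classes: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),  # pool over time, then classify
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (batch, num_classes) logits
```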
arXiv Detail & Related papers (2020-03-04T03:31:23Z)