Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition
- URL: http://arxiv.org/abs/2506.18721v1
- Date: Mon, 23 Jun 2025 14:57:06 GMT
- Title: Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition
- Authors: Dustin Aganian, Erik Franze, Markus Eisenbach, Horst-Michael Gross
- Abstract summary: We introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects.
- Score: 7.441242294426765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective human action recognition is widely used for cobots in Industry 4.0 to assist in assembly tasks. However, conventional skeleton-based methods often lose keypoint semantics, limiting their effectiveness in complex interactions. In this work, we introduce a novel approach to skeleton-based action recognition that enriches input representations by leveraging word embeddings to encode semantic information. Our method replaces one-hot encodings with semantic volumes, enabling the model to capture meaningful relationships between joints and objects. Through extensive experiments on multiple assembly datasets, we demonstrate that our approach significantly improves classification performance, and enhances generalization capabilities by simultaneously supporting different skeleton types and object classes. Our findings highlight the potential of incorporating semantic information to enhance skeleton-based action recognition in dynamic and diverse environments.
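The contrast between one-hot encodings and word embeddings described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the keypoint names, the 4-dimensional vectors, and the helpers `one_hot`, `toy_embeddings`, and `attach_semantics` are hypothetical, and the random vectors merely stand in for a pretrained word-embedding model. The abstract does not specify how the semantic volumes are actually constructed, so this is not the authors' implementation.

```python
import random


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def toy_embeddings(names, dim, seed=0):
    """Hypothetical stand-in for a pretrained word-embedding lookup.

    Real word embeddings would place related words close together;
    these random vectors only demonstrate the fixed-width layout.
    """
    rng = random.Random(seed)
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


def attach_semantics(frame, names, table):
    """Append each keypoint's label vector to its (x, y, z) coordinates."""
    return [list(xyz) + table[name] for xyz, name in zip(frame, names)]


# Assumed example: two skeleton joints plus one object center.
names = ["left_wrist", "right_wrist", "screwdriver"]
frame = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9)]

# One-hot labels: the vector width equals the number of known keypoints.
one_hot_table = {name: one_hot(i, len(names)) for i, name in enumerate(names)}
# Embedding labels: the vector width is fixed, whatever the vocabulary size.
embedding_table = toy_embeddings(names, dim=4)

one_hot_input = attach_semantics(frame, names, one_hot_table)
semantic_input = attach_semantics(frame, names, embedding_table)
```

In this sketch the one-hot width grows with every added joint or object class, while the embedding width stays constant. That fixed layout is one plausible reason a single model could accept different skeleton types and object classes, as the abstract claims.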
Related papers
- Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions. Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues. We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z)
- Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning [58.73625654718187]
Generalized zero-shot learning aims to recognize both seen and unseen classes with the help of semantic information that is shared among different classes. Existing approaches fine-tune the visual backbone with seen-class data to obtain semantic-related visual features. This paper proposes a novel visual and semantic prompt collaboration framework, which utilizes prompt tuning techniques for efficient feature adaptation.
arXiv Detail & Related papers (2025-03-29T10:17:57Z)
- Hierarchical Banzhaf Interaction for General Video-Language Representation Learning [60.44337740854767]
Multimodal representation learning plays an important role in the artificial intelligence domain. We introduce a new approach that models video-text as game players using multivariate cooperative game theory. We extend our original structure into a flexible encoder-decoder framework, enabling the model to adapt to various downstream tasks.
arXiv Detail & Related papers (2024-12-30T14:09:15Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Towards Zero-shot Human-Object Interaction Detection via Vision-Language Integration [14.678931157058363]
We propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates the knowledge of a vision-language model to improve zero-shot HOI detection.
We develop an effective additive self-attention mechanism to generate more comprehensive visual representations.
Our model outperforms previous methods in various zero-shot and fully supervised settings.
arXiv Detail & Related papers (2024-03-12T02:07:23Z)
- Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition [32.291333054680855]
Generalized zero-shot skeleton-based action recognition (GZSSAR) is a new and challenging problem in the computer vision community.
We propose a multi-semantic fusion (MSF) model for improving the performance of GZSSAR.
arXiv Detail & Related papers (2023-09-18T09:00:25Z)
- Knowledge-Enhanced Hierarchical Information Correlation Learning for Multi-Modal Rumor Detection [82.94413676131545]
We propose a novel knowledge-enhanced hierarchical information correlation learning approach (KhiCL) for multi-modal rumor detection.
KhiCL exploits a cross-modal joint dictionary to transfer the heterogeneous unimodal features into a common feature space.
It extracts visual and textual entities from images and text, and designs a knowledge relevance reasoning strategy.
arXiv Detail & Related papers (2023-06-28T06:08:20Z)
- How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks [12.349172146831506]
We present a novel approach of integrating object information into skeleton-based action recognition.
We enhance two state-of-the-art methods by treating object centers as further skeleton joints.
Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks.
arXiv Detail & Related papers (2023-06-09T12:18:14Z)
- Knowledge Augmented Relation Inference for Group Activity Recognition [14.240856072486666]
We propose to exploit knowledge concretization for group activity recognition.
We develop a novel Knowledge Augmented Relation Inference framework that can effectively use the concretized knowledge to improve the individual representations.
arXiv Detail & Related papers (2023-02-28T06:59:05Z)
- Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best performance, or performance competitive with state-of-the-art methods, for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z)
- Skeleton-Aware Networks for Deep Motion Retargeting [83.65593033474384]
We introduce a novel deep learning framework for data-driven motion retargeting between skeletons.
Our approach learns how to retarget without requiring any explicit pairing between the motions in the training set.
arXiv Detail & Related papers (2020-05-12T12:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.