A Spatio-Temporal Multilayer Perceptron for Gesture Recognition
 - URL: http://arxiv.org/abs/2204.11511v1
 - Date: Mon, 25 Apr 2022 08:42:47 GMT
 - Title: A Spatio-Temporal Multilayer Perceptron for Gesture Recognition
 - Authors: Adrian Holzbock, Alexander Tsaregorodtsev, Youssef Dawoud, Klaus
  Dietmayer, Vasileios Belagiannis
 - Abstract summary: We propose a spatio-temporal multilayer perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
 - Score: 70.34489104710366
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   Gesture recognition is essential for the interaction of autonomous vehicles
with humans. While current approaches focus on combining several modalities,
such as image features, keypoints, and bone vectors, we present a neural network
architecture that delivers state-of-the-art results using only body skeleton
input data. We propose the spatio-temporal multilayer perceptron for gesture
recognition in the context of autonomous vehicles. Given 3D body poses over
time, we define temporal and spatial mixing operations to extract features in
both domains. Additionally, the importance of each time step is re-weighted
with Squeeze-and-Excitation layers. An extensive evaluation on the TCG and
Drive&Act datasets is provided to showcase the promising performance of our
approach. Furthermore, we deploy our model to our autonomous vehicle to show
its real-time capability and stable execution.
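
The abstract names two concrete mechanisms: MLP "mixing" applied alternately over the temporal axis (frames) and the spatial axis (pose features), and Squeeze-and-Excitation re-weighting of time steps. A minimal PyTorch sketch of that structure follows; all class names, layer sizes, and block counts are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SETime(nn.Module):
    """Squeeze-and-Excitation over the time axis: learns a weight per time step."""
    def __init__(self, num_frames, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_frames, num_frames // reduction), nn.ReLU(),
            nn.Linear(num_frames // reduction, num_frames), nn.Sigmoid(),
        )

    def forward(self, x):                         # x: (B, T, D)
        w = self.fc(x.mean(dim=2))                # squeeze features -> (B, T)
        return x * w.unsqueeze(-1)                # re-weight each time step

class MixerBlock(nn.Module):
    """Temporal mixing (across frames), then spatial mixing (across pose features)."""
    def __init__(self, num_frames, feat_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(feat_dim)
        self.temporal = nn.Linear(num_frames, num_frames)
        self.norm2 = nn.LayerNorm(feat_dim)
        self.spatial = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.GELU(), nn.Linear(feat_dim, feat_dim))
        self.se = SETime(num_frames)

    def forward(self, x):                         # x: (B, T, D)
        y = self.norm1(x).transpose(1, 2)         # (B, D, T)
        x = x + self.temporal(y).transpose(1, 2)  # mix information across time
        x = x + self.spatial(self.norm2(x))       # mix across joint features
        return self.se(x)                         # re-weight time steps

class STMLP(nn.Module):
    def __init__(self, num_frames=32, num_joints=17, num_classes=4, depth=4):
        super().__init__()
        d = num_joints * 3                        # flattened 3D pose per frame
        self.blocks = nn.Sequential(*[MixerBlock(num_frames, d) for _ in range(depth)])
        self.head = nn.Linear(d, num_classes)

    def forward(self, poses):                     # poses: (B, T, J, 3)
        x = self.blocks(poses.flatten(2))         # (B, T, J*3)
        return self.head(x.mean(dim=1))           # pool over time, classify
```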
        Related papers
- Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving [73.3702076688159]
We propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence.
We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks.
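A minimal sketch of the contrastive idea described above, using a standard InfoNCE-style objective: embeddings of the same instance at two time steps form a positive pair, while other instances in the batch act as negatives. This is a generic formulation, not necessarily Cohere3D's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """anchor, positive: (N, D) embeddings of the same N instances at two times."""
    a = F.normalize(anchor, dim=-1)               # unit-norm embeddings
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature              # (N, N) cosine similarities
    labels = torch.arange(len(a))                 # diagonal entries are positives
    return F.cross_entropy(logits, labels)        # pull pairs together, push rest apart
```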
arXiv Detail & Related papers (2024-02-23T19:43:01Z)
- Trajeglish: Traffic Modeling as Next-Token Prediction [67.28197954427638]
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs.
We apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios.
Our model tops the Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%.
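A minimal sketch of the "driving as next-token prediction" idea: continuous motion steps are discretized into a small vocabulary and a causal Transformer predicts the next token. The vocabulary size, binning, and model shape below are illustrative assumptions, not Trajeglish's actual design.

```python
import torch
import torch.nn as nn

VOCAB = 361  # assumed token vocabulary: 19 x 19 bins over (dx, dy)

def tokenize(deltas, low=-5.0, high=5.0, bins=19):
    """Map continuous (dx, dy) motion steps to discrete token ids by uniform binning."""
    q = ((deltas.clamp(low, high) - low) / (high - low) * (bins - 1)).long()
    return q[..., 0] * bins + q[..., 1]           # one token per time step

class NextTokenModel(nn.Module):
    def __init__(self, d=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):                    # tokens: (B, T)
        T = tokens.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=causal)
        return self.head(h)                       # logits over the next motion token
```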
arXiv Detail & Related papers (2023-12-07T18:53:27Z)
- Gesture Recognition with Keypoint and Radar Stream Fusion for Automated Vehicles [13.652770928249447]
We present a joint camera and radar approach to enable autonomous vehicles to understand and react to human gestures in everyday traffic.
We propose a fusion neural network for both modalities, including an auxiliary loss for each modality.
Motivated by adverse weather conditions, we also demonstrate promising performance when one of the sensors lacks functionality.
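A minimal sketch of two-stream fusion with per-modality auxiliary losses, as described above: each stream keeps its own classification head, so the network remains supervised (and usable) when one sensor degrades. All module names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNet(nn.Module):
    def __init__(self, kp_dim=51, radar_dim=64, hidden=128, num_classes=4):
        super().__init__()
        self.kp_stream = nn.Sequential(nn.Linear(kp_dim, hidden), nn.ReLU())
        self.radar_stream = nn.Sequential(nn.Linear(radar_dim, hidden), nn.ReLU())
        self.kp_head = nn.Linear(hidden, num_classes)        # auxiliary keypoint head
        self.radar_head = nn.Linear(hidden, num_classes)     # auxiliary radar head
        self.fused_head = nn.Linear(2 * hidden, num_classes) # main fused head

    def forward(self, kp, radar):
        k, r = self.kp_stream(kp), self.radar_stream(radar)
        fused = self.fused_head(torch.cat([k, r], dim=-1))
        return fused, self.kp_head(k), self.radar_head(r)

def fusion_loss(outputs, target, aux_weight=0.3):
    """Main loss on the fused prediction plus weighted per-modality auxiliary losses."""
    fused, kp_out, radar_out = outputs
    return (F.cross_entropy(fused, target)
            + aux_weight * (F.cross_entropy(kp_out, target)
                            + F.cross_entropy(radar_out, target)))
```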
arXiv Detail & Related papers (2023-02-20T14:18:11Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- Spatio-Temporal Self-Attention Network for Video Saliency Prediction [13.873682190242365]
3D convolutional neural networks have achieved promising results for video tasks in computer vision.
We propose a novel Spatio-Temporal Self-Attention 3D Network (STSANet) for video saliency prediction.
arXiv Detail & Related papers (2021-08-24T12:52:47Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
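The visibility indicator described above can be read as a per-joint mask on the forecasting loss. A minimal sketch, with shapes and names as illustrative assumptions: the model predicts a visibility logit per joint, and coordinate errors are scored only on visible joints.

```python
import torch
import torch.nn.functional as F

def masked_pose_loss(pred_xyz, pred_vis_logit, gt_xyz, gt_vis):
    """pred_xyz, gt_xyz: (B, T, J, 3); pred_vis_logit, gt_vis: (B, T, J)."""
    vis_loss = F.binary_cross_entropy_with_logits(pred_vis_logit, gt_vis)
    mask = gt_vis.unsqueeze(-1)                   # zero out invisible joints
    coord_loss = (mask * (pred_xyz - gt_xyz) ** 2).sum() / mask.sum().clamp(min=1.0)
    return coord_loss + vis_loss
```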
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association [90.39247595214998]
Image-based perception tasks can be formulated as detecting, associating and tracking semantic keypoints, e.g. human body pose estimation and tracking.
We present a general framework that jointly detects and forms spatio-temporal keypoint associations in a single stage.
We also show that our method generalizes to any class of keypoints such as car and animal parts to provide a holistic perception framework.
arXiv Detail & Related papers (2021-03-03T14:44:14Z)
- Attention-Driven Body Pose Encoding for Human Activity Recognition [0.0]
This article proposes a novel attention-based body pose encoding for human activity recognition.
The enriched data complements the 3D body joint position data and improves model performance.
arXiv Detail & Related papers (2020-09-29T22:17:17Z)
- Gesture Recognition from Skeleton Data for Intuitive Human-Machine Interaction [0.6875312133832077]
We propose an approach for segmentation and classification of dynamic gestures based on a set of handcrafted features.
The method for gesture recognition applies a sliding window, which extracts information from both the spatial and temporal dimensions.
At the end, the recognized gestures are used to interact with a collaborative robot.
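A minimal sketch of sliding-window recognition over a skeleton stream: handcrafted spatial features (inter-joint distances) and temporal features (per-joint velocities) are computed per window and would be passed to a classifier that also emits a "no gesture" class for segmentation. The window length, stride, and feature set are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def window_features(poses):
    """poses: (T, J, 3) array of 3D joints for one window."""
    last = poses[-1]                                      # spatial: final-frame pose
    dists = np.linalg.norm(last[:, None] - last[None, :], axis=-1)
    spatial = dists[np.triu_indices(len(last), k=1)]      # pairwise joint distances
    vel = np.linalg.norm(np.diff(poses, axis=0), axis=-1) # temporal: frame-to-frame motion
    return np.concatenate([spatial, vel.mean(axis=0)])

def sliding_windows(stream, win=30, stride=5):
    """Yield (start frame, feature vector) for each window over the stream."""
    for start in range(0, len(stream) - win + 1, stride):
        yield start, window_features(stream[start:start + win])
```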
arXiv Detail & Related papers (2020-08-26T11:28:50Z)
- A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints in the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.