Comparison of Data Representations and Machine Learning Architectures
for User Identification on Arbitrary Motion Sequences
- URL: http://arxiv.org/abs/2210.00527v1
- Date: Sun, 2 Oct 2022 14:12:10 GMT
- Title: Comparison of Data Representations and Machine Learning Architectures
for User Identification on Arbitrary Motion Sequences
- Authors: Christian Schell, Andreas Hotho, Marc Erich Latoschik
- Abstract summary: This paper compares different machine learning approaches to identify users based on arbitrary sequences of head and hand movements.
We publish all our code to allow reproducibility and to provide baselines for future work.
The model correctly identifies any of the 34 subjects with an accuracy of 100% within 150 seconds.
- Score: 8.967985264567217
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reliable and robust user identification and authentication are important and
often necessary requirements for many digital services. It becomes paramount in
social virtual reality (VR) to ensure trust, specifically in digital encounters
with lifelike realistic-looking avatars as faithful replications of real
persons. Recent research has shown great interest in providing new solutions
that verify the identity of extended reality (XR) systems. This paper compares
different machine learning approaches to identify users based on arbitrary
sequences of head and hand movements, a data stream provided by the majority of
today's XR systems. We compare three potential representations of the motion
data from heads and hands (scene-relative, body-relative, and body-relative
velocities) and the performances of five different machine learning
architectures (random forest, multilayer perceptron, fully recurrent neural
network, long short-term memory, gated recurrent unit). We use
the publicly available dataset "Talking with Hands" and publish all our code to
allow reproducibility and to provide baselines for future work. After
hyperparameter optimization, the combination of a long short-term memory
architecture and body-relative data outperformed competing combinations: the
model correctly identifies any of the 34 subjects with an accuracy of 100%
within 150 seconds. The code for models, training and evaluation is made
publicly available. Altogether, our approach provides an effective foundation
for behaviometric-based identification and authentication to guide researchers
and practitioners.
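The three representations compared in the abstract differ only in how the raw head and hand poses are expressed before being fed to a model. A minimal sketch of the position part of that preprocessing is shown below; the function names are hypothetical, and the sketch is simplified: the paper's representations presumably also include orientations, which would require rotating hand poses into the head's local frame rather than merely subtracting positions.

```python
import numpy as np

def body_relative(head_pos: np.ndarray, hand_pos: np.ndarray) -> np.ndarray:
    """Express hand positions relative to the head (used here as a stand-in
    body anchor). Both inputs have shape (T, 3): T frames of x/y/z positions
    in scene coordinates. Output keeps shape (T, 3)."""
    return hand_pos - head_pos

def velocities(rel_pos: np.ndarray, dt: float) -> np.ndarray:
    """Frame-to-frame velocities of body-relative positions.
    Input (T, 3) at a fixed sampling interval dt; output (T-1, 3)."""
    return np.diff(rel_pos, axis=0) / dt

# Example: a hand moving at constant speed while the head stays at the origin.
head = np.zeros((5, 3))
hand = np.arange(15, dtype=float).reshape(5, 3)  # advances by 3 units/frame
rel = body_relative(head, hand)                   # scene- == body-relative here
vel = velocities(rel, dt=0.5)                     # (4, 3), constant 6 units/s
```

Sequences of such per-frame feature vectors (positions or velocities) are what the compared architectures consume, with the recurrent models reading them frame by frame.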
Related papers
- SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns.
Recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets.
This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
arXiv Detail & Related papers (2024-04-06T10:30:31Z)
- Efficient Gesture Recognition on Spiking Convolutional Networks Through Sensor Fusion of Event-Based and Depth Data [1.474723404975345]
This work proposes a Spiking Convolutional Neural Network, processing event- and depth data for gesture recognition.
The network is simulated using the open-source neuromorphic computing framework LAVA for offline training and evaluation on an embedded system.
arXiv Detail & Related papers (2024-01-30T14:42:35Z)
- DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering [126.00165445599764]
We present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering.
Our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume.
We construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps.
arXiv Detail & Related papers (2023-07-19T17:58:03Z)
- Dialogue-Contextualized Re-ranking for Medical History-Taking [5.039849340960835]
We present a two-stage re-ranking approach that helps close the training-inference gap by re-ranking the first-stage question candidates.
We find that relative to the expert system, the best performance is achieved by our proposed global re-ranker with a transformer backbone.
arXiv Detail & Related papers (2023-04-04T17:31:32Z)
- Versatile User Identification in Extended Reality using Pretrained Similarity-Learning [16.356961801884562]
We develop a similarity-learning model and pretrain it on the "Who Is Alyx?" dataset.
In comparison with a traditional classification-learning baseline, our model shows superior performance.
Our approach paves the way for easy integration of pretrained motion-based identification models in production XR systems.
arXiv Detail & Related papers (2023-02-15T08:26:24Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- A Comparative Study of Data Augmentation Techniques for Deep Learning Based Emotion Recognition [11.928873764689458]
We conduct a comprehensive evaluation of popular deep learning approaches for emotion recognition.
We show that long-range dependencies in the speech signal are critical for emotion recognition.
Speed/rate augmentation offers the most robust performance gain across models.
arXiv Detail & Related papers (2022-11-09T17:27:03Z)
- Facial Emotion Recognition using Deep Residual Networks in Real-World Environments [5.834678345946704]
We propose a facial feature extractor model trained on an in-the-wild and massively collected video dataset.
The dataset consists of a million labelled frames and 2,616 subjects.
As temporal information is important to the emotion recognition domain, we utilise LSTM cells to capture the temporal dynamics in the data.
arXiv Detail & Related papers (2021-11-04T10:08:22Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in vision-sensor modality (videos)
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
- Deep Learning for Person Re-identification: A Survey and Outlook [233.36948173686602]
Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras.
By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings.
arXiv Detail & Related papers (2020-01-13T12:49:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.