Typing on Any Surface: A Deep Learning-based Method for Real-Time
Keystroke Detection in Augmented Reality
- URL: http://arxiv.org/abs/2309.00174v2
- Date: Thu, 2 Nov 2023 05:24:38 GMT
- Title: Typing on Any Surface: A Deep Learning-based Method for Real-Time
Keystroke Detection in Augmented Reality
- Authors: Xingyu Fu and Mingze Xi
- Abstract summary: Mid-air keyboard interfaces, wireless keyboards, and voice input either suffer from poor ergonomic design or limited accuracy, or are simply embarrassing to use in public.
This paper proposes and validates a deep-learning-based approach that enables AR applications to accurately predict keystrokes from the user-perspective RGB video stream.
A two-stage model, combining an off-the-shelf hand landmark extractor and a novel adaptive Convolutional Recurrent Neural Network (C-RNN), was trained.
- Score: 4.857109990499532
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Frustrating text-entry interfaces have been a major obstacle to
participating in social activities in augmented reality (AR). Popular
options, such as mid-air keyboard interfaces, wireless keyboards, or voice
input, either suffer from poor ergonomic design or limited accuracy, or are
simply embarrassing to use in public. This paper proposes and validates a
deep-learning-based approach that enables AR applications to accurately
predict keystrokes from the user-perspective RGB video stream that can be
captured by any AR headset. This enables a user to perform typing activities
on any flat surface and eliminates the need for a physical or virtual
keyboard. A two-stage model, combining an off-the-shelf hand landmark
extractor and a novel adaptive Convolutional Recurrent Neural Network
(C-RNN), was trained using our newly built dataset. The final model was
capable of adaptively processing user-perspective video streams at ~32 FPS.
This base model achieved an overall accuracy of $91.05\%$ when typing at 40
words per minute (WPM), the speed at which an average person types with two
hands on a physical keyboard. The Normalised Levenshtein Distance further
confirmed the real-world applicability of our approach. The promising
results highlight the viability of our approach and its potential to be
integrated into various applications. We also discuss the limitations and
the future research required to bring such a technique into a production
system.
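To make the two-stage design concrete, here is a minimal sketch of such a pipeline. It takes per-frame hand landmarks as produced by an off-the-shelf extractor (MediaPipe Hands is one plausible choice, an assumption) and feeds a sliding window of them to a small Conv1D+GRU network standing in for the adaptive C-RNN; the window size, feature layout, and class set are illustrative assumptions, not the authors' actual configuration.

```python
# Illustrative sketch of a two-stage keystroke detector (not the authors'
# code): stage 1 is an off-the-shelf hand landmark extractor (assumed);
# stage 2 is a small Conv1D+GRU classifier standing in for the adaptive C-RNN.
import collections
import torch
import torch.nn as nn

FEATURES = 21 * 2 * 3   # 21 landmarks per hand, two hands, (x, y, z) each
WINDOW = 16             # frames per classification window (assumed)

class ConvRNNKeystrokeNet(nn.Module):
    """Temporal Conv1d front end, GRU, and a linear keystroke head."""
    def __init__(self, num_classes=27):  # 26 keys + "no keystroke" (assumed)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(FEATURES, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.GRU(128, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                 # x: (batch, WINDOW, FEATURES)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, h = self.rnn(z)                # h: (num_layers, batch, 64)
        return self.head(h[-1])           # keystroke logits

model = ConvRNNKeystrokeNet().eval()
buffer = collections.deque(maxlen=WINDOW)

def on_frame(landmarks):
    """Feed one frame of flattened hand landmarks; classify on a full window."""
    buffer.append(torch.as_tensor(landmarks, dtype=torch.float32))
    if len(buffer) < WINDOW:
        return None
    window = torch.stack(list(buffer)).unsqueeze(0)  # (1, WINDOW, FEATURES)
    with torch.no_grad():
        return model(window).argmax(dim=-1).item()
```

On the reported metric: a common definition of the Normalised Levenshtein Distance is $\mathrm{NLD}(s, t) = \mathrm{Lev}(s, t) / \max(|s|, |t|)$. The abstract does not state which normalisation the authors use; a sketch of this variant is:

```python
def normalised_levenshtein(s: str, t: str) -> float:
    """Levenshtein distance divided by max(len); one common normalisation,
    not necessarily the exact variant used in the paper."""
    m, n = len(s), len(t)
    if max(m, n) == 0:
        return 0.0
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + (s[i - 1] != t[j - 1]))  # substitution
        prev = cur
    return prev[n] / max(m, n)
```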
Related papers
- TapType: Ten-finger text entry on everyday surfaces via Bayesian inference [32.33746932895968]
TapType is a mobile text entry system for full-size typing on passive surfaces.
Using the inertial sensors inside a band worn on either wrist, TapType decodes surface taps and relates them to a traditional QWERTY keyboard layout.
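As a rough illustration of the Bayesian decoding idea (not TapType's actual implementation), one can score each key by combining a spatial likelihood of the sensed tap with a language-model prior; the toy layout, noise scale, and prior below are all assumptions.

```python
# Toy sketch of Bayesian tap decoding: the posterior over keys combines a
# spatial tap likelihood with a language prior, i.e.
# P(key | tap) is proportional to P(tap | key) * P(key | context).
import math

KEY_CENTERS = {"a": (0.0, 1.0), "s": (1.0, 1.0), "d": (2.0, 1.0)}  # toy layout
SIGMA = 0.7  # assumed spatial noise, in key widths

def tap_likelihood(tap, key):
    """Isotropic Gaussian likelihood of a sensed tap given an intended key."""
    kx, ky = KEY_CENTERS[key]
    d2 = (tap[0] - kx) ** 2 + (tap[1] - ky) ** 2
    return math.exp(-d2 / (2 * SIGMA ** 2))

def decode(tap, prior):
    """Normalised posterior over keys for one tap."""
    scores = {k: tap_likelihood(tap, k) * prior.get(k, 1e-6)
              for k in KEY_CENTERS}
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

# A language model that strongly expects "s" next can override a noisy tap:
print(decode((0.4, 1.0), prior={"a": 0.2, "s": 0.7, "d": 0.1}))
```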
arXiv Detail & Related papers (2024-10-08T12:58:31Z)
- Sparse Binarization for Fast Keyword Spotting [10.964148450512972]
Keyword spotting (KWS) models can be deployed on edge devices for real-time applications, privacy, and bandwidth efficiency.
We propose a novel keyword-spotting model based on a sparse input representation followed by a linear classifier.
Our method is also more robust in noisy environments while remaining fast.
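A minimal sketch of the general idea, assuming a top-k binarisation of input features feeding a linear classifier; the actual model's representation and dimensions are not specified in this summary.

```python
# Hedged sketch: keep only the strongest activations as a binary code, then
# classify with a single linear layer (cheap on edge devices).
import numpy as np

def sparse_binarize(features: np.ndarray, k: int) -> np.ndarray:
    """Set the top-k activations per example to 1, everything else to 0."""
    out = np.zeros_like(features, dtype=np.float32)
    idx = np.argsort(features, axis=-1)[..., -k:]
    np.put_along_axis(out, idx, 1.0, axis=-1)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))      # e.g. 4 audio frames of features
w = rng.standard_normal((256, 10))     # linear classifier for 10 keywords
logits = sparse_binarize(x, k=16) @ w  # binary-sparse input keeps this cheap
print(logits.argmax(axis=-1))
```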
arXiv Detail & Related papers (2024-06-09T08:03:48Z)
- Early Action Recognition with Action Prototypes [62.826125870298306]
We propose a novel model that learns a prototypical representation of the full action for each class.
We decompose the video into short clips, where a visual encoder extracts features from each clip independently.
A decoder then aggregates features from all the clips in an online fashion for the final class prediction.
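A minimal sketch of online clip aggregation against class prototypes, under the assumption that aggregation is a running mean and scoring is cosine similarity; the paper's decoder is likely more elaborate.

```python
# Minimal sketch (assumed details): encode each short clip independently,
# aggregate clip features online with a running mean, and score the running
# representation against learned per-class action prototypes.
import torch
import torch.nn.functional as F

class OnlinePrototypeClassifier:
    def __init__(self, prototypes: torch.Tensor):  # (num_classes, dim)
        self.prototypes = F.normalize(prototypes, dim=-1)
        self.feature_sum = torch.zeros(prototypes.shape[-1])
        self.num_clips = 0

    def update(self, clip_feature: torch.Tensor):  # (dim,) from a visual encoder
        """Fold in one clip's features; return the current early prediction."""
        self.feature_sum += clip_feature
        self.num_clips += 1
        agg = F.normalize(self.feature_sum / self.num_clips, dim=-1)
        return (agg @ self.prototypes.T).argmax().item()  # cosine similarity
```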
arXiv Detail & Related papers (2023-12-11T18:31:13Z)
- Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
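The summary does not give the form of $\mathcal{L}_{EC}$; as a hedged illustration, an InfoNCE-style contrastive loss between two augmented views of the same clips looks like this.

```python
# Illustrative InfoNCE-style contrastive loss; the paper's L_EC may differ,
# this only shows the general form of a contrastive objective over clips.
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, tau=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views of each clip."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau               # pairwise cosine similarities
    targets = torch.arange(z1.shape[0])    # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)
```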
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Teachable Reality: Prototyping Tangible Augmented Reality with Everyday Objects by Leveraging Interactive Machine Teaching [4.019017835137353]
Teachable Reality is an augmented reality (AR) prototyping tool for creating interactive tangible AR applications with arbitrary everyday objects.
It identifies the user-defined tangible and gestural interactions using an on-demand computer vision model.
Our approach can lower the barrier to creating functional AR prototypes while also allowing flexible and general-purpose prototyping experiences.
arXiv Detail & Related papers (2023-02-21T23:03:49Z)
- Muscle Vision: Real Time Keypoint Based Pose Classification of Physical Exercises [52.77024349608834]
3D human pose recognition extrapolated from video has advanced to the point of enabling real-time software applications.
We propose a new machine learning pipeline and web interface that performs human pose recognition on a live video feed to detect when common exercises are performed and classify them accordingly.
arXiv Detail & Related papers (2022-03-23T00:55:07Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
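As a hedged sketch of that general idea (not the X2T method itself), an interface can treat a user's correction or confirmation as a label and take one online gradient step after each interaction; the model, input features, and feedback signal below are all illustrative assumptions.

```python
# Hedged sketch of online learning from user feedback: adapt the decoder
# with one gradient step whenever the user corrects or confirms a prediction.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(64, 27)   # toy decoder: 64-d user input -> 27 keys
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def interact(user_input, feedback_label=None):
    """Predict a key for one input; adapt online when feedback arrives."""
    logits = model(user_input)                      # user_input: (64,)
    prediction = logits.argmax().item()
    if feedback_label is not None:                  # corrected/confirmed key
        loss = F.cross_entropy(logits.unsqueeze(0),
                               torch.tensor([feedback_label]))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prediction
```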
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object, indicated verbally by a human user, from a crowded scene.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.