RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep-Learning Word Prediction Framework
- URL: http://arxiv.org/abs/2410.18100v1
- Date: Tue, 08 Oct 2024 13:15:30 GMT
- Title: RingGesture: A Ring-Based Mid-Air Gesture Typing System Powered by a Deep-Learning Word Prediction Framework
- Authors: Junxiao Shen, Roger Boldu, Arpit Kalla, Michael Glueck, Hemant Bhaskar Surale, Amy Karlson
- Abstract summary: RingGesture is a ring-based mid-air gesture typing technique utilizing electrodes to mark the start and end of gesture trajectories.
We propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components.
RingGesture achieves an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM.
- Score: 2.4992122541451987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text entry is a critical capability for any modern computing experience, with lightweight augmented reality (AR) glasses being no exception. Because lightweight AR glasses are designed for all-day wearability, they cannot accommodate the multiple cameras needed for wide field-of-view hand tracking, which underscores the need for an additional input device. We propose a system to address this gap: RingGesture, a ring-based mid-air gesture typing technique that uses electrodes to mark the start and end of gesture trajectories and inertial measurement unit (IMU) sensors for hand tracking. This method offers an intuitive experience similar to raycast-based mid-air gesture typing found in VR headsets, allowing for a seamless translation of hand movements into cursor navigation. To enhance both accuracy and input speed, we propose a novel deep-learning word prediction framework, Score Fusion, comprising three key components: a) a word-gesture decoding model, b) a spatial spelling correction model, and c) a lightweight contextual language model. The framework fuses the scores from the three models to predict the most likely words with higher precision. We conduct comparative and longitudinal studies to demonstrate two key findings: first, the overall effectiveness of RingGesture, which achieves an average text entry speed of 27.3 words per minute (WPM) and a peak performance of 47.9 WPM; and second, the superior performance of the Score Fusion framework, which offers a 28.2% improvement in uncorrected Character Error Rate over a conventional word prediction framework, Naive Correction, leading to a 55.2% improvement in text entry speed for RingGesture. Additionally, RingGesture received a System Usability Scale score of 83, signifying excellent usability.
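The abstract describes Score Fusion only at a high level: three per-word scores fused into a single ranking. As a minimal illustrative sketch, not the paper's actual method, the fusion step could look like a weighted sum of per-model log-scores; the function name `fuse_scores`, the weights, and the toy scores below are all assumptions.

```python
import math

def fuse_scores(candidates, gesture_scores, spelling_scores, lm_scores,
                weights=(1.0, 0.5, 0.3)):
    """Rank candidate words by a weighted sum of per-model log-scores.

    gesture_scores  : word -> log-score from a word-gesture decoding model
    spelling_scores : word -> log-score from a spatial spelling correction model
    lm_scores       : word -> log-score from a contextual language model
    weights         : illustrative fusion weights (not taken from the paper)
    """
    w_g, w_s, w_l = weights
    fused = {
        w: (w_g * gesture_scores.get(w, -math.inf)
            + w_s * spelling_scores.get(w, -math.inf)
            + w_l * lm_scores.get(w, -math.inf))
        for w in candidates
    }
    # Most likely word first.
    return sorted(fused, key=fused.get, reverse=True)

# Toy usage with made-up scores for three candidate words.
gesture  = {"hello": -1.2, "hells": -1.5, "jello": -2.0}
spelling = {"hello": -0.3, "hells": -1.0, "jello": -0.8}
language = {"hello": -2.1, "hells": -6.0, "jello": -4.5}
print(fuse_scores(["hello", "hells", "jello"], gesture, spelling, language))
# ['hello', 'jello', 'hells']
```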
Related papers
- LEADER: Lightweight End-to-End Attention-Gated Dual Autoencoder for Robust Minutiae Extraction [0.05978532290288763]
This paper introduces LEADER (Lightweight End-to-end Attention-gated Dual autoencodER), a neural network that maps raw fingerprint images to minutiae descriptors.
It employs a novel "Castle-Moat-Rampart" ground-truth encoding and a dual-autoencoder structure, interconnected through an attention-gating mechanism.
It attains a 34% higher F1-score on the NIST SD27 dataset compared to specialized latent minutiae extractors.
arXiv Detail & Related papers (2026-02-17T11:02:28Z) - Boosting Point-supervised Temporal Action Localization via Text Refinement and Alignment [66.80402022104074]
We propose a Text Refinement and Alignment (TRA) framework that effectively utilizes textual features from visual descriptions to complement the visual features, as they are semantically rich.
This is achieved by designing two new modules for the original point-supervised framework: a Point-based Text Refinement module (PTR) and a Point-based Multimodal Alignment module (PMA).
arXiv Detail & Related papers (2026-02-01T14:35:46Z) - Dual-Granularity Semantic Prompting for Language Guidance Infrared Small Target Detection [102.1314414263959]
Infrared small target detection remains challenging due to limited feature representation and severe background interference.
We propose DGSPNet, an end-to-end language prompt-driven framework.
Our method significantly improves detection accuracy and achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-11-24T16:58:23Z) - HandReader: Advanced Techniques for Efficient Fingerspelling Recognition [75.38606213726906]
This paper introduces HandReader, a group of three architectures designed to address the fingerspelling recognition task.
HandReader_RGB employs the novel Temporal Shift-Adaptive Module (TSAM) to process RGB features from videos of varying lengths.
HandReader_KP is built on the proposed Temporal Pose Encoder (TPE), operating on keypoints as tensors.
Each HandReader model possesses distinct advantages and achieves state-of-the-art results on the ChicagoFSWild and ChicagoFSWild+ datasets.
arXiv Detail & Related papers (2025-05-15T13:18:37Z) - Video Anomaly Detection with Structured Keywords [0.0]
This paper focuses on detecting anomalies in surveillance video using keywords by leveraging foundational models' feature representation generalization capabilities.
We present a novel, lightweight pipeline for anomaly classification using keyword weights.
We achieve comparable performance on the three benchmarks Ped2, Shanghai Tech, and CUHK Avenue, with ROC AUC scores of 0.865, 0.745, and 0.742, respectively.
arXiv Detail & Related papers (2025-03-07T20:05:59Z) - T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting [20.21019748095159]
Zero-shot object counting aims to count instances of arbitrary object categories specified by text descriptions.
We present T2ICount, a diffusion-based framework that leverages rich prior knowledge and fine-grained visual understanding from pretrained diffusion models.
arXiv Detail & Related papers (2025-02-28T01:09:18Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely temporal architecture framework, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - RT-OVAD: Real-Time Open-Vocabulary Aerial Object Detection via Image-Text Collaboration [12.66046875297631]
We propose RT-OVAD, the first real-time open-vocabulary detector for aerial scenes.
We introduce an image-to-text alignment loss to replace the conventional category regression loss.
We also propose a lightweight image-text collaboration strategy comprising an image-text collaboration encoder and a text-guided decoder.
arXiv Detail & Related papers (2024-08-22T09:33:25Z) - Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation [82.95830628372845]
This paper introduces a collaborative vision-text optimizing mechanism within the Open-Vocabulary Segmentation (OVS) field.
To the best of our knowledge, we are the first to establish the collaborative vision-text optimizing mechanism within the OVS field.
In open-vocabulary semantic segmentation, our method outperforms the previous state-of-the-art approaches by +0.5, +2.3, +3.4, +0.4 and +1.1 mIoU, respectively.
arXiv Detail & Related papers (2024-08-01T17:48:08Z) - EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z) - PenSLR: Persian end-to-end Sign Language Recognition Using Ensembling [0.953605234706973]
PenSLR is a glove-based sign language system consisting of an inertial measurement unit (IMU) and five flexible sensors, powered by a deep-learning framework.
We propose a novel ensembling technique by leveraging a multiple sequence alignment algorithm known as Star Alignment.
Our evaluations show that PenSLR achieves a remarkable word accuracy of 94.58% and 96.70% in subject-independent and subject-dependent setups, respectively.
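The summary names Star Alignment (center-star multiple sequence alignment) as the ensembling mechanism but gives no algorithmic detail. The sketch below is only a rough guess at such an ensemble: predicted token sequences from several models are aligned against a center sequence and merged by per-position voting. The helper names and the use of difflib.SequenceMatcher are assumptions, not PenSLR's implementation.

```python
from collections import Counter
from difflib import SequenceMatcher

def align_to_center(center, seq):
    """Map each position of `center` to its matching token in `seq`,
    or None where nothing matches (a simplification of full MSA)."""
    mapping = [None] * len(center)
    matcher = SequenceMatcher(a=center, b=seq, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = seq[block.b + k]
    return mapping

def star_align_ensemble(predictions):
    """Merge several token-sequence predictions via center-star alignment
    followed by per-position majority voting (illustrative only)."""
    def dissimilarity(a, b):
        return 1.0 - SequenceMatcher(a=a, b=b, autojunk=False).ratio()
    # Center = the prediction closest, on average, to all the others.
    center = min(predictions,
                 key=lambda p: sum(dissimilarity(p, q) for q in predictions))
    columns = [align_to_center(center, p) for p in predictions]
    merged = []
    for i, center_token in enumerate(center):
        votes = Counter(col[i] for col in columns if col[i] is not None)
        merged.append(votes.most_common(1)[0][0] if votes else center_token)
    return merged

# Toy usage: three models disagree on the final word.
preds = [["I", "drink", "water"],
         ["I", "drink", "later"],
         ["I", "drink", "water"]]
print(star_align_ensemble(preds))  # ['I', 'drink', 'water']
```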
arXiv Detail & Related papers (2024-06-24T07:59:34Z) - Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation [3.837186701755568]
Motor Focus is an image-based framework that predicts the observer's motion direction based on their visual feeds.
Our framework demonstrates its superiority in speed (>40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB).
arXiv Detail & Related papers (2024-04-25T20:45:39Z) - Typing on Any Surface: A Deep Learning-based Method for Real-Time
Keystroke Detection in Augmented Reality [4.857109990499532]
Existing alternatives such as mid-air keyboard interfaces, wireless keyboards, and voice input either suffer from poor ergonomic design and limited accuracy, or are simply embarrassing to use in public.
This paper proposes and validates a deep-learning-based approach that enables AR applications to accurately predict keystrokes from the user's perspective RGB video stream.
A two-stage model, combining an off-the-shelf hand landmark extractor with a novel adaptive Convolutional Recurrent Neural Network (C-RNN), was trained.
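The summary describes a two-stage pipeline: an off-the-shelf hand landmark extractor followed by an adaptive Convolutional Recurrent Neural Network (C-RNN). Below is a minimal PyTorch sketch of what the second stage might look like, assuming 21 hand landmarks (x, y, z) per frame and per-frame keystroke/no-keystroke logits; all layer sizes and the class name are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class KeystrokeCRNN(nn.Module):
    """Toy Convolutional Recurrent Network over per-frame hand landmarks.

    Input : (batch, frames, 21 landmarks * 3 coords) produced by any
            off-the-shelf hand landmark extractor (stage one, not shown).
    Output: per-frame keystroke logits.
    """
    def __init__(self, n_features=63, hidden=128, n_classes=2):
        super().__init__()
        # 1D convolutions over time capture short-range finger motion.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        # A bidirectional GRU models longer-range temporal context.
        self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (B, T, 63)
        feats = self.conv(x.transpose(1, 2))   # (B, 64, T)
        out, _ = self.rnn(feats.transpose(1, 2))
        return self.head(out)                  # (B, T, n_classes)

# Toy forward pass on random landmark sequences: 4 clips of 30 frames.
logits = KeystrokeCRNN()(torch.randn(4, 30, 63))
print(logits.shape)  # torch.Size([4, 30, 2])
```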
arXiv Detail & Related papers (2023-08-31T23:58:25Z) - Towards Robust Real-Time Scene Text Detection: From Semantic to Instance
Representation Learning [19.856492291263102]
We propose representation learning for real-time scene text detection.
For semantic representation learning, we propose global-dense semantic contrast (GDSC) and top-down modeling (TDM).
With the proposed GDSC and TDM, the encoder network learns stronger representation without introducing any parameters and computations during inference.
The proposed method achieves 87.2% F-measure with 48.2 FPS on Total-Text and 89.6% F-measure with 36.9 FPS on MSRA-TD500.
arXiv Detail & Related papers (2023-08-14T15:14:37Z) - Three ways to improve feature alignment for open vocabulary detection [88.65076922242184]
A key problem in zero-shot open-vocabulary detection is how to align visual and text features so that the detector performs well on unseen classes.
Previous approaches train the feature pyramid and detection head from scratch, which breaks the vision-text feature alignment established during pretraining.
We propose three methods to alleviate these issues. Firstly, a simple scheme is used to augment the text embeddings, which prevents overfitting to the small number of classes seen during training.
Secondly, the feature pyramid network and the detection head are modified to include trainable shortcuts.
Finally, a self-training approach is used to leverage a larger corpus of image-text pairs.
arXiv Detail & Related papers (2023-03-23T17:59:53Z) - Learning to Decompose Visual Features with Latent Textual Prompts [140.2117637223449]
We propose Decomposed Feature Prompting (DeFo) to improve vision-language models.
Our empirical study shows DeFo's significance in improving the vision-language models.
arXiv Detail & Related papers (2022-10-09T15:40:13Z) - PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering
Network [54.03560668182197]
We propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time.
With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations.
Experiments show that the proposed method achieves competitive accuracy while significantly improving the running speed.
arXiv Detail & Related papers (2021-04-12T13:27:34Z) - FastHand: Fast Hand Pose Estimation From A Monocular Camera [12.790733588554588]
We propose a fast and accurate framework for hand pose estimation, dubbed "FastHand".
FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
arXiv Detail & Related papers (2021-02-14T04:12:41Z)