Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-training
- URL: http://arxiv.org/abs/2410.18099v1
- Date: Tue, 08 Oct 2024 12:53:22 GMT
- Authors: Junxiao Shen, Khadija Khaldi, Enmin Zhou, Hemant Bhaskar Surale, Amy Karlson,
- Abstract summary: We present a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR).
It significantly outperforms SHARK^2 with a 37.2% enhancement and surpasses the conventional neural decoder by 7.4%.
It runs in real time, executing in just 97 milliseconds on Quest 3.
- Score: 2.81561528842917
- Abstract: Text entry with word-gesture keyboards (WGK) is emerging as a popular method and becoming a key interaction for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments produces divergent word-gesture trajectory data patterns, which complicates decoding trajectories into text. Template-matching decoding methods, such as SHARK^2, are commonly used for these WGK systems because they are easy to implement and configure. However, these methods are prone to decoding errors on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, but they have their own limitations: they require extensive training data and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that is generalizable across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It significantly outperforms SHARK^2 with a 37.2% enhancement and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder's size is only 4 MB after quantization, without sacrificing accuracy, and it can operate in real time, executing in just 97 milliseconds on Quest 3.
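The abstract does not spell out how trajectories are coarsely discretized, so the following is only a minimal sketch of one plausible reading: each sampled point is normalized by the keyboard size, mapped to a cell of a coarse grid, and consecutive repeats are collapsed into a token sequence. The function name `discretize_trajectory`, the 10 x 3 grid, and the row-major token numbering are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence, Tuple


def discretize_trajectory(
    points: Sequence[Tuple[float, float]],
    keyboard_width: float,
    keyboard_height: float,
    cols: int = 10,  # assumed grid width, not taken from the paper
    rows: int = 3,   # assumed grid height, not taken from the paper
) -> List[int]:
    """Map (x, y) samples to coarse grid-cell tokens, collapsing repeats."""
    tokens: List[int] = []
    for x, y in points:
        # Normalize by keyboard size so the tokens do not depend on scale.
        col = int(x / keyboard_width * cols)
        row = int(y / keyboard_height * rows)
        # Clamp samples that drift outside the keyboard bounds.
        col = min(max(col, 0), cols - 1)
        row = min(max(row, 0), rows - 1)
        token = row * cols + col  # row-major cell index
        # Keep a token only when the gesture enters a new cell.
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens


if __name__ == "__main__":
    gesture = [(0.05, 0.10), (0.07, 0.12), (0.55, 0.50), (0.95, 0.90)]
    print(discretize_trajectory(gesture, 1.0, 1.0))  # prints [0, 15, 29]
```

Under these assumptions, the same gesture drawn on a keyboard twice as large yields the same token sequence, which hints at why a coarse, size-normalized representation could help one decoder serve keyboards of different sizes and interaction modes.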
Related papers
- Triple-Encoders: Representations That Fire Together, Wire Together [51.15206713482718]
Contrastive Learning is a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder.
This study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances.
We find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models.
arXiv Detail & Related papers (2024-02-19T18:06:02Z) - Data-driven decoding of quantum error correcting codes using graph neural networks [0.0]
We explore a model-free, data-driven approach to decoding using a graph neural network (GNN).
We show that the GNN-based decoder can outperform a matching decoder for circuit level noise on the surface code given only simulated data.
The results show that a purely data-driven approach to decoding may be a viable future option for practical quantum error correction.
arXiv Detail & Related papers (2023-07-03T17:25:45Z) - Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems [41.24728444810133]
This paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier.
With user-defined keywords from 10 classes of the Google Speech Command dataset, our study reports an accuracy of up to 76% in a 10-shot scenario.
arXiv Detail & Related papers (2023-06-03T17:10:33Z) - Graph Neural Networks for Channel Decoding [71.15576353630667]
We showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes.
The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph.
We benchmark our proposed decoder against state-of-the-art in conventional channel decoding as well as against recent deep learning-based results.
arXiv Detail & Related papers (2022-07-29T15:29:18Z) - Improved decoding of circuit noise and fragile boundaries of tailored surface codes [61.411482146110984]
We introduce decoders that are both fast and accurate, and can be used with a wide class of quantum error correction codes.
Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC.
We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code.
arXiv Detail & Related papers (2022-03-09T18:48:54Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - DeepRx: Fully Convolutional Deep Learning Receiver [8.739166282613118]
DeepRx is a fully convolutional neural network that executes the whole receiver pipeline from frequency domain signal stream to uncoded bits in a 5G-compliant fashion.
We demonstrate that DeepRx outperforms traditional methods.
arXiv Detail & Related papers (2020-05-04T13:53:47Z)