Using Large Language Models to Accelerate Communication for Users with
Severe Motor Impairments
- URL: http://arxiv.org/abs/2312.01532v1
- Date: Sun, 3 Dec 2023 23:12:49 GMT
- Title: Using Large Language Models to Accelerate Communication for Users with
Severe Motor Impairments
- Authors: Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin
Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan,
Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M.
Gleason, Philip Q. Nelson, Michael P. Brenner
- Abstract summary: We present SpeakFaster, consisting of large language models (LLMs) and a co-designed user interface for text entry in a highly-abbreviated form.
Pilot study with 19 non-AAC participants typing on a mobile device by hand demonstrated gains in motor savings in line with the offline simulation.
Lab and field testing on two eye-gaze typing users with amyotrophic lateral sclerosis (ALS) demonstrated text-entry rates 29-60% faster than traditional baselines.
- Score: 17.715162857028595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finding ways to accelerate text input for individuals with profound motor
impairments has been a long-standing area of research. Closing the speed gap
for augmentative and alternative communication (AAC) devices such as
eye-tracking keyboards is important for improving the quality of life for such
individuals. Recent advances in neural networks for natural language pose new
opportunities for rethinking strategies and user interfaces for enhanced text
entry for AAC users. In this paper, we present SpeakFaster, which combines
large language models (LLMs) with a co-designed user interface for text entry
in a highly abbreviated form, saving 57% more motor actions than traditional
predictive keyboards in offline simulation. A pilot study with 19
non-AAC participants typing on a mobile device by hand demonstrated gains in
motor savings in line with the offline simulation, while introducing relatively
small effects on overall typing speed. Lab and field testing on two eye-gaze
typing users with amyotrophic lateral sclerosis (ALS) demonstrated text-entry
rates 29-60% faster than traditional baselines, due to significant savings of
expensive keystrokes achieved through phrase and word predictions from
context-aware LLMs. These findings provide a strong foundation for further
exploration of substantially-accelerated text communication for motor-impaired
users and demonstrate a direction for applying LLMs to text-based user
interfaces.
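The listing does not include source code; the following Python sketch is a rough, hypothetical illustration of the core idea described in the abstract, context-aware abbreviation expansion: the user types only the initial letters of each intended word and an LLM proposes full-phrase expansions conditioned on the conversation so far. The llm_complete callable, the prompt wording, and the filtering heuristic are illustrative assumptions, not the authors' implementation.

    # Rough illustrative sketch (not the authors' code) of context-aware
    # abbreviation expansion: the user types only the initial letters of each
    # intended word (e.g. "iwtg" for "i want to go") and an LLM proposes
    # full-phrase expansions conditioned on the conversation context.
    # `llm_complete` is a hypothetical callable wrapping whatever LLM is used.

    from typing import Callable, List


    def expand_abbreviation(
        abbreviation: str,
        conversation_context: List[str],
        llm_complete: Callable[[str], str],
        num_options: int = 5,
    ) -> List[str]:
        """Ask an LLM for candidate phrase expansions of a word-initial abbreviation."""
        prompt = (
            "Conversation so far:\n"
            + "\n".join(conversation_context)
            + f"\n\nThe next speaker typed the abbreviation '{abbreviation}', where "
            "each letter is the first letter of one word. "
            f"List {num_options} likely full phrases, one per line."
        )
        reply = llm_complete(prompt)
        # Keep only candidates whose word-initial letters actually match the abbreviation.
        candidates = [line.lstrip("0123456789.- ").strip() for line in reply.splitlines()]
        return [
            c for c in candidates
            if c and "".join(w[0].lower() for w in c.split()) == abbreviation.lower()
        ][:num_options]


    # Hypothetical usage: expand "iwtg" given the partner's last utterance.
    # options = expand_abbreviation("iwtg", ["Partner: Are you coming to the park?"], my_llm)

In this framing, the motor savings come from the user entering one keystroke per word rather than one per character, with the LLM and conversational context resolving the ambiguity.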
Related papers
- Exploring Mobile Touch Interaction with Large Language Models [26.599610206222142]
We propose to control Large Language Models via touch gestures performed directly on the text.
Results demonstrate that touch-based control of LLMs is both feasible and user-friendly.
This work lays the foundation for further research into gesture-based interaction with LLMs on touch devices.
arXiv Detail & Related papers (2025-02-11T15:17:00Z)
- Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models [16.532357621144342]
Large language models (LLMs) can describe driving scenes and behaviors with a level of accuracy similar to human perception.
We propose a driving behavior narration and reasoning framework that applies LLMs to edge devices.
Our experiments show that LLMs deployed on edge devices can achieve satisfactory response speeds.
arXiv Detail & Related papers (2024-09-30T15:03:55Z)
- Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z)
- Modulating Language Model Experiences through Frictions [56.17593192325438]
Over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term.
We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse.
arXiv Detail & Related papers (2024-06-24T16:31:11Z)
- Learning Generalizable Human Motion Generator with Reinforcement Learning [95.62084727984808]
Text-driven human motion generation is one of the vital tasks in computer-aided content creation.
Existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize.
We present InstructMotion, which incorporates the trial-and-error paradigm of reinforcement learning for generalizable human motion generation.
arXiv Detail & Related papers (2024-05-24T13:29:12Z)
- Embedded Named Entity Recognition using Probing Classifiers [10.573861741540853]
EMBER enables streaming named entity recognition in decoder-only language models without fine-tuning them.
We show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1%.
We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models.
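(A minimal, hypothetical sketch of this probing-classifier idea appears after the related-papers list below.)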
arXiv Detail & Related papers (2024-03-18T12:58:16Z)
- TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z)
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models [14.86435467709869]
Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars.
Current simulation frameworks are driven by highly-specialist domain specific languages.
There is often a gap between a concise English utterance and the executable code that captures the user's intent.
arXiv Detail & Related papers (2023-10-26T13:07:01Z)
- Typing on Any Surface: A Deep Learning-based Method for Real-Time Keystroke Detection in Augmented Reality [4.857109990499532]
Mid-air keyboard interfaces, wireless keyboards, and voice input either suffer from poor ergonomic design and limited accuracy, or are simply embarrassing to use in public.
This paper proposes and validates a deep-learning-based approach that enables AR applications to accurately predict keystrokes from the user-perspective RGB video stream.
A two-stage model, combining an off-the-shelf hand landmark extractor and a novel adaptive Convolutional Recurrent Neural Network (C-RNN), was trained.
arXiv Detail & Related papers (2023-08-31T23:58:25Z)
- Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding [78.71529237748018]
Grounding temporal video segments described in natural language queries effectively and efficiently is a crucial capability needed in vision-and-language fields.
Most existing approaches adopt elaborately designed cross-modal interaction modules to improve the grounding performance.
We propose a commonsense-aware cross-modal alignment framework, which incorporates commonsense-guided visual and text representations into a complementary common space.
arXiv Detail & Related papers (2022-04-04T13:07:05Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
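The EMBER entry above describes probing classifiers for streaming named entity recognition in decoder-only language models without fine-tuning them. The sketch below is a rough, hypothetical illustration of that general probing idea, not the EMBER implementation: a small linear probe maps each newly generated token's frozen hidden state to an entity tag. The tag set, dimensions, and choice of layer are illustrative assumptions.

    # Illustrative sketch only (not the EMBER code): a linear probe over the
    # frozen hidden states of a decoder-only language model, applied to each
    # newly generated token so entity tags can be produced in a streaming
    # fashion without fine-tuning the underlying model.

    import torch
    import torch.nn as nn

    ENTITY_TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # illustrative tag set


    class TokenProbe(nn.Module):
        """Linear classifier over per-token hidden states of a frozen LM."""

        def __init__(self, hidden_dim: int, num_tags: int = len(ENTITY_TAGS)):
            super().__init__()
            self.classifier = nn.Linear(hidden_dim, num_tags)

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # hidden_states: (seq_len, hidden_dim) taken from one layer of the LM
            return self.classifier(hidden_states)  # (seq_len, num_tags) logits


    def tag_new_token(probe: TokenProbe, new_hidden_state: torch.Tensor) -> str:
        """Classify only the most recently generated token's hidden state."""
        with torch.no_grad():
            logits = probe(new_hidden_state.unsqueeze(0))
        return ENTITY_TAGS[int(logits.argmax(dim=-1))]

Because only the probe's parameters are trained, the language model's generation speed is essentially unchanged, which matches the roughly 1% slowdown reported in the entry above.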
This list is automatically generated from the titles and abstracts of the papers on this site.