Fugu-MT Paper Translation (Summary): Multimodal Appearance based Gaze-Controlled Virtual Keyboard with Synchronous Asynchronous Interaction for Low-Resource Settings

Paper summary: Multimodal Appearance based Gaze-Controlled Virtual Keyboard with Synchronous Asynchronous Interaction for Low-Resource Settings

arxiv url: http://arxiv.org/abs/2508.16606v1
Date: Tue, 12 Aug 2025 13:08:54 GMT
Status: Translation complete
Last updated in system: 2025-08-31 21:54:20.573229
Title: Multimodal Appearance based Gaze-Controlled Virtual Keyboard with Synchronous Asynchronous Interaction for Low-Resource Settings
Authors: Yogesh Kumar Meena, Manish Salvi
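Abstract summary: This work presents a multimodal appearance-based gaze-controlled virtual keyboard that uses deep learning in conjunction with standard camera hardware. The virtual keyboard application supports menu-based selection with nine commands, enabling users to spell and type up to 56 English characters. Average typing speeds were 18.3 ± 5.31 letters/min (mouse), 12.60 ± 2.99 letters/min (eye-tracker, synchronous), 10.94 ± 1.89 letters/min (webcam, synchronous), and 7.86 ± 1.69 letters/min (webcam, asynchronous).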
Reference score (independently computed attention level): 7.727905404396572
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over the past decade, the demand for communication devices has increased among individuals with mobility and speech impairments. Eye-gaze tracking has emerged as a promising solution for hands-free communication; however, traditional appearance-based interfaces often face challenges such as accuracy issues, involuntary eye movements, and difficulties with extensive command sets. This work presents a multimodal appearance-based gaze-controlled virtual keyboard that utilises deep learning in conjunction with standard camera hardware, incorporating both synchronous and asynchronous modes for command selection. The virtual keyboard application supports menu-based selection with nine commands, enabling users to spell and type up to 56 English characters, including uppercase and lowercase letters, punctuation, and a delete function for corrections. The proposed system was evaluated with twenty able-bodied participants who completed specially designed typing tasks using three input modalities: (i) a mouse, (ii) an eye-tracker, and (iii) an unmodified webcam. Typing performance was measured in terms of speed and information transfer rate (ITR) at both command and letter levels. Average typing speeds were 18.3 ± 5.31 letters/min (mouse), 12.60 ± 2.99 letters/min (eye-tracker, synchronous), 10.94 ± 1.89 letters/min (webcam, synchronous), 11.15 ± 2.90 letters/min (eye-tracker, asynchronous), and 7.86 ± 1.69 letters/min (webcam, asynchronous). ITRs were approximately 80.29 ± 15.72 bits/min (command level) and 63.56 ± 11 bits/min (letter level) with webcam in synchronous mode. The system demonstrated good usability and low workload with webcam input, highlighting its user-centred design and promise as an accessible communication tool in low-resource settings.
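The abstract reports ITR in bits/min but does not state the formula used. As a rough illustration only, the sketch below assumes the Wolpaw definition that is common in the speller literature; whether the authors used it is not confirmed by the abstract. The function names and the accuracy and selection-rate values are hypothetical, while the target counts (9 commands, 56 characters) come from the abstract.

```python
import math


def wolpaw_bits_per_selection(n_targets: int, accuracy: float) -> float:
    """Bits conveyed by one selection under the Wolpaw ITR definition (an assumption here)."""
    if n_targets < 2:
        raise ValueError("n_targets must be at least 2")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_targets)
    # The two correction terms vanish in the limits accuracy -> 0 and accuracy -> 1.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits


def itr_bits_per_min(n_targets: int, accuracy: float, selections_per_min: float) -> float:
    """Scale the per-selection information by the selection rate."""
    return wolpaw_bits_per_selection(n_targets, accuracy) * selections_per_min


if __name__ == "__main__":
    # Hypothetical accuracy and rates, NOT values taken from the paper.
    print(itr_bits_per_min(n_targets=9, accuracy=0.95, selections_per_min=20.0))   # command level
    print(itr_bits_per_min(n_targets=56, accuracy=0.95, selections_per_min=10.0))  # letter level
```

Under this definition, a selection made at chance accuracy (1/N) carries zero bits, which is why ITR is usually reported alongside raw typing speed rather than in place of it.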

Related papers