GazeReader: Detecting Unknown Word Using Webcam for English as a Second
Language (ESL) Learners
- URL: http://arxiv.org/abs/2303.10443v1
- Date: Sat, 18 Mar 2023 15:55:49 GMT
- Title: GazeReader: Detecting Unknown Word Using Webcam for English as a Second
Language (ESL) Learners
- Authors: Jiexin Ding, Bowen Zhao, Yuqi Huang, Yuntao Wang, Yuanchun Shi
- Abstract summary: We propose GazeReader, an unknown word detection method only using a webcam.
GazeReader tracks the learner's gaze and then applies a transformer-based machine learning model that encodes the text information to locate the unknown word.
- Score: 24.009130595261123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic unknown word detection techniques can enable new applications for
assisting English as a Second Language (ESL) learners, thus improving their
reading experiences. However, most modern unknown word detection methods
require dedicated eye-tracking devices with high precision that are not easily
accessible to end-users. In this work, we propose GazeReader, an unknown word
detection method using only a webcam. GazeReader tracks the learner's gaze and
then applies a transformer-based machine learning model that encodes the text
information to locate the unknown word. We applied knowledge enhancement
including term frequency, part of speech, and named entity recognition to
improve the performance. The user study indicates that the accuracy and
F1-score of our method were 98.09% and 75.73%, respectively. Lastly, we
explored the design scope for ESL reading and discussed the findings.
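The knowledge enhancement described in the abstract (term frequency, part of speech, and named entity recognition) can be illustrated with a minimal sketch. The toy corpus, POS lookup table, and capitalization-based entity heuristic below are illustrative stand-ins, not the authors' actual pipeline, which would use real NLP tooling:

```python
from collections import Counter

def word_features(word, corpus_tokens, pos_lookup):
    """Build a toy per-word feature dict: relative term frequency,
    a coarse part-of-speech tag, and a capitalization-based
    named-entity flag (stand-ins for real frequency/POS/NER tools)."""
    counts = Counter(t.lower() for t in corpus_tokens)
    total = sum(counts.values())
    return {
        "term_freq": counts[word.lower()] / total,    # rarer words are more likely unknown
        "pos": pos_lookup.get(word.lower(), "NOUN"),  # fallback tag for unlisted words
        "is_entity": word[0].isupper(),               # crude NER proxy
    }

# Hypothetical corpus and POS table for demonstration only
corpus = "the cat sat on the mat the cat slept".split()
pos = {"the": "DET", "cat": "NOUN", "sat": "VERB", "on": "ADP",
       "mat": "NOUN", "slept": "VERB"}

feats = word_features("cat", corpus, pos)
print(feats)  # term_freq = 2/9, pos = "NOUN", is_entity = False
```

In a real system, features like these would be concatenated with the text encoding fed to the transformer, so that rare or entity-like words receive extra signal.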
Related papers
- Unknown Word Detection for English as a Second Language (ESL) Learners Using Gaze and Pre-trained Language Models [24.607431783798425]
This paper presents EyeLingo, a transformer-based machine learning method that predicts the probability of unknown words based on text content and eye gaze trajectory in real time with high accuracy.
A 20-participant user study revealed that our method can achieve an accuracy of 97.6%, and an F1-score of 71.1%.
arXiv Detail & Related papers (2025-02-14T18:57:04Z)
- LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated.
We present LLM-DetectAIve, designed for fine-grained detection.
It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z)
- MENTOR: Multilingual tExt detectioN TOward leaRning by analogy [59.37382045577384]
We propose a framework to detect and identify both seen and unseen language regions inside scene images.
"MENTOR" is the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
arXiv Detail & Related papers (2024-03-12T03:35:17Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors [58.75140338866403]
DVDet is a Descriptor-Enhanced Open Vocabulary Detector.
It transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training.
Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
arXiv Detail & Related papers (2024-02-07T07:26:49Z)
- Plug-and-Play Multilingual Few-shot Spoken Words Recognition [3.591566487849146]
We propose PLiX, a multilingual and plug-and-play keyword spotting system.
Our few-shot deep models are learned with millions of one-second audio clips across 20 languages.
We show that PLiX can generalize to novel spoken words given as few as just one support example.
arXiv Detail & Related papers (2023-05-03T18:58:14Z)
- Weakly-supervised Fingerspelling Recognition in British Sign Language Videos [85.61513254261523]
Previous fingerspelling recognition methods have not focused on British Sign Language (BSL).
In contrast to previous methods, our method only uses weak annotations from subtitles for training.
We propose a Transformer architecture adapted to this task, with a multiple-hypothesis CTC loss function to learn from alternative annotation possibilities.
arXiv Detail & Related papers (2022-11-16T15:02:36Z)
- Localized Vision-Language Matching for Open-vocabulary Object Detection [41.98293277826196]
We propose an open-world object detection method that learns to detect novel object classes along with a given set of known classes.
It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels.
We show that a simple language model fits better than a large contextualized language model for detecting novel objects.
arXiv Detail & Related papers (2022-05-12T15:34:37Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Spell my name: keyword boosted speech recognition [25.931897154065663]
Uncommon words such as names and technical terminology are important to understanding conversations in context.
We propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords.
The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions.
We demonstrate the effectiveness of our method on the LibriSpeech test sets and also internal data of real-world conversations.
arXiv Detail & Related papers (2021-10-06T14:16:57Z)
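The keyword-boosting idea described above, adding a bonus to the scores of listed keywords during beam search, can be sketched as a toy decoder. The vocabulary, per-step scores, and boost weight below are invented for illustration and do not reproduce the paper's exact formulation:

```python
import math

def boosted_decode(step_logprobs, keywords, boost=2.0, beam_width=2):
    """Toy beam search over per-step token log-probabilities.
    Tokens in the keyword set receive a constant additive boost,
    raising the chance that rare names survive pruning."""
    beams = [([], 0.0)]  # (token sequence, cumulative score)
    for logprobs in step_logprobs:  # one dict {token: logprob} per step
        candidates = []
        for seq, score in beams:
            for tok, lp in logprobs.items():
                bonus = boost if tok in keywords else 0.0
                candidates.append((seq + [tok], score + lp + bonus))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Two decoding steps; "Jiexin" is an uncommon name a baseline would drop
steps = [
    {"he": math.log(0.6), "Jiexin": math.log(0.4)},
    {"said": math.log(0.9), "reads": math.log(0.1)},
]
print(boosted_decode(steps, keywords={"Jiexin"}))  # ['Jiexin', 'said']
print(boosted_decode(steps, keywords=set()))       # ['he', 'said']
```

With an empty keyword set the decoder falls back to the unmodified acoustic scores, which is why the second call prefers the more probable but less informative token.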
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.