AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in
Educational Applications
- URL: http://arxiv.org/abs/2208.12505v1
- Date: Fri, 26 Aug 2022 08:56:32 GMT
- Title: AiM: Taking Answers in Mind to Correct Chinese Cloze Tests in
Educational Applications
- Authors: Yusen Zhang, Zhongli Li, Qingyu Zhou, Ziyi Liu, Chao Li, Mina Ma,
Yunbo Cao, Hongzhi Liu
- Abstract summary: We propose a multimodal approach to automatically correct handwritten assignments.
The encoded representations of answers interact with the visual information of students' handwriting.
Experimental results show that AiM outperforms OCR-based methods by a large margin.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To automatically correct handwritten assignments, the traditional approach is
to use an OCR model to recognize characters and compare them to answers. The
OCR model easily gets confused on recognizing handwritten Chinese characters,
and the textual information of the answers is missing during the model
inference. However, teachers always have these answers in mind to review and
correct assignments. In this paper, we focus on the Chinese cloze tests
correction and propose a multimodal approach (named AiM). The encoded
representations of answers interact with the visual information of students'
handwriting. Instead of predicting 'right' or 'wrong', we perform sequence
labeling on the answer text to infer which answer characters differ from the
handwritten content in a fine-grained way. We take samples from OCR datasets as
positive samples for this task, and develop a negative sample augmentation
method to scale up the training data. Experimental results show that AiM
outperforms OCR-based methods by a large margin. Extensive studies demonstrate
the effectiveness of our multimodal approach.
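The negative sample augmentation described in the abstract can be sketched as a simple corruption routine: starting from a correct (positive) answer, randomly substitute characters and emit a per-character 0/1 label sequence marking which positions differ from the simulated handwritten content. The character pool and corruption rate below are illustrative assumptions, not details taken from the paper.

```python
import random

def augment_negative(answer, char_pool, corrupt_prob=0.3, rng=None):
    """Corrupt a correct answer string into a simulated wrong handwritten
    answer. Returns the corrupted text and per-character labels
    (0 = matches the answer character, 1 = differs)."""
    rng = rng or random.Random(0)
    chars, labels = [], []
    for ch in answer:
        if rng.random() < corrupt_prob:
            # Substitute with a different character from the pool.
            wrong = rng.choice([c for c in char_pool if c != ch])
            chars.append(wrong)
            labels.append(1)
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels

corrupted, labels = augment_negative("天地人和", "天地人和日月山川")
```

The labels play the role of the fine-grained sequence-labeling targets: rather than a single right/wrong prediction, each answer character gets its own match/mismatch supervision.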
Related papers
- Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and come up with a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average across both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
- Instruction-Guided Scene Text Recognition [51.853730414264625]
We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem.
We develop a lightweight instruction encoder, a cross-modal feature fusion module, and a multi-task answer head, which guide nuanced text-image understanding.
IGTR outperforms existing models by significant margins, while maintaining a small model size and efficient inference speed.
arXiv Detail & Related papers (2024-01-31T14:13:01Z)
- Is it an i or an l: Test-time Adaptation of Text Line Recognition Models [9.149602257966917]
We introduce the problem of adapting text line recognition models during test time.
We propose an iterative self-training approach that uses feedback from the language model to update the optical model.
Experimental results show that the proposed adaptation method offers an absolute improvement of up to 8% in character error rate.
arXiv Detail & Related papers (2023-08-29T05:44:00Z)
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level [4.250876580245865]
Existing AI-generated text classifiers have limited accuracy and often produce false positives.
We propose a novel approach using natural language processing (NLP) techniques.
We generate multiple paraphrased versions of a given question and input them into the large language model to generate answers.
By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response.
arXiv Detail & Related papers (2023-06-13T20:34:55Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition [16.987008461171065]
We explore the potential of continual self-supervised learning to alleviate the catastrophic forgetting problem in handwritten text recognition.
Our method consists of adding intermediate layers called adapters for each task, and efficiently distilling knowledge from the previous model while learning the current task.
We attain state-of-the-art performance on English, Italian and Russian scripts, whilst adding only a few parameters per task.
arXiv Detail & Related papers (2023-03-16T14:27:45Z)
- Handwriting recognition and automatic scoring for descriptive answers in Japanese language tests [7.489722641968594]
This paper presents an experiment of automatically scoring handwritten descriptive answers in the trial tests for the new Japanese university entrance examination.
Although all answers have been scored by human examiners, handwritten characters are not labeled.
We present our attempt to adapt deep neural network-based handwriting recognizers, trained on a labeled handwriting dataset, to this unlabeled answer set.
arXiv Detail & Related papers (2022-01-10T08:47:52Z)
- Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)
- Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function.
We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z)
- Structured Multimodal Attentions for TextVQA [57.71060302874151]
We propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.
SMA first uses a structural graph representation to encode the object-object, object-text and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it.
Our proposed model outperforms the SoTA models on TextVQA dataset and two tasks of ST-VQA dataset among all models except pre-training based TAP.
arXiv Detail & Related papers (2020-06-01T07:07:36Z)
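Several entries above report results in terms of character error rate (CER) or optimize a minimum Levenshtein distance; both reduce to the same dynamic-programming edit distance. A minimal sketch of the standard algorithm (not code from any of the listed papers):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences via dynamic programming,
    keeping only the previous row of the DP table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

An "absolute improvement of up to 8% in character error rate," as reported above, means this normalized ratio dropping by 0.08.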
This list is automatically generated from the titles and abstracts of the papers in this site.