LXPER Index 2.0: Improving Text Readability Assessment Model for L2 English Students in Korea
- URL: http://arxiv.org/abs/2010.13374v4
- Date: Fri, 11 Dec 2020 11:04:49 GMT
- Title: LXPER Index 2.0: Improving Text Readability Assessment Model for L2 English Students in Korea
- Authors: Bruce W. Lee and Jason Lee
- Abstract summary: This paper investigates a text readability assessment model for L2 English learners in Korea.
We train our model with CoKEC-text and significantly improve the accuracy of readability assessment for texts in the Korean ELT curriculum.
- Score: 1.7006003864727408
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing a text readability assessment model specifically for texts in a foreign English Language Training (ELT) curriculum has received little attention in the field of Natural Language Processing. As a result, most existing models show extremely low accuracy on L2 English texts, to the point where few of them even serve as a fair comparison. In this paper, we investigate a text readability assessment model for L2 English learners in Korea. Accordingly, we improve and expand the Text Corpus of the Korean ELT curriculum (CoKEC-text), in which each text is labeled with its target grade level. We train our model with CoKEC-text and significantly improve the accuracy of readability assessment for texts in the Korean ELT curriculum.
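The approach above, in generic form, is supervised classification over grade-labeled texts. The following is a minimal sketch of that setup, not the authors' LXPER pipeline: the surface features, the toy stand-in for CoKEC-text, and the choice of logistic regression are all illustrative assumptions.

```python
# Minimal sketch of grade-level readability classification.
# NOT the LXPER Index 2.0 pipeline: features, data, and model
# are placeholders chosen for illustration only.
import re
from sklearn.linear_model import LogisticRegression

def features(text: str) -> list[float]:
    """Classic surface features: average sentence length,
    average word length, and type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),                   # avg sentence length
        sum(map(len, words)) / len(words),             # avg word length
        len({w.lower() for w in words}) / len(words),  # type-token ratio
    ]

# Hypothetical stand-in for CoKEC-text: (text, target grade level) pairs.
corpus = [
    ("The cat sat on the mat. It was warm.", 1),
    ("Photosynthesis converts light energy into chemical energy "
     "stored in glucose, sustaining most life on Earth.", 9),
]

X = [features(text) for text, _ in corpus]
y = [grade for _, grade in corpus]
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([features("Dogs run fast. They like to play.")]))
```

Real curriculum-specific models use far richer features; the point here is only the shape of the training setup: feature vectors paired with target grade labels.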
Related papers
- Enriching the Korean Learner Corpus with Multi-reference Annotations and Rubric-Based Scoring [2.824980053889876]
We enhance the KoLLA Korean learner corpus by adding grammatical error correction references.
We enrich the corpus with rubric-based scores aligned with guidelines from the Korean National Language Institute.
arXiv Detail & Related papers (2025-05-01T03:04:07Z)
- UKTA: Unified Korean Text Analyzer [7.342330109393445]
UKTA (Unified Korean Text Analyzer) is a comprehensive Korean text analysis and writing evaluation system.
UKTA provides accurate low-level morpheme analysis, key lexical features for mid-level explainability, and transparent high-level rubric-based writing scores.
arXiv Detail & Related papers (2025-02-11T13:30:56Z)
- HyperCLOVA X Technical Report [119.94633129762133]
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture.
HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets.
The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English.
arXiv Detail & Related papers (2024-04-02T13:48:49Z)
- KIT-19: A Comprehensive Korean Instruction Toolkit on 19 Tasks for Fine-Tuning Korean Large Language Models [0.0]
KIT-19 is a dataset created in an instruction format, comprising 19 existing open-source datasets for Korean NLP tasks.
The experimental results show that the model trained on KIT-19 significantly outperforms existing Korean LLMs.
arXiv Detail & Related papers (2024-03-25T06:15:21Z)
- CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean [18.526285276022907]
We introduce CLIcK, a benchmark dataset of Cultural and Linguistic Intelligence in Korean comprising 1,995 QA pairs.
CLIcK sources its data from official Korean exams and textbooks, partitioning the questions into eleven categories under the two main categories of language and culture.
Using CLIcK, we test 13 language models to assess their performance. Our evaluation uncovers insights into their performances across the categories, as well as the diverse factors affecting their comprehension.
arXiv Detail & Related papers (2024-03-11T03:54:33Z)
- Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models [9.359647125218359]
This report introduces EEVE-Korean-v1.0, a Korean adaptation of large language models.
Our method can significantly boost non-English proficiency within just 2 billion tokens.
arXiv Detail & Related papers (2024-02-22T17:12:39Z)
- Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning.
We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary).
We enlist language educators from schools in North America to perform a manual evaluation; they find the materials have potential to be used for their lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach significantly improves cross-lingual sentence embeddings.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes [93.19166902594168]
We propose SESCORE2, a self-supervised approach for training a model-based metric for text generation evaluation.
The key concept is to synthesize realistic model mistakes by perturbing sentences retrieved from a corpus (a toy sketch of such perturbation follows this list).
We evaluate SESCORE2 and previous methods on four text generation tasks across three languages.
arXiv Detail & Related papers (2022-12-19T09:02:16Z)
- A Transfer Learning Based Model for Text Readability Assessment in German [4.550811027560416]
We propose a new model for text complexity assessment for German text based on transfer learning.
The best model, based on the pre-trained BERT language model, achieved a Root Mean Square Error (RMSE) of 0.483 (a worked RMSE example follows this list).
arXiv Detail & Related papers (2022-07-13T15:15:44Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- LXPER Index: a curriculum-specific text readability assessment model for EFL students in Korea [0.5076419064097734]
LXPER Index is a readability assessment model for non-native English readers in the ELT curriculum of Korea.
Our new model, trained with CoKEC-text, significantly improves the accuracy of automatic readability assessment for texts in the Korean ELT curriculum.
arXiv Detail & Related papers (2020-08-01T11:55:03Z)
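As noted in the SESCORE2 entry above, that metric is trained on synthetic negatives made by corrupting real sentences. Below is a toy illustration of the idea only; the corruption operations are assumptions for illustration and not the SESCORE2 implementation.

```python
# Toy synthesis of "realistic mistakes" by perturbing a sentence
# (illustrative only; NOT the SESCORE2 implementation).
import random

random.seed(0)

def perturb(sentence: str) -> str:
    """Apply one random corruption: drop, duplicate, or swap a word."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    op = random.choice(["drop", "duplicate", "swap"])
    i = random.randrange(len(words) - 1)
    if op == "drop":
        del words[i]
    elif op == "duplicate":
        words.insert(i, words[i])
    else:  # swap two adjacent words
        words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

gold = "The committee approved the new budget yesterday"
print(perturb(gold))  # a corrupted negative example for metric training
```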
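And as noted in the German readability entry, the reported figure is a Root Mean Square Error. A worked example of the metric itself follows; the predicted and gold scores below are made up, and only the 0.483 figure comes from the paper.

```python
# Root Mean Square Error between predicted and gold readability scores.
import math

def rmse(preds, golds):
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(preds))

print(rmse([3.1, 4.0, 2.5], [3.0, 4.5, 2.0]))  # ~0.41
```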