Pedagogical Word Recommendation: A novel task and dataset on
personalized vocabulary acquisition for L2 learners
- URL: http://arxiv.org/abs/2112.13808v2
- Date: Tue, 28 Dec 2021 04:52:26 GMT
- Title: Pedagogical Word Recommendation: A novel task and dataset on
personalized vocabulary acquisition for L2 learners
- Authors: Jamin Shin, Juneyoung Park
- Abstract summary: We propose and release data for a novel task called Pedagogical Word Recommendation.
The main goal of PWR is to predict whether a given learner knows a given word based on other words the learner has already seen.
As a feature of this ITS, students can directly indicate words they do not know from the questions they solved to create wordbooks.
- Score: 4.507860128918788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When learning a second language (L2), one of the most important but tedious
components that often demoralizes students with its ineffectiveness and
inefficiency is vocabulary acquisition, or more simply put, memorizing words.
In light of this, a personalized and educational vocabulary recommendation
system that traces a learner's vocabulary knowledge state would have an immense
learning impact as it could resolve both issues. Therefore, in this paper, we
propose and release data for a novel task called Pedagogical Word
Recommendation (PWR). The main goal of PWR is to predict whether a given
learner knows a given word based on other words the learner has already seen.
To elaborate, we collect this data via an Intelligent Tutoring System (ITS)
that is serviced to ~1M L2 learners who study for the standardized English
exam, TOEIC. As a feature of this ITS, students can directly indicate words
they do not know from the questions they solved to create wordbooks. Finally,
we report the evaluation results of a Neural Collaborative Filtering approach
along with an exploratory data analysis and discuss the impact and efficacy of
this dataset as a baseline for future studies on this task.
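The abstract reports a Neural Collaborative Filtering baseline for PWR: embed each learner and each word, then score the pair with a small MLP to predict whether the learner knows the word. The sketch below is a minimal, forward-pass-only illustration of that idea; the class name `TinyNCF`, all dimensions, and the plain-NumPy weights are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyNCF:
    """Forward pass of an NCF-style scorer for (learner, word) pairs."""

    def __init__(self, n_learners, n_words, dim=16, hidden=16):
        # Learnable tables in a real model; random here for illustration.
        self.U = rng.normal(scale=0.1, size=(n_learners, dim))   # learner embeddings
        self.V = rng.normal(scale=0.1, size=(n_words, dim))      # word embeddings
        self.W1 = rng.normal(scale=0.1, size=(2 * dim, hidden))  # MLP layer 1
        self.w2 = rng.normal(scale=0.1, size=(hidden,))          # MLP output layer

    def predict(self, learner_id, word_id):
        # Concatenate the two embeddings and pass them through a ReLU MLP.
        x = np.concatenate([self.U[learner_id], self.V[word_id]])
        h = np.maximum(0.0, x @ self.W1)
        # Sigmoid output: estimated probability the learner knows the word.
        return 1.0 / (1.0 + np.exp(-(h @ self.w2)))

model = TinyNCF(n_learners=100, n_words=500)
p = model.predict(3, 42)
print(0.0 < p < 1.0)  # True
```

Training would fit the embedding tables and MLP weights on the wordbook interactions (word marked unknown = negative label), exactly as in standard implicit-feedback collaborative filtering.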
Related papers
- Are BabyLMs Second Language Learners? [48.85680614529188]
This paper describes a linguistically-motivated approach to the 2024 edition of the BabyLM Challenge.
Rather than pursuing a first language learning (L1) paradigm, we approach the challenge from a second language (L2) learning perspective.
arXiv Detail & Related papers (2024-10-28T17:52:15Z)
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z)
- Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning [66.79173000135717]
We apply this work to teaching two Indian languages, Kannada and Marathi, which do not have well-developed resources for second language learning.
We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary).
We enlist language educators from schools in North America to perform a manual evaluation; they find that the materials have potential to be used for lesson preparation and learner evaluation.
arXiv Detail & Related papers (2023-10-27T18:17:29Z)
- Storyfier: Exploring Vocabulary Learning Support with Text Generation Models [52.58844741797822]
We develop Storyfier to provide a coherent context for any target words of learners' interests.
Learners generally favor the generated stories for connecting target words and the writing assistance for easing their learning workload.
In read-cloze-write learning sessions, participants using Storyfier perform worse in recalling and using target words than learning with a baseline tool without our AI features.
arXiv Detail & Related papers (2023-08-07T18:25:00Z)
- Semi-Supervised Lifelong Language Learning [81.0685290973989]
We explore a novel setting, semi-supervised lifelong language learning (SSLL), where a model learns sequentially arriving language tasks with both labeled and unlabeled data.
Specifically, we dedicate task-specific modules to alleviate catastrophic forgetting and design two modules to exploit unlabeled data.
Experimental results on various language tasks demonstrate our model's effectiveness and superiority over competitive baselines.
arXiv Detail & Related papers (2022-11-23T15:51:33Z)
- Unravelling Interlanguage Facts via Explainable Machine Learning [10.71581852108984]
We focus on the internals of an NLI classifier trained via an explainable machine learning algorithm.
We use this perspective in order to tackle both NLI and a companion task, guessing whether a text has been written by a native or a non-native speaker.
We investigate which kinds of linguistic traits are most effective for solving our two tasks, i.e., which are most indicative of a speaker's L1.
arXiv Detail & Related papers (2022-08-02T14:05:15Z)
- An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications [11.775048147405725]
We develop models for estimating contextual informativeness, focusing on the instructional aspect of sentences.
We show how our model identifies key contextual elements in a sentence that are likely to contribute most to a reader's understanding of the target word.
We believe our results open new possibilities for applications that support language learning for both human and machine learners.
arXiv Detail & Related papers (2022-04-21T05:17:49Z)
- MuLVE, A Multi-Language Vocabulary Evaluation Data Set [2.9005223064604078]
This work introduces Multi-Language Vocabulary Evaluation Data Set (MuLVE), a data set consisting of vocabulary cards and real-life user answers.
The data set contains vocabulary questions in German, with English, Spanish, and French as target languages.
We experiment to fine-tune pre-trained BERT language models on the downstream task of vocabulary evaluation with the proposed MuLVE data set.
arXiv Detail & Related papers (2022-01-17T09:02:59Z)
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z)
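Teacher-to-student knowledge transfer of the kind VidLanKD describes is typically driven by a soft-label distillation loss: the student is trained to match the teacher's temperature-softened output distribution. The snippet below sketches the standard temperature-scaled KL objective as a generic illustration; the function names and the example logit values are hypothetical, and this is not VidLanKD's actual objective.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in standard knowledge distillation.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

loss = distillation_loss([1.0, 0.2, -0.5], [0.9, 0.1, -0.4])
print(loss >= 0.0)  # True (KL divergence is non-negative)
```

In a full training loop this term is usually combined with the ordinary supervised cross-entropy on the text dataset's labels.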
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.