On the application of Large Language Models for language teaching and
assessment technology
- URL: http://arxiv.org/abs/2307.08393v1
- Date: Mon, 17 Jul 2023 11:12:56 GMT
- Title: On the application of Large Language Models for language teaching and
assessment technology
- Authors: Andrew Caines, Luca Benedetto, Shiva Taslimipoor, Christopher Davis,
Yuan Gao, Øistein Andersen, Zheng Yuan, Mark Elliott, Russell Moore,
Christopher Bryant, Marek Rei, Helen Yannakoudakis, Andrew Mullooly, Diane
Nicholls, Paula Buttery
- Abstract summary: We look at the potential for incorporating large language models in AI-driven language teaching and assessment systems.
We find that larger language models offer improvements over previous models in text generation.
For automated grading and grammatical error correction, tasks whose progress is checked on well-known benchmarks, early investigations indicate that large language models on their own do not improve on state-of-the-art results.
- Score: 18.735612275207853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent release of very large language models such as PaLM and
GPT-4 has made an unprecedented impact on the popular media and public consciousness,
giving rise to a mixture of excitement and fear as to their capabilities and
potential uses, and shining a light on natural language processing research
which had not previously received so much attention. The developments offer
great promise for education technology, and in this paper we look specifically
at the potential for incorporating large language models in AI-driven language
teaching and assessment systems. We consider several research areas and also
discuss the risks and ethical considerations surrounding generative AI in
education technology for language learners. Overall we find that larger
language models offer improvements over previous models in text generation,
opening up routes toward content generation which had not previously been
plausible. For text generation they must be prompted carefully and their
outputs may need to be reshaped before they are ready for use. For automated
grading and grammatical error correction, tasks whose progress is checked on
well-known benchmarks, early investigations indicate that large language models
on their own do not improve on state-of-the-art results according to standard
evaluation metrics. For grading it appears that linguistic features established
in the literature should still be used for best performance, and for error
correction it may be that the models can offer alternative feedback styles
which are not measured sensitively with existing methods. In all cases, there
is work to be done to experiment with the inclusion of large language models in
education technology for language learners, in order to properly understand and
report on their capacities and limitations, and to ensure that foreseeable
risks such as misinformation and harmful bias are mitigated.
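The abstract's point that models "must be prompted carefully and their outputs may need to be reshaped" can be made concrete with a minimal sketch. Everything here is an illustrative assumption: `generate` is a stand-in for any large language model call, and the prompt template and clean-up rules are not taken from the paper.

```python
# A minimal sketch of the "prompt carefully, then reshape" workflow described
# in the abstract. The prompt template and clean-up rules are illustrative
# assumptions; `generate` stands in for any large language model API.
import re

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; returns canned text for the demo."""
    return ("Sure! Here is a reading passage:\n"
            "The library opens at nine. Maya arrives early to study. "
            "She likes the quiet tables by the window.")

def build_prompt(level: str, topic: str, n_sentences: int) -> str:
    # Constrain register, length, and audience explicitly in the prompt.
    return (f"Write a reading passage for {level}-level learners of English "
            f"about {topic}. Use exactly {n_sentences} short, simple "
            f"sentences. Output only the passage itself.")

def reshape(raw: str, n_sentences: int) -> str:
    # Strip chatty preambles ("Sure! Here is ...") and enforce length,
    # since model output is rarely usable verbatim in a lesson.
    text = re.sub(r"^.*?:\s*\n?", "", raw, count=1).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:n_sentences])

passage = reshape(generate(build_prompt("A2", "a day at the library", 3)), 3)
print(passage)
```

The reshaping step matters because models often prepend conversational preambles and overrun length constraints even when the prompt forbids both.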
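Likewise, the grading finding, that established linguistic features should still be used for best performance, can be sketched as a small feature-based regressor. The three features, the toy essays, and the ridge model below are assumptions chosen for illustration, not the feature set or learner used in the grading literature.

```python
# A hedged sketch of feature-based automated grading. The features and the
# tiny training set are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import Ridge

def features(essay: str) -> list[float]:
    tokens = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        len(tokens),                                        # essay length in tokens
        sum(len(t) for t in tokens) / max(len(tokens), 1),  # mean word length
        len(tokens) / max(len(sentences), 1),               # mean sentence length
    ]

# Toy essays with human-assigned scores (0-5), purely for demonstration.
essays = ["I like cat. It small.",
          "My cat is small and very playful, and I enjoy watching her.",
          "Although my cat is small, she dominates the household with remarkable confidence."]
scores = [1.0, 3.0, 5.0]

model = Ridge(alpha=1.0).fit(np.array([features(e) for e in essays]), scores)
print(model.predict([features("The cat sleeps all day on the warm windowsill.")]))
```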
Related papers
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We deem that retrieval-augmented language models have the inherent capability of supplying responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take the first step towards aligning retrieval-augmented language models to a state in which they respond relying solely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
- We're Calling an Intervention: Exploring the Fundamental Hurdles in Adapting Language Models to Nonstandard Text [8.956635443376527]
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text.
We do so by designing interventions that approximate several types of linguistic variation and their interactions with existing biases of language models.
Applying our interventions during language model adaptation with varying size and nature of training data, we gain important insights into when knowledge transfer can be successful.
arXiv Detail & Related papers (2024-04-10T18:56:53Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
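As a rough illustration of the two self-supervised objectives named in this entry, the sketch below builds mask-infilling and relation-prediction training examples from commonsense triples. The triples, verbalization templates, and mask token are illustrative assumptions, not the paper's exact recipe.

```python
# A hedged sketch of constructing "commonsense mask infilling" and
# "commonsense relation prediction" examples from knowledge triples.
# Triples, templates, and the mask token are illustrative assumptions.
MASK = "[MASK]"

# (head, relation, tail) triples as a neural commonsense model might emit.
triples = [
    ("umbrella", "used for", "staying dry"),
    ("oven", "located at", "the kitchen"),
]

def infilling_example(head, relation, tail):
    # Mask the tail entity: the LM must recover it from commonsense context.
    return {"input": f"{head} is {relation} {MASK}.", "target": tail}

def relation_example(head, relation, tail):
    # Companion objective: predict the relation linking head and tail.
    return {"input": f"{head} {MASK} {tail}.", "target": relation}

for t in triples:
    print(infilling_example(*t))
    print(relation_example(*t))
```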
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To mark the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Language Model Behavior: A Comprehensive Survey [5.663056267168211]
We discuss over 250 recent studies of English language model behavior before task-specific fine-tuning.
Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to non-factual responses, commonsense errors, memorized text, and social biases.
arXiv Detail & Related papers (2023-03-20T23:54:26Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)
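A minimal sketch of the feedback-to-sequence conversion summarized in this entry: a preferred and a dispreferred answer are verbalized with hindsight labels and concatenated into one plain training string. The template wording is an assumption for illustration; the paper's exact format may differ.

```python
# A minimal sketch of the Chain-of-Hindsight idea: feedback of either
# polarity is verbalized and concatenated with model outputs into plain
# training sequences. The template wording is an illustrative assumption.
def to_training_sequence(prompt: str, good: str, bad: str) -> str:
    # Pair a preferred and a dispreferred answer with natural-language
    # hindsight labels, so fine-tuning teaches the model which is which.
    return (f"{prompt}\n"
            f"A helpful answer: {good}\n"
            f"An unhelpful answer: {bad}")

seq = to_training_sequence(
    prompt="Explain photosynthesis to a child.",
    good="Plants use sunlight to turn air and water into food.",
    bad="Photosynthesis is C6H12O6 production via the Calvin cycle.",
)
print(seq)  # this string would be fed to standard language-model fine-tuning
```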
- Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding [1.827510863075184]
Curriculum is a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena.
We show that this linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality.
arXiv Detail & Related papers (2022-04-13T10:32:03Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
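The hypothesis-testing formulation in the last entry can be sketched as a simple likelihood threshold test: text that is unusually probable under the candidate model is flagged as generated. The toy unigram "model" and the threshold below are illustrative assumptions; the paper's contribution is analyzing when such tests can succeed at all.

```python
# A hedged sketch of detection as hypothesis testing: score a text by its
# average token log-probability under the candidate model and compare it
# against a threshold. The unigram model and threshold are toy assumptions.
import math

unigram = {"the": 0.07, "cat": 0.01, "sat": 0.005, "on": 0.03, "mat": 0.002}

def avg_log_prob(text: str, model=unigram, floor=1e-6) -> float:
    tokens = text.lower().split()
    return sum(math.log(model.get(t, floor)) for t in tokens) / len(tokens)

def looks_generated(text: str, threshold: float = -6.0) -> bool:
    # H0: human-written (lower likelihood under the model);
    # H1: model-generated (higher likelihood). Reject H0 above the threshold.
    return avg_log_prob(text) > threshold

print(looks_generated("the cat sat on the mat"))
```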
This list is automatically generated from the titles and abstracts of the papers in this site.