Neural machine translation for automated feedback on children's
early-stage writing
- URL: http://arxiv.org/abs/2311.09389v1
- Date: Wed, 15 Nov 2023 21:32:44 GMT
- Title: Neural machine translation for automated feedback on children's
early-stage writing
- Authors: Jonas Vestergaard Jensen, Mikkel Jordahn, Michael Riis Andersen
- Abstract summary: We address the problem of assessing and constructing feedback for early-stage writing automatically using machine learning.
We propose to use sequence-to-sequence models for "translating" early-stage writing by students into "conventional" writing.
- Score: 3.0695550123017514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we address the problem of assessing and constructing feedback
for early-stage writing automatically using machine learning. Early-stage
writing is typically vastly different from conventional writing due to phonetic
spelling and the lack of proper grammar, punctuation, and spacing. Consequently,
early-stage writing is highly non-trivial to analyze using common linguistic
metrics. We propose to use sequence-to-sequence models for "translating"
early-stage writing by students into "conventional" writing, which allows the
translated text to be analyzed using linguistic metrics. Furthermore, we
propose a novel robust likelihood to mitigate the effect of noise in the
dataset. We investigate the proposed methods using a set of numerical
experiments and demonstrate that the conventional text can be predicted with
high accuracy.
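The robust likelihood lends itself to a brief illustration. Below is a minimal sketch, assuming a standard token-level decoder: each target token is modeled as a mixture of the seq2seq prediction and uniform noise over the vocabulary, so occasional mislabeled tokens in the parallel data incur a bounded penalty. The mixture form and the noise rate `eps` are illustrative assumptions, not the paper's published formulation.

```python
# Minimal sketch of a noise-robust token likelihood (an assumption, not the
# paper's exact formulation): mix the decoder's predictive distribution with
# uniform vocabulary noise so that mislabeled tokens in the parallel corpus
# contribute a bounded loss instead of an unbounded cross-entropy term.
import math
import torch
import torch.nn.functional as F

def robust_nll(logits: torch.Tensor, targets: torch.Tensor,
               eps: float = 0.05) -> torch.Tensor:
    """Negative log of (1 - eps) * p_model(y | x) + eps / vocab_size.

    logits:  (batch, seq_len, vocab_size) decoder outputs
    targets: (batch, seq_len) token ids of the "conventional" text
    eps:     assumed label-noise rate (a hyperparameter)
    """
    vocab_size = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)
    log_p_target = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # log of the two-component mixture, computed stably with logaddexp
    mixture = torch.logaddexp(
        log_p_target + math.log(1.0 - eps),
        torch.full_like(log_p_target, math.log(eps / vocab_size)),
    )
    return -mixture.mean()
```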
Related papers
- Historical German Text Normalization Using Type- and Token-Based Language Modeling [0.0]
This report proposes a normalization system for German literary texts from c. 1700-1900, trained on a parallel corpus.
The system combines Transformer language models: an encoder-decoder model normalizes individual word types, and a pre-trained causal language model adjusts these normalizations in context.
An extensive evaluation shows that the system achieves state-of-the-art accuracy, comparable with a much larger, fully end-to-end sentence-based normalization system built by fine-tuning a pre-trained Transformer large language model.
arXiv Detail & Related papers (2024-09-04T16:14:05Z)
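To make the two-stage design concrete, here is a hedged sketch of how type-level candidates and a contextual causal LM could be combined; `propose` and `lm_logprob` are hypothetical stand-ins for the trained components, not the paper's actual API.

```python
# Hedged sketch of the two-stage pipeline described above: a type-level
# encoder-decoder proposes normalization candidates for each historical
# word form, and a causal language model rescores them in sentence context.
# `propose` and `lm_logprob` are hypothetical stand-ins, not the paper's API.
from typing import Callable

def normalize_sentence(
    tokens: list[str],
    propose: Callable[[str], list[str]],       # candidates per word type
    lm_logprob: Callable[[list[str]], float],  # LM score of a token sequence
) -> list[str]:
    out: list[str] = []
    for i, tok in enumerate(tokens):
        candidates = propose(tok) or [tok]
        # pick the candidate the LM finds most plausible, given the already
        # normalized left context and the raw right context
        out.append(max(candidates,
                       key=lambda c: lm_logprob(out + [c] + tokens[i + 1:])))
    return out
```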
- Fine-grained Controllable Text Generation through In-context Learning with Feedback [57.396980277089135]
We present a method for rewriting an input sentence to match specific values of nontrivial linguistic features, such as dependency depth.
In contrast to earlier work, our method uses in-context learning rather than finetuning, making it applicable in use cases where data is sparse.
arXiv Detail & Related papers (2024-06-17T08:55:48Z)
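As a rough illustration of the in-context setup, the sketch below assembles a few-shot prompt pairing sentences with target values of a linguistic feature (dependency depth, as in the summary above); the prompt wording and exemplar format are invented, and the actual LLM call is left abstract.

```python
# Illustrative few-shot prompt construction for feature-controlled rewriting.
# The wording and exemplar format are invented for this sketch; the paper's
# actual prompts may differ.
def build_rewrite_prompt(sentence: str, target_depth: int,
                         exemplars: list[tuple[str, int, str]]) -> str:
    parts = ["Rewrite each sentence so that its dependency depth matches "
             "the target value."]
    for src, depth, rewritten in exemplars:
        parts.append(f"Sentence: {src}\nTarget depth: {depth}\nRewrite: {rewritten}")
    parts.append(f"Sentence: {sentence}\nTarget depth: {target_depth}\nRewrite:")
    return "\n\n".join(parts)
```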
- Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text [4.863310073296471]
We propose 2SDiac, a multi-source model that can effectively support optional diacritics in the input to inform all predictions.
We also introduce Guided Learning, a training scheme that leverages given diacritics in the input under different levels of random masking.
arXiv Detail & Related papers (2023-06-06T10:18:17Z)
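A minimal sketch of what random masking of given diacritics during training might look like; the placeholder token and the per-batch sampling of the masking level are assumptions, not the paper's exact recipe.

```python
# Hedged sketch of Guided Learning-style masking: keep each given diacritic
# in the input with some probability, so the model learns to exploit partial
# diacritization at inference time. Representations are simplified.
import random

def mask_diacritics(diacritics: list[str], keep_prob: float) -> list[str]:
    """Randomly replace input diacritic hints with a placeholder token."""
    return [d if random.random() < keep_prob else "<NO_HINT>" for d in diacritics]

# sample a masking level per batch, from no hints (0.0) to fully
# diacritized input (1.0)
keep_prob = random.random()
hints = mask_diacritics(["FATHA", "", "KASRA", "SHADDA+DAMMA"], keep_prob)
```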
- Discontinuous Grammar as a Foreign Language [0.7412445894287709]
We extend the framework of sequence-to-sequence models for constituent parsing.
We design several linearizations that can fully produce discontinuities.
For the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks.
arXiv Detail & Related papers (2021-10-20T08:58:02Z)
- Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence [59.51720326054546]
We propose a long text generation model that represents the prefix sentences at both the sentence level and the discourse level during decoding.
Our model can generate more coherent texts than state-of-the-art baselines.
arXiv Detail & Related papers (2021-05-19T07:29:08Z)
- Narrative Incoherence Detection [76.43894977558811]
We propose the task of narrative incoherence detection as a new arena for inter-sentential semantic understanding.
Given a multi-sentence narrative, the task is to decide whether there are any semantic discrepancies in the narrative flow.
arXiv Detail & Related papers (2020-12-21T07:18:08Z)
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale [52.663117551150954]
A few popular metrics remain the de facto standard for evaluating tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to more carefully consider how they automatically evaluate their models.
arXiv Detail & Related papers (2020-10-26T13:57:20Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
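To give a flavor of the OT matching, the sketch below computes an entropic-regularized Sinkhorn distance between token embeddings of the teacher-forced (training-mode) and free-running (testing-mode) sequences; the generic solver and cosine cost are standard choices assumed here, not the authors' exact estimator.

```python
# Hedged sketch: entropic-regularized optimal transport between embeddings
# of the teacher-forced sequence x and the model-generated sequence y. This
# is a generic Sinkhorn solver, not the authors' exact estimator.
import torch

def sinkhorn_ot(x: torch.Tensor, y: torch.Tensor,
                eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """OT cost between token embeddings x: (n, d) and y: (m, d)."""
    cost = 1.0 - torch.nn.functional.cosine_similarity(
        x.unsqueeze(1), y.unsqueeze(0), dim=-1)      # (n, m) cosine cost
    k = torch.exp(-cost / eps)                       # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))    # uniform source weights
    b = torch.full((y.size(0),), 1.0 / y.size(0))    # uniform target weights
    u = torch.ones_like(a)
    for _ in range(iters):                           # Sinkhorn iterations
        v = b / (k.t() @ u)
        u = a / (k @ v)
    plan = u.unsqueeze(1) * k * v.unsqueeze(0)       # transport plan
    return (plan * cost).sum()                       # total transport cost
```

In training, a term like this would be added to the usual maximum-likelihood loss to pull the two decoding modes together.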
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.