Automated Evaluation for Student Argumentative Writing: A Survey
- URL: http://arxiv.org/abs/2205.04083v1
- Date: Mon, 9 May 2022 07:27:59 GMT
- Title: Automated Evaluation for Student Argumentative Writing: A Survey
- Authors: Xinyu Wang, Yohan Lee, Juneyoung Park
- Abstract summary: This paper surveys and organizes research works in an under-studied area, which we call automated evaluation for student argumentative writing.
Unlike traditional automated writing evaluation, which focuses on holistic essay scoring, this field is narrower: it evaluates argumentative essays and offers targeted feedback.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper surveys and organizes research works in an under-studied area,
which we call automated evaluation for student argumentative writing. Unlike
traditional automated writing evaluation, which focuses on holistic essay
scoring, this field is narrower: it evaluates argumentative
essays and offers targeted feedback, including argumentation structures,
argument-strength trait scores, and more. Such focused, detailed evaluation
helps students acquire important argumentation skills. In this
paper we organize existing works around tasks, data, and methods. We further
experiment with BERT on representative datasets, aiming to provide up-to-date
baselines for this field.
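The abstract mentions BERT baselines but gives no implementation details; the following is a minimal, hypothetical sketch of what such a baseline could look like: fine-tuning BERT as a regressor for an argument-strength trait score with the Hugging Face transformers library. The toy essays, score scale, and hyperparameters are illustrative assumptions, not the survey's actual setup.

```python
# Minimal sketch (not the survey's actual code): fine-tuning BERT as a
# regression baseline for an argument-strength trait score.
# Assumes: pip install torch transformers
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=1,               # single scalar output for the trait score
    problem_type="regression",  # MSE loss instead of cross-entropy
)

# Hypothetical placeholder data; a real baseline would load an annotated
# argumentative-essay corpus with trait-score labels.
essays = [
    "School uniforms should be mandatory because they reduce bullying...",
    "I disagree with the author since the evidence is anecdotal...",
]
scores = torch.tensor([[3.0], [4.0]])  # e.g. argument strength on a 1-5 scale

batch = tokenizer(
    essays, padding=True, truncation=True, max_length=512, return_tensors="pt"
)

model.train()
optimizer = AdamW(model.parameters(), lr=2e-5)
for _ in range(3):  # a few toy steps; real training iterates over batches
    outputs = model(**batch, labels=scores)  # loss computed internally
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    pred = model(**batch).logits.squeeze(-1)
print(pred)  # predicted trait scores for the two toy essays
```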
Related papers
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z) - Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
We introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate.
Our dataset contains 14k examples of claims that are fully annotated with the various properties supporting the aforementioned tasks.
We find that, while the evaluated models show promising results on individual tasks in our benchmark, their end-to-end performance across all four tasks in succession deteriorates significantly.
arXiv Detail & Related papers (2024-06-05T11:15:45Z) - A School Student Essay Corpus for Analyzing Interactions of Argumentative Structure and Quality
We present a German corpus of 1,320 essays from school students of two age groups.
Each essay has been manually annotated for argumentative structure and quality on multiple levels of granularity.
We propose baseline approaches to argument mining and essay scoring, and we analyze interactions between both tasks.
arXiv Detail & Related papers (2024-04-03T07:31:53Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition: Taking TOEFL Independent Writing Task for Example
This study assesses the capabilities and constraints of ChatGPT, a prominent representative of large language models, by employing it to conduct automated evaluation of English essays, even with a small sample size.
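The snippet gives no prompt design or scoring protocol; purely as an illustration, a hedged sketch of LLM-based essay scoring with the OpenAI API might look like the following. The rubric, model name, and 0-5 scale are assumptions, not the study's setup.

```python
# Hypothetical sketch of LLM-based essay scoring; the prompt, rubric,
# and model choice are illustrative assumptions, not the study's setup.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def score_essay(prompt_text: str, essay: str) -> str:
    """Ask the model for a 0-5 holistic score plus a brief rationale."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the study used ChatGPT
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a TOEFL independent-writing rater. "
                    "Score the essay from 0 to 5 and justify briefly."
                ),
            },
            {
                "role": "user",
                "content": f"Prompt: {prompt_text}\n\nEssay: {essay}",
            },
        ],
        temperature=0,  # deterministic scoring
    )
    return response.choices[0].message.content

print(score_essay("Do you agree that technology makes people less creative?",
                  "I believe technology expands creativity because ..."))
```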
arXiv Detail & Related papers (2024-01-07T07:13:50Z) - Large Language Models are Diverse Role-Players for Summarization Evaluation
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture the above dimensions.
We propose a new LLM-based framework that comprehensively evaluates generated text against reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - PART: Pre-trained Authorship Representation Transformer
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Toward Educator-focused Automated Scoring Systems for Reading and Writing
This paper addresses the challenges of data and label availability, authentic and extended writing, domain scoring, prompt and source variety, and transfer learning.
It employs techniques that preserve essay length as an important feature without increasing model training costs.
arXiv Detail & Related papers (2021-12-22T15:44:30Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - An Exploratory Study of Argumentative Writing by Young Students: A Transformer-based Approach
We present a computational exploration of argument critique writing by young students.
Middle school students were asked to criticize an argument presented in the prompt, focusing on identifying and explaining the reasoning flaws.
This task resembles an established college-level argument critique task.
arXiv Detail & Related papers (2020-06-17T13:55:31Z)