Automated Genre-Aware Article Scoring and Feedback Using Large Language Models
- URL: http://arxiv.org/abs/2410.14165v1
- Date: Fri, 18 Oct 2024 04:13:51 GMT
- Title: Automated Genre-Aware Article Scoring and Feedback Using Large Language Models
- Authors: Chihang Wang, Yuxin Dong, Zhenhong Zhang, Ruotong Wang, Shuo Wang, Jiajing Chen
- Abstract summary: This paper focuses on the development of an advanced intelligent article scoring system.
It assesses the overall quality of written work and offers detailed feature-based scoring tailored to various article genres.
- Score: 8.10826723408637
- Abstract: This paper focuses on the development of an advanced intelligent article scoring system that not only assesses the overall quality of written work but also offers detailed feature-based scoring tailored to various article genres. By integrating the pre-trained BERT model with the large language model Chat-GPT, the system gains a deep understanding of both the content and structure of the text, enabling it to provide a thorough evaluation along with targeted suggestions for improvement. Experimental results demonstrate that this system outperforms traditional scoring methods across multiple public datasets, particularly in feature-based assessments, offering a more accurate reflection of the quality of different article types. Moreover, the system generates personalized feedback to assist users in enhancing their writing skills, underscoring the potential and practical value of automated scoring technologies in educational contexts.
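A minimal sketch of how such a pipeline could be wired together is shown below, assuming a Hugging Face BERT encoder for holistic feature extraction and an OpenAI chat model for genre-aware feedback. The model names, prompt wording, and scoring head are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a BERT encoder feeds a simple scoring head, and a chat
# model is prompted for genre-aware feedback. Model names, prompt wording, and the
# regression head are assumptions, not the system described in the paper.
import torch
from transformers import AutoTokenizer, AutoModel
from openai import OpenAI

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # hypothetical trained regressor

def overall_score(article: str) -> float:
    """Holistic quality score from pooled BERT features (head is untrained here)."""
    inputs = tokenizer(article, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation
    return score_head(cls).item()

def genre_feedback(article: str, genre: str) -> str:
    """Ask a chat model for feature-based, genre-specific feedback."""
    client = OpenAI()  # requires OPENAI_API_KEY
    prompt = (
        f"You are grading a {genre} article. Score structure, argumentation, "
        f"and language use from 1-10 each, then give concrete suggestions.\n\n{article}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# essay = open("essay.txt").read()
# print(overall_score(essay), genre_feedback(essay, "argumentative essay"))
```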
Related papers
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z) - Pronunciation Assessment with Multi-modal Large Language Models [10.35401596425946]
We propose a scoring system based on large language models (LLMs).
The speech encoder first maps the learner's speech into contextual features.
The adapter layer then transforms these features to align with the text embedding in latent space.
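A rough sketch of this encoder-adapter pattern follows; the dimensions and the two-layer adapter design are illustrative assumptions, not the paper's architecture.

```python
# Sketch of the encoder -> adapter -> text-embedding-space pattern described above.
# The dimensions and the two-layer adapter are illustrative assumptions.
import torch
import torch.nn as nn

class SpeechAdapter(nn.Module):
    """Projects speech-encoder features into the text embedding space of an LLM."""
    def __init__(self, speech_dim: int = 768, text_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, text_dim),
            nn.GELU(),
            nn.Linear(text_dim, text_dim),
        )

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, frames, speech_dim) from a pretrained speech encoder
        return self.proj(speech_feats)  # (batch, frames, text_dim), alignable with text tokens

adapter = SpeechAdapter()
dummy_speech = torch.randn(1, 200, 768)  # stand-in for contextual speech features
aligned = adapter(dummy_speech)          # embeddings the LLM can attend to alongside text
print(aligned.shape)                     # torch.Size([1, 200, 4096])
```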
arXiv Detail & Related papers (2024-07-12T12:16:14Z) - ElicitationGPT: Text Elicitation Mechanisms via Language Models [12.945581341789431]
This paper develops mechanisms for scoring elicited text against ground truth text using domain-knowledge-free queries to a large language model.
An empirical evaluation is conducted on peer reviews from a peer-grading dataset, comparing against manual instructor scores for those reviews.
arXiv Detail & Related papers (2024-06-13T17:49:10Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - Evaluating the Generation Capabilities of Large Chinese Language Models [27.598864484231477]
This paper unveils CG-Eval, the first-ever comprehensive and automated evaluation framework.
It assesses the generative capabilities of large Chinese language models across a spectrum of academic disciplines.
Gscore automates the quality measurement of a model's text generation against reference standards.
arXiv Detail & Related papers (2023-08-09T09:22:56Z) - System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show that combining system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z) - Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods, such as BLEU/ROUGE, may not be able to adequately capture these dimensions.
We propose a new LLM-based framework that provides a comprehensive evaluation by comparing generated and reference text on both objective and subjective aspects.
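A toy sketch of this role-playing LLM-as-judge idea is given below; the personas, rubric, and model choice are assumptions for illustration, not the paper's actual protocol.

```python
# Toy sketch of role-based LLM judging: ask several "annotator" personas to score a
# candidate summary against the reference, then average. Personas, rubric wording,
# and model choice are illustrative assumptions, not the paper's actual protocol.
from statistics import mean
from openai import OpenAI

ROLES = ["a strict copy editor", "a domain expert reader", "a casual news reader"]  # hypothetical personas

def role_play_score(reference: str, candidate: str) -> float:
    client = OpenAI()  # requires OPENAI_API_KEY
    scores = []
    for role in ROLES:
        prompt = (
            f"You are {role}. Compare the candidate summary to the reference and "
            "return only an overall quality score from 1 to 10.\n\n"
            f"Reference:\n{reference}\n\nCandidate:\n{candidate}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        # Naive parsing for the sketch; a real system would validate the model output.
        scores.append(float(resp.choices[0].message.content.strip()))
    return mean(scores)
```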
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval, an instruction-based benchmark and evaluation suite for the automatic evaluation of editing capabilities.
We evaluate several pre-trained models and find that InstructGPT and PEER perform best, although most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z) - Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers [0.05857406612420462]
Large-scale, pre-trained language models achieve human-level and superhuman accuracy on existing language understanding tasks.
We propose evaluating systems through a novel measure of prediction coherence.
arXiv Detail & Related papers (2021-09-10T15:04:23Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose TextFlint, a multilingual robustness evaluation platform for NLP tasks.
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address shortcomings in a model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - Cognitive Representation Learning of Self-Media Online Article Quality [24.084727302752377]
Self-media online articles are created mainly by individual users and exhibit varied text structures and multi-modal, hybrid editing.
We establish a joint model, CoQAN, that combines layout organization, writing characteristics, and text semantics.
We have also constructed a large-scale, real-world assessment dataset.
arXiv Detail & Related papers (2020-08-13T02:59:52Z)