Long Context Automated Essay Scoring with Language Models
- URL: http://arxiv.org/abs/2509.10417v1
- Date: Fri, 12 Sep 2025 17:13:47 GMT
- Title: Long Context Automated Essay Scoring with Language Models
- Authors: Christopher Ormerod, Gitit Kehat
- Abstract summary: A common approach to addressing this issue when using these models for Automated Essay Scoring is to truncate the input text. This raises serious validity concerns as it undermines the model's ability to fully capture and evaluate organizational elements of the scoring rubric. We evaluate several models that incorporate architectural modifications of the standard transformer architecture to overcome these length limitations using the Kaggle ASAP 2.0 dataset.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Transformer-based language models are architecturally constrained to process text of a fixed maximum length. Essays written by higher-grade students frequently exceed the maximum allowed length for many popular open-source models. A common approach to addressing this issue when using these models for Automated Essay Scoring is to truncate the input text. This raises serious validity concerns, as it undermines the model's ability to fully capture and evaluate organizational elements of the scoring rubric, which require long contexts to assess. In this study, we evaluate several models that incorporate architectural modifications of the standard transformer architecture to overcome these length limitations using the Kaggle ASAP 2.0 dataset. The models considered in this study include fine-tuned versions of XLNet, Longformer, ModernBERT, Mamba, and Llama models.
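The validity concern above can be made concrete with a small sketch. This is illustrative only: whitespace tokens stand in for subword tokens, and the 512-token cap is a typical encoder limit rather than a figure from the paper.

```python
# Sketch of the truncation problem: a fixed-length encoder never sees the
# conclusion of a long essay, so rubric criteria about organization
# (e.g. a closing argument) cannot be assessed.

MAX_TOKENS = 512  # typical encoder input limit (illustrative assumption)

def truncate(essay: str, max_tokens: int = MAX_TOKENS) -> str:
    """Keep only the first max_tokens whitespace tokens, as a naive
    AES pipeline would before feeding a fixed-length model."""
    tokens = essay.split()
    return " ".join(tokens[:max_tokens])

# A long essay: introduction, body, and a conclusion the rubric needs to see.
essay = ("Introduction sentence. " + "Body sentence. " * 300
         + "In conclusion, the evidence supports the thesis.")

kept = truncate(essay)
print(len(essay.split()), "->", len(kept.split()))          # 609 -> 512
print("conclusion survives truncation:", "conclusion" in kept)  # False
```

Everything after token 512, including the entire conclusion, is silently discarded, which is exactly why the paper turns to long-context architectures instead of truncation.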
Related papers
- Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays [8.899249868081956]
Long context may impose challenges for encoder-only language models in text processing. This study trained several commonly used encoder-based language models for automated scoring of long essays.
arXiv Detail & Related papers (2026-01-06T02:17:45Z) - SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension [77.93156509994994]
We show how to represent short chunks in a way that is conditioned on a broader context window to enhance retrieval performance. Existing embedding models are not well-equipped to encode such situated context effectively. Our method substantially outperforms state-of-the-art embedding models.
arXiv Detail & Related papers (2025-08-03T23:59:31Z) - Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z) - Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - Revisiting Automated Topic Model Evaluation with Large Language Models [82.93251466435208]
We find that large language models appropriately assess the resulting topics.
We then investigate whether we can use large language models to automatically determine the optimal number of topics.
arXiv Detail & Related papers (2023-05-20T09:42:00Z) - Leveraging BERT Language Model for Arabic Long Document Classification [0.47138177023764655]
We propose two models to classify long Arabic documents.
Both of our models outperform the Longformer and RoBERT in this task over two different datasets.
arXiv Detail & Related papers (2023-05-04T13:56:32Z) - A Survey on Long Text Modeling with Transformers [106.50471784909212]
We provide an overview of the recent advances on long text modeling based on Transformer models. We discuss how to process long input to satisfy the length limitation and design improved Transformer architectures. We describe four typical applications involving long text modeling and conclude this paper with a discussion of future directions.
arXiv Detail & Related papers (2023-02-28T11:34:30Z) - Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z) - Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z) - Text Generation with Text-Editing Models [78.03750739936956]
This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches.
We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias.
arXiv Detail & Related papers (2022-06-14T17:58:17Z) - Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching [28.190001111358438]
We propose the Siamese Multi-depth Transformer-based Hierarchical Encoder (SMITH) for long-form document matching.
Our model contains several innovations to adapt self-attention models for longer text input.
We will open source a Wikipedia based benchmark dataset, code and a pre-trained checkpoint to accelerate future research on long-form document matching.
arXiv Detail & Related papers (2020-04-26T07:04:08Z)
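Several of the papers above work around fixed context windows by splitting a long document into overlapping chunks and scoring each chunk separately. A minimal sketch of that chunking step follows; the window and stride sizes are illustrative assumptions, not values taken from any of the papers.

```python
def sliding_windows(tokens, window=512, stride=256):
    """Split a token sequence into overlapping fixed-length windows so
    that every token falls inside at least one chunk.  Overlap (window -
    stride tokens) preserves some context across chunk boundaries."""
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    for start in range(0, len(tokens) - window + stride, stride):
        chunks.append(tokens[start:start + window])
    return chunks

# A 1000-"token" document becomes three overlapping chunks.
tokens = list(range(1000))
chunks = sliding_windows(tokens)
print(len(chunks))        # 3
print(chunks[-1][-1])     # 999 -- the final token is still covered
```

Per-chunk scores would then be aggregated (e.g. averaged) into a document-level score; the long-context architectures surveyed above aim to avoid this lossy split entirely.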
This list is automatically generated from the titles and abstracts of the papers in this site.