Related papers: Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models

Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models

URL: http://arxiv.org/abs/2505.17950v1
Date: Fri, 23 May 2025 14:26:33 GMT
Title: Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models
Authors: Tom Bleckmann, Paul Tschisgale,
Abstract summary: This study explores how contemporary embedding models differ in their capability to process and interpret science-related symbolic expressions.<n>Our findings reveal significant differences in model performance, with OpenAI's GPT-text-embedding-3-large outperforming all other examined models.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in Natural Language Processing (NLP) have facilitated the analysis of student-generated language products in learning analytics (LA), particularly through the use of NLP embedding models. Yet when it comes to science-related language, symbolic expressions such as equations and formulas introduce challenges that current embedding models struggle to address. Existing studies and applications often either overlook these challenges or remove symbolic expressions altogether, potentially leading to biased findings and diminished performance of LA applications. This study therefore explores how contemporary embedding models differ in their capability to process and interpret science-related symbolic expressions. To this end, various embedding models are evaluated using physics-specific symbolic expressions drawn from authentic student responses, with performance assessed via two approaches: similarity-based analyses and integration into a machine learning pipeline. Our findings reveal significant differences in model performance, with OpenAI's GPT-text-embedding-3-large outperforming all other examined models, though its advantage over other models was moderate rather than decisive. Beyond performance, additional factors such as cost, regulatory compliance, and model transparency are discussed as key considerations for model selection. Overall, this study underscores the importance for LA researchers and practitioners of carefully selecting NLP embedding models when working with science-related language products that include symbolic expressions.

Related papers

Modeling cognitive processes of natural reading with transformer-based Language Models [2.048226951354646]
Previous research has shown that models such as N-grams and LSTM networks can partially account for predictability effects in explaining eye movement behaviors.<n>In this study, we extend these findings by evaluating transformer-based models (GPT2, LLaMA-7B, and LLaMA2-7B) to further investigate this relationship.<n>Our results indicate that these architectures outperform earlier models in explaining the variance in Gaze Durations recorded from Rioplantense Spanish readers.
arXiv Detail & Related papers (2025-05-16T17:47:58Z)
Linguistically Grounded Analysis of Language Models using Shapley Head Values [2.914115079173979]
We investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs)<n>Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions are handled.<n>Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information.
arXiv Detail & Related papers (2024-10-17T09:48:08Z)
Investigating the Timescales of Language Processing with EEG and Language Models [0.0]
This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained language model and EEG data. Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers. Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing.
arXiv Detail & Related papers (2024-06-28T12:49:27Z)
Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models. Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
Computational Models to Study Language Processing in the Human Brain: A Survey [47.81066391664416]
This paper reviews efforts in using computational models for brain research, highlighting emerging trends. Our analysis reveals that no single model outperforms others on all datasets.
arXiv Detail & Related papers (2024-03-20T08:01:22Z)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling. Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space. Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models. We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language. We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results. We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability [0.0]
The study compares the state-of-the-art language models - BERT, ELECTRA and its derivatives which include RoBERTa, ALBERT and DistilBERT. The experimental results establish new state-of-the-art for 2013 rating classification task and Financial Phrasebank sentiment detection task with 69% accuracy and 88.2% accuracy respectively.
arXiv Detail & Related papers (2020-09-09T04:31:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.