Related papers: Examining the Robustness of Large Language Models across Language Complexity

Examining the Robustness of Large Language Models across Language Complexity

URL: http://arxiv.org/abs/2501.18738v1
Date: Thu, 30 Jan 2025 20:33:59 GMT
Title: Examining the Robustness of Large Language Models across Language Complexity
Authors: Jiayi Zhang,
Abstract summary: Large language models (LLMs) analyze textual artifacts generated by students to understand and evaluate their learning.<n>This study examines the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving.
Score: 19.184633713069353
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the advancement of large language models (LLMs), an increasing number of student models have leveraged LLMs to analyze textual artifacts generated by students to understand and evaluate their learning. These student models typically employ pre-trained LLMs to vectorize text inputs into embeddings and then use the embeddings to train models to detect the presence or absence of a construct of interest. However, how reliable and robust are these models at processing language with different levels of complexity? In the context of learning where students may have different language backgrounds with various levels of writing skills, it is critical to examine the robustness of such models to ensure that these models work equally well for text with varying levels of language complexity. Coincidentally, a few (but limited) research studies show that the use of language can indeed impact the performance of LLMs. As such, in the current study, we examined the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving. Specifically, we compared how the performance of these models vary using texts with high and low lexical, syntactic, and semantic complexity measured by three linguistic measures.

Related papers

When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling [1.052782170493037]
Large language models show impressive results in automatically generating models for classical benchmarks.<n>Many standard CP problems are likely included in the training data of these models.<n>We show that while LLMs can produce syntactically valid and semantically plausible models, their performance drops sharply under contextual and linguistic variation.
arXiv Detail & Related papers (2025-11-18T10:40:32Z)
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages.<n>In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
Randomly Sampled Language Reasoning Problems Reveal Limits of LLMs [8.146860674148044]
We attempt to measure models' language understanding capacity while circumventing the risk of dataset recall. We parameterize large families of language tasks recognized by deterministic finite automata (DFAs) We find that, even in the strikingly simple setting of 3-state DFAs, LLMs underperform un parameterized ngram models on both language recognition and synthesis tasks.
arXiv Detail & Related papers (2025-01-06T07:57:51Z)
A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension [16.671316494925346]
This study investigates the effects of supervised fine-tuning and in-context learning on the hidden representations of Large Language Models (LLMs)<n>We first explore how the ID of LLM representations evolves during SFT and how it varies due to the number of demonstrations in ICL.<n>We then compare the IDs induced by SFT and ICL and find that ICL consistently induces a higher ID compared to SFT.
arXiv Detail & Related papers (2024-12-09T06:37:35Z)
Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing. We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM. We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem [4.830018386227]
This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials and parallel corpora.
arXiv Detail & Related papers (2024-06-21T20:02:22Z)
How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task. We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus. Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs) We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods. In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
From Text to Source: Results in Detecting Large Language Model-Generated Content [17.306542392779445]
Large Language Models (LLMs) are celebrated for their ability to generate human-like text. This paper investigates "Cross-Model Detection," by evaluating whether a classifier trained to distinguish between source LLM-generated and human-written text can also detect text from a target LLM without further training. The research also explores Model Attribution, encompassing source model identification, model family, and model size classification, in addition to quantization and watermarking detection.
arXiv Detail & Related papers (2023-09-23T09:51:37Z)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling. Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.