The dynamics of meaning through time: Assessment of Large Language Models
- URL: http://arxiv.org/abs/2501.05552v1
- Date: Thu, 09 Jan 2025 19:56:44 GMT
- Title: The dynamics of meaning through time: Assessment of Large Language Models
- Authors: Mohamed Taher Alrefaie, Fatty Salem, Nour Eldin Morsy, Nada Samir, Mohamed Medhat Gaber
- Abstract summary: This study aims to evaluate the capabilities of various large language models (LLMs) in capturing temporal dynamics of meaning. Our comparative analysis includes prominent models like ChatGPT, GPT-4, Claude, Bard, Gemini, and Llama. Findings reveal marked differences in each model's handling of historical context and semantic shifts, highlighting both strengths and limitations in temporal semantic understanding.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how large language models (LLMs) grasp the historical context of concepts and their semantic evolution is essential in advancing artificial intelligence and linguistic studies. This study aims to evaluate the capabilities of various LLMs in capturing temporal dynamics of meaning, specifically how they interpret terms across different time periods. We analyze a diverse set of terms from multiple domains, using tailored prompts and measuring responses through both objective metrics (e.g., perplexity and word count) and subjective human expert evaluations. Our comparative analysis includes prominent models like ChatGPT, GPT-4, Claude, Bard, Gemini, and Llama. Findings reveal marked differences in each model's handling of historical context and semantic shifts, highlighting both strengths and limitations in temporal semantic understanding. These insights offer a foundation for refining LLMs to better address the evolving nature of language, with implications for historical text analysis, AI design, and applications in digital humanities.
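The abstract mentions perplexity as one of the objective metrics used to score model responses. As a minimal sketch of how such a metric is computed (the function name and input format are illustrative, not the paper's implementation): perplexity is the exponential of the negative mean per-token log-probability, so a model that assigns high probability to its own output gets a low perplexity.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log):
    exp of the negative mean log-probability."""
    assert token_logprobs, "need at least one token"
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# A model that is fully certain of every token (log p = 0) has perplexity 1.
print(perplexity([0.0, 0.0, 0.0]))  # → 1.0

# Assigning probability 0.5 to each of 4 tokens yields perplexity 2.
print(perplexity([math.log(0.5)] * 4))  # → 2.0
```

In practice the per-token log-probabilities would come from the evaluated model itself; this sketch only shows the arithmetic that turns them into a single score.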
Related papers
- Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition [50.86415025650168]
Masked image modeling (MIM) tends to exploit local structures to reconstruct visual patterns, resulting in limited linguistic knowledge.
We propose a Linguistics-aware Masked Image Modeling (LMIM) approach, which channels the linguistic information into the decoding process of MIM through a separate branch.
arXiv Detail & Related papers (2025-03-24T14:53:35Z) - Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan [0.1979158763744267]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing.
This study examines the performance of open-source LLMs in part-of-speech (POS) tagging for Old Occitan.
arXiv Detail & Related papers (2025-03-10T20:16:01Z) - Benchmarking Linguistic Diversity of Large Language Models [14.824871604671467]
This paper emphasizes the importance of examining the preservation of human linguistic richness by language models. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
arXiv Detail & Related papers (2024-12-13T16:46:03Z) - Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning).
Traditional psycholinguistic evaluations often reflect statistical biases that may misrepresent LLMs' true linguistic capabilities.
We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pair and diagnostic probing to analyze activation patterns across model layers.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - LLMTemporalComparator: A Tool for Analysing Differences in Temporal Adaptations of Large Language Models [17.021220773165016]
This study addresses the challenges of analyzing temporal discrepancies in large language models (LLMs) trained on data from different time periods.
We propose a novel system that compares in a systematic way the outputs of two LLM versions based on user-defined queries.
arXiv Detail & Related papers (2024-10-05T15:17:07Z) - Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Can Large Language Models Understand Context? [17.196362853457412]
This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models.
Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models.
As LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings.
arXiv Detail & Related papers (2024-02-01T18:55:29Z) - PSST: A Benchmark for Evaluation-driven Text Public-Speaking Style Transfer [16.07576878783396]
We introduce a novel task called Public-Speaking Style Transfer (PSST), which aims to simulate humans to transform official texts into a public-speaking style.
Grounded in the analysis of real-world data from a linguistic perspective, we decompose public-speaking style into key sub-styles.
We propose a fine-grained evaluation framework to analyze the characteristics and identify the problems of stylized texts.
arXiv Detail & Related papers (2023-11-14T18:50:51Z) - Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.