Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
- URL: http://arxiv.org/abs/2402.15518v2
- Date: Mon, 21 Oct 2024 14:02:00 GMT
- Title: Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study
- Authors: Gonzalo Martínez, José Alberto Hernández, Javier Conde, Pedro Reviriego, Elena Merino
- Abstract summary: We consider the evaluation of the lexical richness of the text generated by conversational Large Language Models (LLMs) and how it depends on the model parameters.
The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model.
- Score: 3.0059120458540383
- License:
- Abstract: The performance of conversational Large Language Models (LLMs) in general, and of ChatGPT in particular, is currently being evaluated on many different tasks, from logical reasoning or maths to answering questions on a myriad of topics. In contrast, much less attention is being devoted to the study of the linguistic features of the texts generated by these LLMs. This is surprising since LLMs are models for language, and understanding how they use language is important. Indeed, conversational LLMs are poised to have a significant impact on the evolution of languages, as they may eventually dominate the creation of new text. This means, for example, that if conversational LLMs do not use a word, it may become less and less frequent and eventually stop being used altogether. Therefore, evaluating the linguistic features of the text they produce, and how those depend on the model parameters, is the first step toward understanding the potential impact of conversational LLMs on the evolution of languages. In this paper, we consider the evaluation of the lexical richness of the text generated by LLMs and how it depends on the model parameters. A methodology is presented and used to conduct a comprehensive evaluation of lexical richness using ChatGPT as a case study. The results show how lexical richness depends on the version of ChatGPT and some of its parameters, such as the presence penalty, or on the role assigned to the model. The dataset and tools used in our analysis are released under open licenses with the goal of drawing much-needed attention to the evaluation of the linguistic features of LLM-generated text.
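The abstract does not name the specific metrics used, so as a rough illustration of how lexical richness is commonly quantified, here is a minimal Python sketch of the type-token ratio (TTR) and its moving-average variant (MATTR). The tokenizer, window size, and function names are illustrative assumptions, not the paper's released tooling.

```python
# Minimal sketch of two common lexical-richness measures: type-token
# ratio (TTR) and moving-average TTR (MATTR). Illustrative only; the
# paper's released tools may use different metrics and tokenization.
import re

def tokenize(text: str) -> list[str]:
    # Simple lowercase word tokenizer (an assumption; real studies
    # often use a proper NLP tokenizer).
    return re.findall(r"[a-záéíóúüñ]+", text.lower())

def ttr(tokens: list[str]) -> float:
    # Unique words divided by total words; sensitive to text length.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mattr(tokens: list[str], window: int = 100) -> float:
    # Average TTR over sliding windows, which reduces the length bias
    # of plain TTR when comparing texts of different sizes.
    if len(tokens) < window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)

sample = "the quick brown fox jumps over the lazy dog " * 20
tokens = tokenize(sample)
print(f"TTR:   {ttr(tokens):.3f}")   # low, because the text repeats itself
print(f"MATTR: {mattr(tokens, window=50):.3f}")
```

Plain TTR shrinks as texts grow longer, which is why windowed variants such as MATTR are generally preferred when comparing generations of different lengths, for example across ChatGPT versions or presence-penalty settings.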
Related papers
- A Statistical Analysis of LLMs' Self-Evaluation Using Proverbs [1.9073729452914245]
We introduce a novel proverb database consisting of 300 proverb pairs that are similar in intent but different in wording.
We propose tests to evaluate textual consistencies as well as numerical consistencies across similar proverbs.
arXiv Detail & Related papers (2024-10-22T02:38:48Z)
- PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe significant gaps of 17% and 45% relative to humans on Rhyme Word Generation and Syllable Counting, respectively.
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
- Open Conversational LLMs do not know most Spanish words [2.737783055857426]
We evaluate the knowledge that open-source chat LLMs have of Spanish words by testing a sample of words in a reference dictionary.
Results show that open-source chat LLMs produce incorrect meanings for a significant fraction of the words and are unable to use most of the words correctly in sentences with context.
arXiv Detail & Related papers (2024-03-21T15:41:02Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z)
- Evaluation of large language models using an Indian language LGBTI+ lexicon [3.2047868962340327]
Large language models (LLMs) are typically evaluated on the basis of task-based benchmarks such as MMLU.
This paper presents a methodology for evaluation of LLMs using an LGBTI+ lexicon in Indian languages.
arXiv Detail & Related papers (2023-10-26T21:32:24Z)
- Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models [2.7013338932521416]
We advocate for the revival of vocabulary tests as a valuable tool for assessing the performance of Large Language Models (LLMs).
We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge.
arXiv Detail & Related papers (2023-10-23T08:45:12Z)
- An Investigation of LLMs' Inefficacy in Understanding Converse Relations [30.94718664430869]
We introduce ConvRe, a new benchmark focusing on converse relations, which contains 17 relations and 1,240 triples extracted from knowledge graph completion datasets.
ConvRe features two tasks, Re2Text and Text2Re, formulated as multiple-choice question answering to evaluate LLMs' ability to determine the matching between relations and associated text.
arXiv Detail & Related papers (2023-10-08T13:45:05Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' abilities in discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.