Related papers: A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias

URL: http://arxiv.org/abs/2505.09056v1
Date: Wed, 14 May 2025 01:21:46 GMT
Title: A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias
Authors: Brandon Smith, Mohamed Reda Bouadjenek, Tahsin Alamgir Kheya, Phillip Dawson, Sunil Aryal
Abstract summary: Large Language Models (LLMs) represent a major step toward artificial general intelligence. Questions remain about their output similarity, variability, and ethical implications.
Score: 1.7109513360384465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) represent a major step toward artificial general intelligence, significantly advancing our ability to interact with technology. While LLMs perform well on Natural Language Processing tasks -- such as translation, generation, code writing, and summarization -- questions remain about their output similarity, variability, and ethical implications. For instance, how similar are texts generated by the same model? How does this compare across different models? And which models best uphold ethical standards? To investigate, we used 5,000 prompts spanning diverse tasks like generation, explanation, and rewriting. This resulted in approximately 3 million texts from 12 LLMs, including proprietary and open-source systems from OpenAI, Google, Microsoft, Meta, and Mistral. Key findings include: (1) outputs from the same LLM are more similar to each other than to human-written texts; (2) models like WizardLM-2-8x22b generate highly similar outputs, while GPT-4 produces more varied responses; (3) LLM writing styles differ significantly, with Llama 3 and Mistral showing higher similarity, and GPT-4 standing out for distinctiveness; (4) differences in vocabulary and tone underscore the linguistic uniqueness of LLM-generated content; (5) some LLMs demonstrate greater gender balance and reduced bias. These results offer new insights into the behavior and diversity of LLM outputs, helping guide future development and ethical evaluation.

Related papers

Do LLMs produce texts with "human-like" lexical diversity? [0.0]
This study investigates patterns of lexical diversity in LLM-generated texts from four ChatGPT models. Six dimensions of lexical diversity were measured in each text: volume, abundance, variety-repetition, evenness, disparity, and dispersion. Results indicate that LLMs do not produce human-like texts in relation to lexical diversity, and the newer LLMs produce less human-like texts than older models.
arXiv Detail & Related papers (2025-07-31T18:22:11Z)
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case [0.41942958779358674]
This study examines how Large Language Models shape responses to ungendered prompts, contributing to biased outputs. The results highlight how content generated by LLMs can perpetuate stereotypes. The presence of bias in AI-generated text can have significant implications in many fields, such as the workplace or job selection.
arXiv Detail & Related papers (2025-07-25T10:57:29Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs). We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy. We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text [1.1137087573421256]
This study aims to support efforts to detect and identify textual content generated using Generative AI Large Language Models. We leverage several machine learning algorithms, such as Random Forest (RF) and Recurrent Neural Networks (RNN), to understand the important features in attribution. Our method is divided into 1) binary classification, to differentiate between human-written and AI-generated text, and 2) multi-class classification, to differentiate between human-written text and the text generated by the five different LLM tools.
arXiv Detail & Related papers (2025-01-06T18:46:53Z)
Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs [63.29737699997859]
Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks, without any multimodal finetuning. In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representation.
arXiv Detail & Related papers (2024-05-26T21:31:59Z)
Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard [3.419330841031544]
Large Language Models (LLMs) are capable of generating text that is similar to or surpasses human quality. We compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of the most popular LLMs to diverse inputs. The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with a favorable 88% accuracy.
arXiv Detail & Related papers (2024-02-22T13:25:17Z)
Large Language Model-Aware In-Context Learning for Code Generation [75.68709482932903]
Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. We propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation.
arXiv Detail & Related papers (2023-10-15T06:12:58Z)
Enhancing Text-based Knowledge Graph Completion with Zero-Shot Large Language Models: A Focus on Semantic Enhancement [8.472388165833292]
We introduce a framework termed constrained prompts for KGC (CP-KGC). This framework designs prompts that adapt to different datasets to enhance semantic richness. This study extends the performance limits of existing models and promotes further integration of KGC with large language models.
arXiv Detail & Related papers (2023-10-12T12:31:23Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.