Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI
- URL: http://arxiv.org/abs/2501.02531v1
- Date: Sun, 05 Jan 2025 13:18:13 GMT
- Title: Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially Important Issues: A Comparative Study of Human and LLMs in the Context of AGI
- Authors: Ljubisa Bojic, Dylan Seychell, Milan Cabarkapa,
- Abstract summary: This research aims to contribute towards establishing a benchmark for evaluating the sentiment of various Large Language Models in socially importan issues.
Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations.
GPT-4 recorded the most positive sentiment score towards AGI, whereas Bard was leaning towards the neutral sentiment.
- Score: 0.08192907805418582
- License:
- Abstract: With the expansion of neural networks, such as large language models, humanity is exponentially heading towards superintelligence. As various AI systems are increasingly integrated into the fabric of societies-through recommending values, devising creative solutions, and making decisions-it becomes critical to assess how these AI systems impact humans in the long run. This research aims to contribute towards establishing a benchmark for evaluating the sentiment of various Large Language Models in socially importan issues. The methodology adopted was a Likert scale survey. Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations. Temporal variations in sentiment were also evaluated over three consecutive days. The results highlighted a diversity in sentiment scores among LLMs, ranging from 3.32 to 4.12 out of 5. GPT-4 recorded the most positive sentiment score towards AGI, whereas Bard was leaning towards the neutral sentiment. The human samples, contrastingly, showed a lower average sentiment of 2.97. The temporal comparison revealed differences in sentiment evolution between LLMs in three days, ranging from 1.03% to 8.21%. The study's analysis outlines the prospect of potential conflicts of interest and bias possibilities in LLMs' sentiment formation. Results indicate that LLMs, akin to human cognitive processes, could potentially develop unique sentiments and subtly influence societies' perceptions towards various opinions formed within the LLMs.
Related papers
- Evaluating Large Language Models Against Human Annotators in Latent Content Analysis: Sentiment, Political Leaning, Emotional Intensity, and Sarcasm [0.3141085922386211]
This study evaluates the reliability, consistency, and quality of seven state-of-the-art Large Language Models (LLMs)
A total of 33 human annotators and eight LLM variants assessed 100 curated textual items.
Results reveal that both humans and LLMs exhibit high reliability in sentiment analysis and political leaning assessments.
arXiv Detail & Related papers (2025-01-05T13:28:15Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear.
By including additional context in prompts, we analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z) - Do Large Language Models Possess Sensitive to Sentiment? [18.88126980975737]
Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding.
This paper investigates the ability of LLMs to detect and react to sentiment in text modal.
arXiv Detail & Related papers (2024-09-04T01:40:20Z) - Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches [69.73783026870998]
This work proposes a novel framework, ValueLex, to reconstruct Large Language Models' unique value system from scratch.
Based on Lexical Hypothesis, ValueLex introduces a generative approach to elicit diverse values from 30+ LLMs.
We identify three core value dimensions, Competence, Character, and Integrity, each with specific subdimensions, revealing that LLMs possess a structured, albeit non-human, value system.
arXiv Detail & Related papers (2024-04-19T09:44:51Z) - GPT-4 Surpassing Human Performance in Linguistic Pragmatics [0.0]
This study investigates the ability of Large Language Models (LLMs) to comprehend and interpret linguistic pragmatics.
Using Grice's communication principles, LLMs and human subjects were evaluated based on their responses to various dialogue-based tasks.
The findings revealed the superior performance and speed of LLMs, particularly GPT4, over human subjects in interpreting pragmatics.
arXiv Detail & Related papers (2023-12-15T05:40:15Z) - Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z) - Exploring Qualitative Research Using LLMs [8.545798128849091]
This study aimed to compare and contrast the comprehension capabilities of humans and AI driven large language models.
We conducted an experiment with small sample of Alexa app reviews, initially classified by a human analyst.
LLMs were then asked to classify these reviews and provide the reasoning behind each classification.
arXiv Detail & Related papers (2023-06-23T05:21:36Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.