ChatGPT as a commenter to the news: can LLMs generate human-like
opinions?
- URL: http://arxiv.org/abs/2312.13961v1
- Date: Thu, 21 Dec 2023 15:46:36 GMT
- Title: ChatGPT as a commenter to the news: can LLMs generate human-like
opinions?
- Authors: Rayden Tseng, Suzan Verberne and Peter van der Putten
- Abstract summary: We investigate to what extent GPT-3.5 can generate human-like comments on Dutch news articles.
We analyze human likeness across multiple prompting techniques.
We find that our fine-tuned BERT models can easily distinguish human-written comments from GPT-3.5 generated comments.
- Score: 3.0309690768567754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ChatGPT, GPT-3.5, and other large language models (LLMs) have drawn
significant attention since their release, and the abilities of these models
have been investigated for a wide variety of tasks. In this research we
investigate to what extent GPT-3.5 can generate human-like comments on Dutch
news articles. We define human likeness as 'not distinguishable from human
comments', approximated by the difficulty of automatic classification between
human and GPT comments. We analyze human likeness across multiple prompting
techniques. In particular, we utilize zero-shot, few-shot and context prompts,
for two generated personas. We found that our fine-tuned BERT models can easily
distinguish human-written comments from GPT-3.5 generated comments, with none
of the prompting methods performing noticeably better. Further analysis
revealed that human comments consistently showed higher lexical diversity than
GPT-generated comments. This indicates that although generative LLMs can
generate fluent text, their capability to create human-like opinionated
comments is still limited.
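The summary above reports higher lexical diversity for human comments but does not name the exact metric used. As a minimal illustration only, the sketch below uses type-token ratio (TTR), a common simple diversity measure; the sample comments and the metric choice are assumptions, not details from the paper.

```python
# Sketch: comparing lexical diversity of two comments via type-token ratio.
# TTR = unique tokens / total tokens; repeated words lower the score.
import re

def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (0.0 for empty text)."""
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical examples: a varied comment vs. a repetitive one.
varied = "Prima artikel, al vraag ik me af of dit echt gaat werken."
repetitive = "Dit is een goed artikel. Het artikel is goed geschreven."

print(type_token_ratio(varied))      # all 12 tokens unique -> 1.0
print(type_token_ratio(repetitive))  # 7 unique of 10 tokens -> 0.7
```

Note that TTR is sensitive to text length, so comparisons are only meaningful between texts of similar size; length-normalized measures exist for longer documents.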
Related papers
- Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT [4.798444680860121]
We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions.
Our results show that ChatGPT has some level of similarity to human responses, but also exhibits some gaps and biases in its knowledge and awareness of the Hausa culture and emotions.
arXiv Detail & Related papers (2024-06-27T19:42:13Z)
- Measuring Psychological Depth in Language Models [50.48914935872879]
We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that measures an LLM's ability to produce authentic and narratively complex stories.
We empirically validate our framework by showing that humans can consistently evaluate stories based on PDS (0.72 Krippendorff's alpha).
Surprisingly, GPT-4 stories either surpassed or were statistically indistinguishable from highly-rated human-written stories sourced from Reddit.
arXiv Detail & Related papers (2024-06-18T14:51:54Z)
- Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts [17.369951848952265]
We investigate the ability of LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts.
We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts.
We conclude that human evaluators rate the generated texts slightly worse on average regarding their creative quality, but they are unable to reliably distinguish between human-written and AI-generated texts.
arXiv Detail & Related papers (2024-05-02T18:29:58Z)
- How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO [55.25989137825992]
We introduce ECHO, an evaluative framework inspired by the Turing test.
This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses.
We evaluate three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models.
arXiv Detail & Related papers (2024-04-22T08:00:51Z)
- Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication [68.40865217231695]
This study examines the behavior of GPT-4V in replicating human-like use of emojis.
The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation.
arXiv Detail & Related papers (2024-01-16T08:56:52Z)
- ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text [2.0378492681344493]
We study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text.
We employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text.
Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text.
arXiv Detail & Related papers (2023-01-30T08:06:08Z)
- How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection [8.107721810172112]
ChatGPT is able to respond effectively to a wide range of human questions.
People are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society.
In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT.
arXiv Detail & Related papers (2023-01-18T15:23:25Z)
- Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
- Reframing Human-AI Collaboration for Generating Free-Text Explanations [46.29832336779188]
We consider the task of generating free-text explanations using a small number of human-written examples.
We find that crowdworkers often prefer explanations generated by GPT-3 to crowdsourced human-written explanations.
We create a pipeline that combines GPT-3 with a supervised filter that incorporates humans-in-the-loop via binary acceptability judgments.
arXiv Detail & Related papers (2021-12-16T07:31:37Z)
- All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text [46.260544251940125]
We run a study assessing non-experts' ability to distinguish between human- and machine-authored text.
We find that, without training, evaluators distinguished between GPT3- and human-authored text at random chance level.
arXiv Detail & Related papers (2021-06-30T19:00:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.