ChatGPT v Bard v Bing v Claude 2 v Aria v human-expert. How good are AI
chatbots at scientific writing?
- URL: http://arxiv.org/abs/2309.08636v3
- Date: Mon, 16 Oct 2023 14:24:02 GMT
- Title: ChatGPT v Bard v Bing v Claude 2 v Aria v human-expert. How good are AI
chatbots at scientific writing?
- Authors: Edisa Lozić and Benjamin Štular
- Abstract summary: ChatGPT-4 showed the highest quantitative accuracy, closely followed by ChatGPT-3.5, Bing, and Bard.
All AIs exhibited proficiency in merging existing knowledge, but none produced original scientific content.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Historical emphasis on writing mastery has shifted with advances in
generative AI, especially in scientific writing. This study analysed six AI
chatbots for scholarly writing in humanities and archaeology. Using methods
that assessed factual correctness and scientific contribution, ChatGPT-4 showed
the highest quantitative accuracy, closely followed by ChatGPT-3.5, Bing, and
Bard. However, Claude 2 and Aria scored considerably lower. Qualitatively, all
AIs exhibited proficiency in merging existing knowledge, but none produced
original scientific content. Interestingly, our findings suggest ChatGPT-4
might represent a plateau in large language model size. This research
emphasizes the unique, intricate nature of human research, suggesting that AI's
emulation of human originality in scientific writing is challenging. As of
2023, while AI has transformed content generation, it struggles with original
contributions in humanities. This may change as AI chatbots continue to evolve
into LLM-powered software.
Related papers
- Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
Misclassification can lead to false plagiarism accusations and misleading claims about AI prevalence in online content.
We systematically evaluate eleven state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset.
Our findings reveal that detectors frequently misclassify even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z)
- Evaluating Sakana's AI Scientist for Autonomous Research: Wishful Thinking or an Emerging Reality Towards 'Artificial Research Intelligence' (ARI)? [19.524056927240498]
Sakana recently introduced the 'AI Scientist', claiming that it conducts research autonomously, i.e., implying that they have achieved what we term Artificial Research Intelligence (ARI).
Our evaluation of the AI Scientist reveals critical shortcomings.
arXiv Detail & Related papers (2025-02-20T06:22:03Z)
- Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation [48.70176791365903]
This study explores how bias shapes the perception of AI-generated versus human-generated content.
We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z)
- Empirical evidence of Large Language Model's influence on human spoken communication [25.09136621615789]
Artificial Intelligence (AI) agents now interact with billions of humans in natural language.
This raises the question of whether AI has the potential to shape a fundamental aspect of human culture: the way we speak.
Recent analyses revealed that scientific publications already exhibit evidence of AI-specific language.
arXiv Detail & Related papers (2024-09-03T10:01:51Z)
- ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing [45.187790784934734]
This paper focuses on Conversational XAI for AI-assisted scientific writing tasks.
We identify four design rationales: "multifaceted", "controllability", "mixed-initiative", and "context-aware drill-down".
We incorporate them into an interactive prototype, ConvXAI, which facilitates heterogeneous AI explanations for scientific writing through dialogue.
arXiv Detail & Related papers (2023-05-16T19:48:49Z)
- AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users.
This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
- One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era [95.2284704286191]
GPT-4 (a.k.a. ChatGPT Plus) is one small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI).
Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage.
This work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges.
arXiv Detail & Related papers (2023-04-04T06:22:09Z)
- Towards Healthy AI: Large Language Models Need Therapists Too [41.86344997530743]
We define Healthy AI to be safe, trustworthy and ethical.
We present the SafeguardGPT framework, which uses psychotherapy to correct harmful behaviors in AI chatbots.
arXiv Detail & Related papers (2023-04-02T00:39:12Z)
- ChatGPT or academic scientist? Distinguishing authorship with over 99% accuracy using off-the-shelf machine learning tools [0.0]
ChatGPT has enabled access to AI-generated writing for the masses.
The need to discriminate human writing from AI writing is now both critical and urgent.
We developed a method for discriminating text generated by ChatGPT from (human) academic scientists.
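The summary above does not describe the paper's actual pipeline, so purely as an illustrative sketch: an "off-the-shelf" authorship classifier of this kind can be assembled from TF-IDF features and logistic regression in scikit-learn. The toy texts, labels, and feature settings below are hypothetical placeholders, not the authors' method, and their exact features and classifier may well differ.
```python
# Hypothetical sketch only: illustrates the general idea of using off-the-shelf
# ML tools (here scikit-learn) to separate ChatGPT-style text from human
# academic prose; it is not the pipeline described in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder corpus; a real study would use many labelled documents.
texts = [
    "In this paper we report measurements of the decay constant ...",    # human (hypothetical)
    "Certainly! Here is a concise summary of the requested topic ...",   # ChatGPT-style (hypothetical)
    "Our previous work (Smith et al., 2019) established that ...",       # human (hypothetical)
    "As an AI language model, I can provide an overview of ...",         # ChatGPT-style (hypothetical)
]
labels = ["human", "ai", "human", "ai"]

# Word n-gram TF-IDF plus logistic regression is a common off-the-shelf
# baseline for authorship / style classification.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Predict the likely author class of a new snippet.
print(clf.predict(["I am happy to help you draft this section ..."]))
```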
arXiv Detail & Related papers (2023-03-28T23:16:00Z)
- A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? [112.12974778019304]
Generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond.
In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just a tool out of numerous AIGC tasks.
This work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc.
arXiv Detail & Related papers (2023-03-21T10:09:47Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.