ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
- URL: http://arxiv.org/abs/2303.16421v3
- Date: Fri, 19 Apr 2024 04:57:37 GMT
- Title: ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
- Authors: Ning Bian, Xianpei Han, Le Sun, Hongyu Lin, Yaojie Lu, Ben He, Shanshan Jiang, Bin Dong
- Abstract summary: Large language models (LLMs) have made significant progress in NLP.
We specifically focus on ChatGPT, a widely used and easily accessible LLM.
We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities.
- Score: 49.52083248451775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have made significant progress in NLP. However, their ability to memorize, represent, and leverage commonsense knowledge has been a well-known pain point. In this paper, we specifically focus on ChatGPT, a widely used and easily accessible LLM, and ask the following questions: (1) Can ChatGPT effectively answer commonsense questions? (2) Is ChatGPT aware of the underlying commonsense knowledge for answering a specific question? (3) Is ChatGPT knowledgeable in commonsense? (4) Can ChatGPT effectively leverage commonsense for answering questions? We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities, including answering commonsense questions, identifying necessary knowledge, generating knowledge descriptions, and using knowledge descriptions to answer questions again. Experimental results show that: (1) ChatGPT can achieve good QA accuracies in commonsense tasks, while still struggling with certain domains of datasets. (2) ChatGPT is knowledgeable, and can accurately generate most of the commonsense knowledge using knowledge prompts. (3) Despite its knowledge, ChatGPT is an inexperienced commonsense problem solver, which cannot precisely identify the needed commonsense for answering a specific question. These findings raise the need to explore improved mechanisms for effectively incorporating commonsense into LLMs like ChatGPT, such as better instruction following and commonsense guidance.
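The abstract above outlines a four-step probing protocol: direct QA, identifying the needed knowledge, generating knowledge descriptions, and answering again with the generated knowledge. A minimal sketch of such a loop is given below; the `ask_llm` helper, prompt wording, and option formatting are illustrative placeholders rather than the paper's actual prompts or datasets.

```python
def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-model call (e.g., ChatGPT); plug in a real client here."""
    raise NotImplementedError("connect this to an actual LLM API")


def probe_commonsense(question: str, options: list[str]) -> dict:
    joined = " / ".join(options)

    # (1) Direct QA: can the model answer the commonsense question as-is?
    direct = ask_llm(f"Question: {question}\nOptions: {joined}\nAnswer with one option.")

    # (2) Knowledge identification: which commonsense is needed for this question?
    needed = ask_llm(f"What commonsense knowledge is needed to answer: {question}")

    # (3) Knowledge generation: can the model state that knowledge explicitly?
    generated = ask_llm(f"Describe the commonsense facts relevant to: {question}")

    # (4) Knowledge-augmented QA: answer again, conditioned on the generated knowledge.
    augmented = ask_llm(
        f"Knowledge: {generated}\nQuestion: {question}\nOptions: {joined}\n"
        "Answer with one option."
    )

    return {"direct": direct, "needed": needed, "generated": generated, "augmented": augmented}
```

Comparing the outputs of the four steps is what lets the paper separate what the model knows (steps 2 and 3) from how well it can use that knowledge when actually answering (steps 1 and 4).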
Related papers
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
- An empirical study of ChatGPT-3.5 on question answering and code maintenance [14.028497274245227]
A rising concern is whether ChatGPT will replace programmers and kill jobs.
We conducted an empirical study to systematically compare ChatGPT against programmers on question-answering and software-maintenance tasks.
arXiv Detail & Related papers (2023-10-03T14:48:32Z)
- Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education [0.0]
This study determined how reliable ChatGPT can be for answering complex medical and clinical questions.
The obtained results were evaluated using a two-way ANOVA and post hoc analysis.
ChatGPT-generated answers were found to be more context-oriented than regular Google search results.
arXiv Detail & Related papers (2023-06-30T19:53:23Z)
- ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks [2.084078990567849]
ChatGPT (Chat Generative Pre-trained Transformer) was launched by OpenAI on November 30, 2022.
In this study, we explore how ChatGPT can be used to help with common software engineering tasks.
arXiv Detail & Related papers (2023-05-26T11:29:06Z)
- Transformative Effects of ChatGPT on Modern Education: Emerging Era of AI Chatbots [36.760677949631514]
ChatGPT was released to provide coherent and useful replies based on analysis of large volumes of data.
Our preliminary evaluation concludes that ChatGPT performed differently in each subject area including finance, coding and maths.
There are clear drawbacks in its use, such as the possibility of producing inaccurate or false data.
Academic regulations and evaluation practices need to be updated, should ChatGPT be used as a tool in education.
arXiv Detail & Related papers (2023-05-25T17:35:57Z)
- Why Does ChatGPT Fall Short in Providing Truthful Answers? [31.656442655938445]
We investigate ChatGPT's failures in providing truthful answers to user questions.
We identify two critical abilities associated with factuality: knowledge memorization and knowledge recall.
Our findings suggest that augmenting the model with granular external knowledge and cues for knowledge recall can enhance the model's factuality in answering questions.
arXiv Detail & Related papers (2023-04-20T17:48:43Z)
- Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability on the popular GLUE benchmark and compare it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm [120.98789964518562]
We argue that, due to the large scale of commonsense knowledge, it is infeasible to annotate a training set large enough for each task to cover all the commonsense required for learning.
We focus on investigating models' commonsense inference capabilities from two perspectives.
We name the benchmark Commonsense Inference with Knowledge-in-the-loop Question Answering (CIKQA).
arXiv Detail & Related papers (2022-10-12T14:32:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.