Analyzing Large language models chatbots: An experimental approach using a probability test
- URL: http://arxiv.org/abs/2407.12862v1
- Date: Wed, 10 Jul 2024 15:49:40 GMT
- Title: Analyzing Large language models chatbots: An experimental approach using a probability test
- Authors: Melise Peruchini, Julio Monteiro Teixeira
- Abstract summary: This study consists of qualitative empirical research, conducted through exploratory tests with two different Large Language Model (LLM) chatbots: ChatGPT and Gemini.
The methodological procedure involved exploratory tests based on prompts designed with a probability question.
The "Linda Problem", widely recognized in cognitive psychology, was used as a basis to create the tests, along with the development of a new problem specifically for this experiment, the "Mary Problem"
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study consists of qualitative empirical research, conducted through exploratory tests with two different Large Language Model (LLM) chatbots: ChatGPT and Gemini. The methodological procedure involved exploratory tests based on prompts designed with a probability question. The "Linda Problem", widely recognized in cognitive psychology, was used as a basis to create the tests, along with the development of a new problem specifically for this experiment, the "Mary Problem". The object of analysis is the dataset of outputs provided by each chatbot interaction. The purpose of the analysis is to verify whether the chatbots mainly employ logical reasoning that aligns with probability theory or whether they are more frequently affected by the stereotypical textual descriptions in the prompts. The findings provide insights into how each chatbot handles logic and textual constructions, suggesting that, while the analyzed chatbots perform satisfactorily on a well-known probabilistic problem, they exhibit significantly lower performance on new tests that require direct application of probabilistic logic.
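The Linda Problem hinges on the conjunction rule of probability theory: for any events A and B, P(A ∧ B) ≤ P(A), so the conjunction ("bank teller and active in the feminist movement") can never be more probable than its constituent event ("bank teller"). As a minimal sketch of how a chatbot's answer could be checked against this rule, the snippet below assumes a two-option answer format; the option wording, the `is_conjunction_fallacy` helper, and the collected answers are hypothetical illustrations, not the authors' protocol.

```python
# Sketch: check a chatbot's Linda Problem answer against the conjunction
# rule P(A and B) <= P(A). Option labels and the helper are illustrative.

LINDA_OPTIONS = {
    "A": "Linda is a bank teller.",
    "B": "Linda is a bank teller and is active in the feminist movement.",
}

def is_conjunction_fallacy(chosen_option: str) -> bool:
    """True if the answer ranks the conjunction (option B) as more probable
    than its constituent event (option A), violating P(A and B) <= P(A)."""
    return chosen_option.strip().upper() == "B"

def main() -> None:
    # Hypothetical outputs collected from two chatbot interactions.
    chatbot_answers = {"chatbot_1": "A", "chatbot_2": "B"}
    for name, answer in chatbot_answers.items():
        verdict = ("conjunction fallacy" if is_conjunction_fallacy(answer)
                   else "consistent with probability theory")
        print(f"{name}: option {answer} -> {verdict}")

if __name__ == "__main__":
    main()
```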
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance.
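As a rough sketch of how question likelihood can serve as a ranking signal, the snippet below scores candidate prompts by their total log-likelihood under a small causal language model and keeps the highest-scoring one; the model choice (GPT-2), the candidate prompts, and the ranking criterion are assumptions for illustration, not the paper's configuration.

```python
# Sketch: rank candidate prompts by total log-likelihood under a causal LM.
# Model choice and candidates are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def log_likelihood(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_predicted = enc["input_ids"].shape[1] - 1  # loss is averaged over shifted tokens
    return -out.loss.item() * n_predicted        # total log-likelihood of the sequence

candidates = [
    "What year was the Eiffel Tower completed?",
    "Eiffel Tower completed year what?",
]
best = max(candidates, key=log_likelihood)
print("Highest-likelihood prompt:", best)
```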
arXiv Detail & Related papers (2024-11-12T13:14:09Z) - Empirical Study of Symmetrical Reasoning in Conversational Chatbots [0.0]
This work explores the capability of conversational chatbots powered by large language models (LLMs) to understand predicate symmetry.
We assess the symmetrical reasoning of five chatbots: ChatGPT 4, Huggingface chat AI, Microsoft's Copilot AI, LLaMA through Perplexity, and Gemini Advanced.
Experiment results reveal varied performance among chatbots, with some approaching human-like reasoning capabilities.
arXiv Detail & Related papers (2024-07-08T08:38:43Z) - Unveiling Assumptions: Exploring the Decisions of AI Chatbots and Human Testers [2.5327705116230477]
Decision-making relies on a variety of information, including code, requirements specifications, and other software artifacts.
To fill in the gaps left by unclear information, we often rely on assumptions, intuition, or previous experiences to make decisions.
arXiv Detail & Related papers (2024-06-17T08:55:56Z) - Comparative Analysis of ChatGPT, GPT-4, and Microsoft Bing Chatbots for GRE Test [0.0]
This research paper presents an analysis of how well three artificial intelligence chatbots (Bing, ChatGPT, and GPT-4) perform when answering questions from standardized tests.
A total of 137 questions covering different forms of quantitative reasoning and 157 questions in verbal categories were used to assess their capabilities.
arXiv Detail & Related papers (2023-11-26T05:27:35Z) - Multi-Purpose NLP Chatbot : Design, Methodology & Conclusion [0.0]
This research paper provides a thorough analysis of the chatbot technology environment as it exists today.
It provides a very flexible system that makes use of reinforcement learning strategies to improve user interactions and conversational experiences.
The complexity of chatbot technology development is also explored in this study, along with the causes that have propelled these developments and their far-reaching effects on a range of sectors.
arXiv Detail & Related papers (2023-10-13T09:47:24Z) - Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose a method called Chat2Brain that combines LLMs with a basic text-to-image model, known as Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex tasks of text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z) - Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
arXiv Detail & Related papers (2022-05-14T11:47:58Z) - Do language models learn typicality judgments from text? [6.252236971703546]
We evaluate predictive language models (LMs) on a prevalent phenomenon in cognitive science: typicality.
Our first test targets whether typicality modulates LMs in assigning taxonomic category memberships to items.
The second test investigates sensitivities to typicality in LMs' probabilities when extending new information about items to their categories.
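A minimal way to probe this kind of effect is to compare an LM's next-token probability for a category label across a typical and an atypical category member; the prompt template, items, and model below are illustrative assumptions, not the paper's protocol.

```python
# Sketch: compare a causal LM's probability of a taxonomic category label
# for a typical vs. an atypical member. Prompt template and model choice
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def category_probability(item: str, category: str) -> float:
    prompt = f"A {item} is a type of"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]                  # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    category_id = tokenizer(f" {category}").input_ids[0]   # leading space for GPT-2 BPE
    return probs[category_id].item()

for item in ("robin", "penguin"):
    print(item, category_probability(item, "bird"))
```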
arXiv Detail & Related papers (2021-05-06T21:56:40Z) - AR-LSAT: Investigating Analytical Reasoning of Text [57.1542673852013]
We study the challenge of analytical reasoning of text and introduce a new dataset consisting of questions from the Law School Admission Test from 1991 to 2016.
We analyze what knowledge, understanding, and reasoning abilities are required to do well on this task.
arXiv Detail & Related papers (2021-04-14T02:53:32Z) - Investigation of Sentiment Controllable Chatbot [50.34061353512263]
In this paper, we investigate four models to scale or adjust the sentiment of the response.
The models are a persona-based model, reinforcement learning, a plug-and-play model, and CycleGAN.
We develop machine-evaluated metrics to estimate whether the responses are reasonable given the input.
arXiv Detail & Related papers (2020-07-11T16:04:30Z) - Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
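The mutual-information operationalization in the entry above is commonly written as a lower bound obtained from a probe's cross-entropy; the notation below (T for the linguistic property, R for the representation, Q for the probe family) is assumed for illustration rather than quoted from the paper.

```latex
% Probing as mutual-information estimation (assumed notation: T = linguistic
% property, R = contextual representation).
\[
  I(T; R) \;=\; H(T) \;-\; H(T \mid R)
\]
% Any probe q only upper-bounds the conditional entropy via its cross-entropy,
% so the best probe in a family Q yields a lower bound on the mutual information:
\[
  I(T; R) \;\ge\; H(T) \;-\; \min_{q \in Q} \mathbb{E}\!\left[-\log q(T \mid R)\right]
\]
```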
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.