Do AI assistants help students write formal specifications? A study with ChatGPT and the B-Method
- URL: http://arxiv.org/abs/2502.07789v1
- Date: Mon, 20 Jan 2025 13:00:29 GMT
- Title: Do AI assistants help students write formal specifications? A study with ChatGPT and the B-Method
- Authors: Alfredo Capozucca, Daniil Yampolskyi, Alexander Goldberg, Maximiliano Cristiá,
- Abstract summary: This paper investigates the role of AI assistants, specifically OpenAI's ChatGPT, in teaching formal methods to undergraduate students.
We examine whether ChatGPT provides an advantage when writing B-specifications and analyse student trust in its outputs.
Our findings indicate that the AI does not help students to enhance the correctness of their specifications, with low trust correlating to better outcomes.
- Score: 41.94295877935867
- License:
- Abstract: This paper investigates the role of AI assistants, specifically OpenAI's ChatGPT, in teaching formal methods (FM) to undergraduate students, using the B-method as a formal specification technique. While existing studies demonstrate the effectiveness of AI in coding tasks, no study reports on its impact on formal specifications. We examine whether ChatGPT provides an advantage when writing B-specifications and analyse student trust in its outputs. Our findings indicate that the AI does not help students to enhance the correctness of their specifications, with low trust correlating to better outcomes. Additionally, we identify a behavioural pattern with which to interact with ChatGPT which may influence the correctness of B-specifications.
Related papers
- Assessing instructor-AI cooperation for grading essay-type questions in an introductory sociology course [0.0]
We evaluate generative pre-trained transformers (GPT) models' performance in transcribing and scoring students' responses.
For grading, GPT demonstrated strong correlations with the human grader scores, especially when template answers were provided.
This study contributes to the growing literature on AI in education, demonstrating its potential to enhance fairness and efficiency in grading essay-type questions.
arXiv Detail & Related papers (2025-01-11T07:18:12Z) - ChatGPT in Veterinary Medicine: A Practical Guidance of Generative Artificial Intelligence in Clinics, Education, and Research [0.0]
This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine.
For practitioners, ChatGPT can extract patient data, generate progress notes, and potentially assist in diagnosing complex cases.
This review addresses ethical considerations, provides learning resources, and offers tangible examples to guide responsible implementation.
arXiv Detail & Related papers (2024-02-26T02:59:07Z) - Can ChatGPT Play the Role of a Teaching Assistant in an Introductory
Programming Course? [1.8197265299982013]
This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an introductory programming course.
We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs in some of the important TA functions.
arXiv Detail & Related papers (2023-12-12T15:06:44Z) - Student Mastery or AI Deception? Analyzing ChatGPT's Assessment
Proficiency and Evaluating Detection Strategies [1.633179643849375]
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment.
This work investigates the performance of ChatGPT by evaluating it across three courses.
arXiv Detail & Related papers (2023-11-27T20:10:13Z) - Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - DEMASQ: Unmasking the ChatGPT Wordsmith [63.8746084667206]
We propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content.
Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods.
arXiv Detail & Related papers (2023-11-08T21:13:05Z) - HowkGPT: Investigating the Detection of ChatGPT-generated University
Student Homework through Context-Aware Perplexity Analysis [13.098764928946208]
HowkGPT is built upon a dataset of academic assignments and accompanying metadata.
It computes perplexity scores for student-authored and ChatGPT-generated responses.
It further refines its analysis by defining category-specific thresholds.
arXiv Detail & Related papers (2023-05-26T11:07:25Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - How Does In-Context Learning Help Prompt Tuning? [55.78535874154915]
Fine-tuning large language models is becoming ever more impractical due to their rapidly-growing scale.
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model.
Recently, Singhal et al. (2022) propose instruction prompt tuning'' (IPT), which combines PT with ICL by concatenating a natural language demonstration with learned prompt embeddings.
arXiv Detail & Related papers (2023-02-22T17:45:12Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and
Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability by evaluating it on the most popular GLUE benchmark, and comparing it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves comparable performance compared with BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.