On Exams with the Isabelle Proof Assistant
- URL: http://arxiv.org/abs/2303.05866v1
- Date: Fri, 10 Mar 2023 11:37:09 GMT
- Title: On Exams with the Isabelle Proof Assistant
- Authors: Frederik Krogsdal Jacobsen (Technical University of Denmark),
Jørgen Villadsen (Technical University of Denmark)
- Abstract summary: We present an approach for testing student learning outcomes in a course on automated reasoning using the Isabelle proof assistant.
The approach allows us to test both general understanding of formal proofs in various logical proof systems and understanding of proofs in the higher-order logic of Isabelle/HOL.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an approach for testing student learning outcomes in a course on
automated reasoning using the Isabelle proof assistant. The approach allows us
to test both general understanding of formal proofs in various logical proof
systems and understanding of proofs in the higher-order logic of Isabelle/HOL
in particular. The use of Isabelle enables almost automatic grading of large
parts of the exam. We explain our approach through a number of example
problems, and explain why we believe that each kind of problem we have
selected is an adequate measure of our intended learning outcomes. Finally, we
discuss our experiences using the approach for the exam of a course on
automated reasoning and suggest potential future work.
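As an illustration of the kind of exam problem described above (a hypothetical sketch, not an example taken from the paper), a student might be asked to complete a structured Isar proof of a simple propositional lemma in Isabelle/HOL, which Isabelle can then check automatically:

```isabelle
theory Exam_Example
  imports Main
begin

(* A small lemma of the kind a student might prove in structured Isar style;
   the default rule for proof applies implication introduction. *)
lemma conj_commute: "P ∧ Q ⟶ Q ∧ P"
proof
  assume pq: "P ∧ Q"
  from pq have "Q" by (rule conjunct2)
  moreover from pq have "P" by (rule conjunct1)
  ultimately show "Q ∧ P" by (rule conjI)
qed

end
```

Because Isabelle checks each proof step mechanically, problems of this shape lend themselves to the almost automatic grading the abstract mentions.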
Related papers
- MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to problems that are more complex than the ones on which they have been trained.
We present a framework for evaluating LLMs on problems with arbitrarily complex arithmetic proofs, called MathGAP.
arXiv Detail & Related papers (2024-10-17T12:48:14Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the exemplars included in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- Automatic question generation for propositional logical equivalences [6.221146613622175]
We develop and implement a method capable of generating tailored questions for each student.
Previous studies have investigated AQG frameworks in education, which include validity, user-defined difficulty, and personalized problem generation.
Our new AQG approach produces logical equivalence problems for Discrete Mathematics, which is a core course for year-one computer science students.
arXiv Detail & Related papers (2024-05-09T02:44:42Z)
- Teaching Higher-Order Logic Using Isabelle [0.0]
We present a formalization of higher-order logic in the Isabelle proof assistant.
It should serve as a good introduction for someone looking to learn about higher-order logic and proof assistants.
arXiv Detail & Related papers (2024-04-08T12:40:27Z)
- Learning Guided Automated Reasoning: A Brief Survey [5.607616497275423]
We provide an overview of several automated reasoning and theorem proving domains and the learning and AI methods that have been so far developed for them.
These include premise selection, proof guidance in several settings, feedback loops iterating between reasoning and learning, and symbolic classification problems.
arXiv Detail & Related papers (2024-03-06T19:59:17Z)
- ProofBuddy: A Proof Assistant for Learning and Monitoring [0.0]
Proof competence, i.e. the ability to write and check (mathematical) proofs, is an important skill in Computer Science.
The main issues are the correct use of formal language and the ascertainment of whether proofs, especially the students' own, are complete and correct.
Many authors have suggested using proof assistants to assist in teaching proof competence, but the efficacy of the approach is unclear.
We have performed a preliminary usability study of ProofBuddy at the Technical University of Denmark.
arXiv Detail & Related papers (2023-08-14T07:08:55Z)
- UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z)
- elBERto: Self-supervised Commonsense Learning for Question Answering [131.51059870970616]
We propose a Self-supervised Bidirectional Representation Learning of Commonsense framework, which is compatible with off-the-shelf QA model architectures.
The framework comprises five self-supervised tasks to force the model to fully exploit the additional training signals from contexts containing rich commonsense.
elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help.
arXiv Detail & Related papers (2022-03-17T16:23:45Z)
- Formal Mathematics Statement Curriculum Learning [64.45821687940946]
We show that, at the same compute budget, expert iteration (by which we mean proof search interleaved with learning) dramatically outperforms proof search alone.
We also observe that when applied to a collection of formal statements of sufficiently varied difficulty, expert iteration is capable of finding and solving a curriculum of increasingly difficult problems.
arXiv Detail & Related papers (2022-02-03T00:17:00Z)
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
- Simple Dataset for Proof Method Recommendation in Isabelle/HOL (Dataset Description) [6.85316573653194]
We present a simple dataset that contains data on over 400k proof method applications along with over 100 extracted features for each.
Our simple data format allows machine learning practitioners to try machine learning tools to predict proof methods in Isabelle/HOL without requiring domain expertise in logic.
arXiv Detail & Related papers (2020-04-21T12:00:11Z)
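The proof-method-recommendation entry above describes a dataset pairing extracted features with the proof method that was applied. As a minimal sketch of how a practitioner might use such data (the rows, feature values, and method names here are invented for illustration and do not reflect the dataset's actual format), a trivial majority-class baseline predictor:

```python
from collections import Counter

# Hypothetical training rows: numeric feature vectors extracted from proof
# states, each paired with the proof method that was applied to it.
rows = [
    ([1, 0, 3], "simp"),
    ([0, 1, 2], "auto"),
    ([1, 1, 3], "simp"),
    ([0, 0, 1], "induct"),
    ([1, 0, 2], "simp"),
]

def majority_baseline(train):
    """Return a predictor that always guesses the most frequent method.

    This ignores the features entirely; it is the floor any real
    learned model should beat.
    """
    most_common, _ = Counter(label for _, label in train).most_common(1)[0]
    return lambda features: most_common

predict = majority_baseline(rows)
print(predict([1, 1, 0]))  # -> simp
```

A real experiment would replace the baseline with a feature-based classifier, but even this floor gives a reference point for the machine learning tools the abstract invites readers to try.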
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.