Creating Large Language Model Resistant Exams: Guidelines and Strategies
- URL: http://arxiv.org/abs/2304.12203v1
- Date: Tue, 18 Apr 2023 18:01:32 GMT
- Title: Creating Large Language Model Resistant Exams: Guidelines and Strategies
- Authors: Simon Kaare Larsen
- Abstract summary: Large Language Models (LLMs) have raised concerns about their potential impact on academic integrity.
This article investigates the performance of LLMs on exams and their implications for assessment.
We propose guidelines for creating LLM-resistant exams, including content moderation, deliberate inaccuracies, real-world scenarios beyond the model's knowledge base, effective distractor options, evaluating soft skills, and incorporating non-textual information.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The proliferation of Large Language Models (LLMs), such as ChatGPT, has
raised concerns about their potential impact on academic integrity, prompting
the need for LLM-resistant exam designs. This article investigates the
performance of LLMs on exams and their implications for assessment, focusing on
ChatGPT's abilities and limitations. We propose guidelines for creating
LLM-resistant exams, including content moderation, deliberate inaccuracies,
real-world scenarios beyond the model's knowledge base, effective distractor
options, evaluating soft skills, and incorporating non-textual information. The
article also highlights the significance of adapting assessments to modern
tools and promoting essential skills development in students. By adopting these
strategies, educators can maintain academic integrity while ensuring that
assessments accurately reflect contemporary professional settings and address
the challenges and opportunities posed by artificial intelligence in education.
Related papers
- A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models [0.0]
We propose a comprehensive approach to benchmark development based on rigorous psychometric principles.
We make the first attempt to illustrate this approach by creating a new benchmark in the field of pedagogy and education.
We construct a novel benchmark guided by Bloom's taxonomy and rigorously designed by a consortium of education experts trained in test development.
arXiv Detail & Related papers (2024-10-29T19:32:43Z)
- Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z)
- Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark [12.729687989535359]
Evaluating Large Language Models (LLMs) in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts.
We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy.
arXiv Detail & Related papers (2024-06-25T13:20:08Z)
- Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress.
Our investigation exposes a critical oversight in the prevailing belief that base models' limited instruction-following ability safeguards against misuse.
By deploying carefully designed demonstrations, we show that base LLMs can effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
- Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs).
This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z)
- Best Practices for Text Annotation with Large Language Models [11.421942894219901]
Large Language Models (LLMs) have ushered in a new era of text annotation.
This paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use.
arXiv Detail & Related papers (2024-02-05T15:43:50Z)
- Data Poisoning for In-context Learning [49.77204165250528]
In-context learning (ICL) enables models to adapt to new tasks from demonstrations supplied at inference time, without any parameter updates.
This paper delves into the critical issue of ICL's susceptibility to data poisoning attacks.
We introduce ICLPoison, a specialized attacking framework conceived to exploit the learning mechanisms of ICL.
arXiv Detail & Related papers (2024-02-03T14:20:20Z)
- EpiK-Eval: Evaluation for Language Models as Epistemic Models [16.485951373967502]
We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives.
The evaluation reveals that current LLMs struggle to form consistent knowledge representations across segmented narratives; we argue that these shortcomings stem from the intrinsic nature of prevailing training objectives.
The findings from this study offer insights for developing more robust and reliable LLMs.
arXiv Detail & Related papers (2023-10-23T21:15:54Z)
- Large Language Models Cannot Self-Correct Reasoning Yet [78.16697476530994]
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities.
Concerns persist regarding the accuracy and appropriateness of their generated content.
A contemporary methodology, self-correction, has been proposed as a remedy to these issues.
arXiv Detail & Related papers (2023-10-03T04:56:12Z)
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
- HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis [13.098764928946208]
HowkGPT is built upon a dataset of academic assignments and accompanying metadata.
It computes perplexity scores for student-authored and ChatGPT-generated responses.
It further refines its analysis by defining category-specific thresholds.
arXiv Detail & Related papers (2023-05-26T11:07:25Z)
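To make the perplexity-thresholding idea in the HowkGPT entry above concrete, here is a minimal sketch. It is not HowkGPT's actual pipeline: the scoring model (GPT-2 via Hugging Face transformers), the categories, and the threshold values are all illustrative assumptions. It scores a response under an off-the-shelf causal language model and flags text whose perplexity falls below a hypothetical category-specific threshold, on the premise that LLM-generated text tends to be unusually predictable to another language model.

```python
# Sketch only: perplexity scoring with a generic causal LM plus category thresholds.
# Model choice and threshold values are assumptions, not HowkGPT's published setup.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any off-the-shelf causal LM can act as the scorer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Mean-token perplexity of `text` under the scoring model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Hypothetical category-specific thresholds; a real system would calibrate these
# on held-out student-authored vs. LLM-generated answers for each assignment type.
THRESHOLDS = {"essay": 25.0, "short_answer": 18.0}

def flag_as_llm_generated(text: str, category: str) -> bool:
    # Perplexity below the calibrated threshold is treated as suspicious.
    return perplexity(text) < THRESHOLDS.get(category, 20.0)

print(flag_as_llm_generated(
    "Photosynthesis converts light energy into chemical energy.", "short_answer"))
```

In practice the thresholds, not the scoring model, carry most of the detection signal, which is why the paper emphasizes refining the analysis with category-specific cutoffs rather than a single global one.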
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.