Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams
- URL: http://arxiv.org/abs/2408.09982v2
- Date: Tue, 20 Aug 2024 02:41:13 GMT
- Title: Application of Large Language Models in Automated Question Generation: A Case Study on ChatGLM's Structured Questions for National Teacher Certification Exams
- Authors: Ling He, Yanxin Chen, Xiaoqiang Hu,
- Abstract summary: This study explores the application potential of the large language models (LLMs) ChatGLM in the automatic generation of structured questions for National Teacher Certification Exams (NTCE)
We guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees.
The research results indicate that the questions generated by ChatGLM exhibit a high level of rationality, scientificity, and practicality similar to those of the real exam questions.
- Score: 2.7363336723930756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study delves into the application potential of the large language models (LLMs) ChatGLM in the automatic generation of structured questions for National Teacher Certification Exams (NTCE). Through meticulously designed prompt engineering, we guided ChatGLM to generate a series of simulated questions and conducted a comprehensive comparison with questions recollected from past examinees. To ensure the objectivity and professionalism of the evaluation, we invited experts in the field of education to assess these questions and their scoring criteria. The research results indicate that the questions generated by ChatGLM exhibit a high level of rationality, scientificity, and practicality similar to those of the real exam questions across most evaluation criteria, demonstrating the model's accuracy and reliability in question generation. Nevertheless, the study also reveals limitations in the model's consideration of various rating criteria when generating questions, suggesting the need for further optimization and adjustment. This research not only validates the application potential of ChatGLM in the field of educational assessment but also provides crucial empirical support for the development of more efficient and intelligent educational automated generation systems in the future.
Related papers
- A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models [0.0]
We propose a comprehensive approach to benchmark development based on rigorous psychometric principles.
We make the first attempt to illustrate this approach by creating a new benchmark in the field of pedagogy and education.
We construct a novel benchmark guided by the Bloom's taxonomy and rigorously designed by a consortium of education experts trained in test development.
arXiv Detail & Related papers (2024-10-29T19:32:43Z) - AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z) - Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA)
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z) - Research on the Application of Large Language Models in Automatic Question Generation: A Case Study of ChatGLM in the Context of High School Information Technology Curriculum [3.0753648264454547]
The model is guided to generate diverse questions, which are then comprehensively evaluated by domain experts.
The results indicate that ChatGLM outperforms human-generated questions in terms of clarity and teachers' willingness to use.
arXiv Detail & Related papers (2024-08-21T11:38:32Z) - Evaluating the Generation Capabilities of Large Chinese Language Models [27.598864484231477]
This paper unveils CG-Eval, the first-ever comprehensive and automated evaluation framework.
It assesses the generative capabilities of large Chinese language models across a spectrum of academic disciplines.
Gscore automates the quality measurement of a model's text generation against reference standards.
arXiv Detail & Related papers (2023-08-09T09:22:56Z) - Benchmarking Foundation Models with Language-Model-as-an-Examiner [47.345760054595246]
We propose a novel benchmarking framework, Language-Model-as-an-Examiner.
The LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner.
arXiv Detail & Related papers (2023-06-07T06:29:58Z) - Evaluating the Performance of Large Language Models on GAOKAO Benchmark [53.663757126289795]
This paper introduces GAOKAO-Bench, an intuitive benchmark that employs questions from the Chinese GAOKAO examination as test samples.
With human evaluation, we obtain the converted total score of LLMs, including GPT-4, ChatGPT and ERNIE-Bot.
We also use LLMs to grade the subjective questions, and find that model scores achieve a moderate level of consistency with human scores.
arXiv Detail & Related papers (2023-05-21T14:39:28Z) - Reinforcement Learning Guided Multi-Objective Exam Paper Generation [21.945655389912112]
We propose a reinforcement learning guided Multi-Objective Exam Paper Generation framework, termed MOEPG.
It simultaneously optimize three exam domain-specific objectives including difficulty degree, distribution of exam scores, and skill coverage.
We show that MOEPG is feasible in addressing the multiple dilemmas of exam paper generation scenario.
arXiv Detail & Related papers (2023-03-02T07:55:52Z) - What should I Ask: A Knowledge-driven Approach for Follow-up Questions
Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z) - Quiz Design Task: Helping Teachers Create Quizzes with Automated
Question Generation [87.34509878569916]
This paper focuses on the use case of helping teachers automate the generation of reading comprehension quizzes.
In our study, teachers building a quiz receive question suggestions, which they can either accept or refuse with a reason.
arXiv Detail & Related papers (2022-05-03T18:59:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.