QACP: An Annotated Question Answering Dataset for Assisting Chinese
Python Programming Learners
- URL: http://arxiv.org/abs/2402.07913v2
- Date: Fri, 23 Feb 2024 02:35:41 GMT
- Title: QACP: An Annotated Question Answering Dataset for Assisting Chinese
Python Programming Learners
- Authors: Rui Xiao, Lu Han, Xiaoying Zhou, Jiong Wang, Na Zong, Pengyu Zhang
- Abstract summary: This paper proposes a new Chinese question-and-answer dataset for Python learners.
It is designed to enhance the effectiveness and quality of online programming education.
- Score: 10.90557801193242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online learning platforms, particularly in rapidly growing computer
programming courses, addressing the thousands of students' learning queries
requires considerable human cost. The creation of intelligent assistant large
language models (LLMs) tailored for programming education necessitates distinct
data support. However, in real application scenarios, the data resources for
training such LLMs are relatively scarce. Therefore, to address the data
scarcity in intelligent educational systems for programming, this paper
proposes a new Chinese question-and-answer dataset for Python learners. To
ensure the authenticity and reliability of the sources of the questions, we
collected questions from actual student questions and categorized them
according to various dimensions such as the type of questions and the type of
learners. This annotation principle is designed to enhance the effectiveness
and quality of online programming education, providing a solid data foundation
for developing programming teaching assistants (TAs). Furthermore, we conducted
comprehensive evaluations of various LLMs proficient in processing and
generating Chinese content, highlighting the potential limitations of general
LLMs as intelligent teaching assistants in computer programming courses.
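The abstract describes records collected from real student questions and annotated along dimensions such as question type and learner type. A minimal Python sketch of what such an annotated record and a dimension filter could look like (the field names and category values are illustrative assumptions, not the dataset's actual schema):

```python
# Hypothetical QACP-style record. Field names and category values
# are illustrative assumptions, not the dataset's published schema.
record = {
    "question": "为什么 list.sort() 返回 None？",   # "Why does list.sort() return None?"
    "answer": "list.sort() 就地排序并返回 None；要得到新列表请用 sorted()。",
    "question_type": "API 用法",    # e.g. concept / API usage / debugging
    "learner_type": "初学者",       # e.g. beginner / intermediate
}

def filter_by_dimension(records, key, value):
    """Select annotated QA pairs matching one annotation dimension."""
    return [r for r in records if r.get(key) == value]

beginners = filter_by_dimension([record], "learner_type", "初学者")
print(len(beginners))  # → 1
```

Filtering by annotation dimension like this is what makes the dataset usable for targeted evaluation, e.g. checking how a TA model handles beginner debugging questions versus advanced concept questions.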
Related papers
- YuLan: An Open-source Large Language Model [179.59272970659677]
This paper presents the development of YuLan, a series of open-source large language models (LLMs) with $12$ billion parameters.
The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts.
We devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner.
arXiv Detail & Related papers (2024-06-28T11:52:53Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math-question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
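The core mechanism here is retrieving demonstrations to build a few-shot tagging prompt. The paper trains a reinforcement-learning retriever; in the sketch below a simple token-overlap heuristic stands in for it, purely to illustrate the prompt-assembly step:

```python
# Demonstration-retrieval sketch for few-shot knowledge tagging.
# NOTE: the paper's retriever is learned via reinforcement learning;
# this token-overlap heuristic is only an illustrative stand-in.
def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_prompt(query, pool, k=2):
    """Pick the k most similar labeled examples and format a prompt."""
    demos = sorted(pool, key=lambda d: overlap(query, d["q"]), reverse=True)[:k]
    lines = [f"Q: {d['q']}\nTags: {d['tags']}" for d in demos]
    lines.append(f"Q: {query}\nTags:")   # model completes the final line
    return "\n\n".join(lines)

pool = [
    {"q": "solve x + 3 = 7 for x", "tags": "linear equation"},
    {"q": "area of a circle with radius 2", "tags": "geometry"},
]
print(build_prompt("solve 2x = 10 for x", pool, k=1))
```

The returned string would then be sent to an LLM, whose completion of the trailing `Tags:` line is taken as the predicted knowledge tags.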
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A~Case~Study~at~HCMUT [2.8000537365271367]
Large language models (LLMs) have emerged as a vibrant research topic.
LLMs face challenges in remembering events, incorporating new information, and addressing domain-specific issues or hallucinations.
This article proposes a method for automatically constructing a Knowledge Graph from multiple data sources.
arXiv Detail & Related papers (2024-04-14T16:34:31Z) - CSEPrompts: A Benchmark of Introductory Computer Science Prompts [11.665831944836118]
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs).
Commercial applications have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes.
Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse.
arXiv Detail & Related papers (2024-04-03T07:55:57Z) - COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning [57.600941792026006]
We introduce COIG-CQIA, a high-quality Chinese instruction tuning dataset.
Our aim is to build a diverse, wide-ranging instruction-tuning dataset to better align model behavior with human interactions.
We train models of various scales on different subsets of CQIA and conduct in-depth evaluations and analyses.
arXiv Detail & Related papers (2024-03-26T19:24:18Z) - Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-26T00:09:38Z) - DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z) - Exploring the Potential of Large Language Models in Generating
Code-Tracing Questions for Introductory Programming Courses [6.43363776610849]
Large language models (LLMs) can be used to generate code-tracing questions in programming courses.
We present a dataset of human and LLM-generated tracing questions, serving as a valuable resource for both the education and NLP research communities.
arXiv Detail & Related papers (2023-10-23T19:35:01Z) - CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large
Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z) - Tag Prediction of Competitive Programming Problems using Deep Learning
Techniques [0.0]
Competitive programming is a popular way to develop programming skills.
Navigating the wide collection of problems can be tough for novices and even veteran programmers.
Automatically tagging problems via text classification can ease this navigation.
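The tagging-by-text-classification idea can be sketched in a few lines. The paper uses trained deep-learning models; the bag-of-words nearest-match below (with invented training examples) only illustrates the framing, not the actual technique:

```python
from collections import Counter

# Bare-bones text-classification sketch for problem tagging:
# predict the tag of the labeled example sharing the most words
# with the input. Real systems train deep models on large corpora;
# the training examples here are invented for illustration.
train = [
    ("find shortest path between two nodes", "graphs"),
    ("maximum subarray sum with dynamic programming", "dp"),
    ("count inversions using merge sort", "sorting"),
]

def predict_tag(statement):
    words = Counter(statement.lower().split())
    def score(example):
        # multiset intersection counts shared word occurrences
        return sum((words & Counter(example[0].split())).values())
    return max(train, key=score)[1]

print(predict_tag("shortest path in a weighted graph"))  # → graphs
```

A production tagger would replace the word-overlap score with a learned text encoder, but the input/output contract, problem statement in, tag out, is the same.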
arXiv Detail & Related papers (2023-08-03T16:39:02Z) - Leveraging Large Language Model and Story-Based Gamification in
Intelligent Tutoring System to Scaffold Introductory Programming Courses: A
Design-Based Research Study [6.773393436953262]
This study explores how large language models and story-based gamification can scaffold coding learning and increase Chinese students' sense of belonging in introductory programming courses.
arXiv Detail & Related papers (2023-02-25T04:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.