QACP: An Annotated Question Answering Dataset for Assisting Chinese
Python Programming Learners
- URL: http://arxiv.org/abs/2402.07913v2
- Date: Fri, 23 Feb 2024 02:35:41 GMT
- Title: QACP: An Annotated Question Answering Dataset for Assisting Chinese
Python Programming Learners
- Authors: Rui Xiao, Lu Han, Xiaoying Zhou, Jiong Wang, Na Zong, Pengyu Zhang
- Abstract summary: This paper proposes a new Chinese question-and-answer dataset for Python learners.
It is designed to enhance the effectiveness and quality of online programming education.
- Score: 10.90557801193242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online learning platforms, particularly in rapidly growing computer
programming courses, addressing the thousands of students' learning queries
requires considerable human cost. The creation of intelligent assistant large
language models (LLMs) tailored for programming education necessitates distinct
data support. However, in real application scenarios, the data resources for
training such LLMs are relatively scarce. Therefore, to address the data
scarcity in intelligent educational systems for programming, this paper
proposes a new Chinese question-and-answer dataset for Python learners. To
ensure the authenticity and reliability of the sources of the questions, we
collected questions from actual student questions and categorized them
according to various dimensions such as the type of questions and the type of
learners. This annotation principle is designed to enhance the effectiveness
and quality of online programming education, providing a solid data foundation
for developing programming teaching assistants (TAs). Furthermore, we conducted
comprehensive evaluations of various LLMs proficient in processing and
generating Chinese content, highlighting the potential limitations of general
LLMs as intelligent teaching assistants in computer programming courses.
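The abstract describes records collected from real student questions and annotated along dimensions such as question type and learner type. A minimal Python sketch of what such an annotated record and a dimension filter could look like (the field names and category values are illustrative assumptions, not the dataset's actual schema):

```python
# Hypothetical QACP-style record. Field names and category values
# are illustrative assumptions, not the dataset's published schema.
record = {
    "question": "为什么 list.sort() 返回 None？",   # "Why does list.sort() return None?"
    "answer": "list.sort() 就地排序并返回 None；要得到新列表请用 sorted()。",
    "question_type": "API 用法",    # e.g. concept / API usage / debugging
    "learner_type": "初学者",       # e.g. beginner / intermediate
}

def filter_by_dimension(records, key, value):
    """Select annotated QA pairs matching one annotation dimension."""
    return [r for r in records if r.get(key) == value]

beginners = filter_by_dimension([record], "learner_type", "初学者")
print(len(beginners))  # → 1
```

Filtering by annotation dimension like this is what makes the dataset usable for targeted evaluation, e.g. checking how a TA model handles beginner debugging questions versus advanced concept questions.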
Related papers
- YuLan: An Open-source Large Language Model [179.59272970659677]
This paper presents the development of YuLan, a series of open-source large language models (LLMs) with $12$ billion parameters.
The base model of YuLan is pre-trained on approximately $1.7$T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts.
We devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner.
arXiv Detail & Related papers (2024-06-28T11:52:53Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math-question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
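The core mechanism here is retrieving demonstrations to build a few-shot tagging prompt. The paper trains a reinforcement-learning retriever; in the sketch below a simple token-overlap heuristic stands in for it, purely to illustrate the prompt-assembly step:

```python
# Demonstration-retrieval sketch for few-shot knowledge tagging.
# NOTE: the paper's retriever is learned via reinforcement learning;
# this token-overlap heuristic is only an illustrative stand-in.
def overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_prompt(query, pool, k=2):
    """Pick the k most similar labeled examples and format a prompt."""
    demos = sorted(pool, key=lambda d: overlap(query, d["q"]), reverse=True)[:k]
    lines = [f"Q: {d['q']}\nTags: {d['tags']}" for d in demos]
    lines.append(f"Q: {query}\nTags:")   # model completes the final line
    return "\n\n".join(lines)

pool = [
    {"q": "solve x + 3 = 7 for x", "tags": "linear equation"},
    {"q": "area of a circle with radius 2", "tags": "geometry"},
]
print(build_prompt("solve 2x = 10 for x", pool, k=1))
```

The returned string would then be sent to an LLM, whose completion of the trailing `Tags:` line is taken as the predicted knowledge tags.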
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A~Case~Study~at~HCMUT [2.8000537365271367]
Large language models (LLMs) have emerged as a vibrant research topic.
LLMs face challenges in remembering events, incorporating new information, and addressing domain-specific issues or hallucinations.
This article proposes a method for automatically constructing a Knowledge Graph from multiple data sources.
arXiv Detail & Related papers (2024-04-14T16:34:31Z) - CSEPrompts: A Benchmark of Introductory Computer Science Prompts [11.665831944836118]
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs).
Commercial applications have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes.
Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse.
arXiv Detail & Related papers (2024-04-03T07:55:57Z) - COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning [57.600941792026006]
We introduce COIG-CQIA, a high-quality Chinese instruction tuning dataset.
Our aim is to build a diverse, wide-ranging instruction-tuning dataset to better align model behavior with human interactions.
We train models of various scales on different subsets of CQIA and conduct in-depth evaluations and analyses.
arXiv Detail & Related papers (2024-03-26T19:24:18Z) - Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-26T00:09:38Z) - DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z) - Exploring the Potential of Large Language Models in Generating
Code-Tracing Questions for Introductory Programming Courses [6.43363776610849]
Large language models (LLMs) can be used to generate code-tracing questions in programming courses.
We present a dataset of human and LLM-generated tracing questions, serving as a valuable resource for both the education and NLP research communities.
arXiv Detail & Related papers (2023-10-23T19:35:01Z) - CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large
Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z) - Tag Prediction of Competitive Programming Problems using Deep Learning
Techniques [0.0]
Competitive programming is a popular way to develop programming skills.
Navigating the wide collection of problems can be tough for novices and even veteran programmers.
Automatically tagging problems via text classification can ease this navigation.
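The tagging-by-text-classification idea can be sketched in a few lines. The paper uses trained deep-learning models; the bag-of-words nearest-match below (with invented training examples) only illustrates the framing, not the actual technique:

```python
from collections import Counter

# Bare-bones text-classification sketch for problem tagging:
# predict the tag of the labeled example sharing the most words
# with the input. Real systems train deep models on large corpora;
# the training examples here are invented for illustration.
train = [
    ("find shortest path between two nodes", "graphs"),
    ("maximum subarray sum with dynamic programming", "dp"),
    ("count inversions using merge sort", "sorting"),
]

def predict_tag(statement):
    words = Counter(statement.lower().split())
    def score(example):
        # multiset intersection counts shared word occurrences
        return sum((words & Counter(example[0].split())).values())
    return max(train, key=score)[1]

print(predict_tag("shortest path in a weighted graph"))  # → graphs
```

A production tagger would replace the word-overlap score with a learned text encoder, but the input/output contract, problem statement in, tag out, is the same.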
arXiv Detail & Related papers (2023-08-03T16:39:02Z) - Leveraging Large Language Model and Story-Based Gamification in
Intelligent Tutoring System to Scaffold Introductory Programming Courses: A
Design-Based Research Study [6.773393436953262]
This study explores how large language models and story-based gamification can scaffold coding learning and increase Chinese students' sense of belonging in introductory programming courses.
arXiv Detail & Related papers (2023-02-25T04:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.