QACP: An Annotated Question Answering Dataset for Assisting Chinese
Python Programming Learners
- URL: http://arxiv.org/abs/2402.07913v2
- Date: Fri, 23 Feb 2024 02:35:41 GMT
- Authors: Rui Xiao, Lu Han, Xiaoying Zhou, Jiong Wang, Na Zong, Pengyu Zhang
- Abstract summary: This paper proposes a new Chinese question-and-answer dataset for Python learners.
It is designed to enhance the effectiveness and quality of online programming education.
- Score: 10.90557801193242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online learning platforms, particularly in rapidly growing computer
programming courses, addressing the thousands of students' learning queries
requires considerable human cost. The creation of intelligent assistant large
language models (LLMs) tailored for programming education necessitates distinct
data support. However, in real application scenarios, the data resources for
training such LLMs are relatively scarce. Therefore, to address the data
scarcity in intelligent educational systems for programming, this paper
proposes a new Chinese question-and-answer dataset for Python learners. To
ensure the authenticity and reliability of the sources of the questions, we
collected questions from actual student questions and categorized them
according to various dimensions such as the type of questions and the type of
learners. This annotation principle is designed to enhance the effectiveness
and quality of online programming education, providing a solid data foundation
for developing programming teaching assistants (TAs). Furthermore, we conducted
comprehensive evaluations of various LLMs proficient in processing and
generating Chinese content, highlighting the potential limitations of general
LLMs as intelligent teaching assistants in computer programming courses.
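The abstract describes annotating each collected question along dimensions such as question type and learner type. The paper excerpt does not publish the actual schema, so the following is only a minimal sketch of what one annotated entry might look like; all field names and category values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class QAEntry:
    """One annotated question-answer pair (illustrative schema, not the paper's)."""
    question: str       # the learner's original question (Chinese)
    answer: str         # a reference answer
    question_type: str  # e.g. "concept", "debugging", "syntax" (hypothetical labels)
    learner_type: str   # e.g. "beginner", "intermediate" (hypothetical labels)

# Hypothetical example entry
entry = QAEntry(
    question="为什么 list.sort() 返回 None？",  # "Why does list.sort() return None?"
    answer="list.sort() 就地排序并返回 None；若需要返回值请使用 sorted()。",
    question_type="concept",
    learner_type="beginner",
)
print(entry.question_type)  # concept
```

Structuring entries this way lets a TA system filter or route questions by category, which is the kind of use the annotation principle is meant to support.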
Related papers
- On the Opportunities of Large Language Models for Programming Process Data [6.023152721616896]
We discuss opportunities of using large language models for analyzing programming process data.
To complement our discussion, we outline a case study where we have leveraged LLMs for automatically summarizing the programming process.
arXiv Detail & Related papers (2024-11-01T07:20:01Z) - Large Language Models in Computer Science Education: A Systematic Literature Review [7.240148550817106]
Large language models (LLMs) are becoming increasingly capable across a wide range of Natural Language Processing (NLP) tasks.
Recently, these models have extended their capabilities to coding tasks, bridging the gap between natural languages (NL) and programming languages (PL).
arXiv Detail & Related papers (2024-10-21T17:49:50Z) - Knowledge Tagging with Large Language Model based Multi-Agent System [17.53518487546791]
This paper investigates the use of a multi-agent system to address the limitations of previous algorithms.
We highlight the significant potential of an LLM-based multi-agent system in overcoming the challenges that previous methods have encountered.
arXiv Detail & Related papers (2024-09-12T21:39:01Z) - YuLan: An Open-source Large Language Model [179.59272970659677]
This paper presents the development of YuLan, a series of open-source large language models (LLMs) with 12 billion parameters.
The base model of YuLan is pre-trained on approximately 1.7T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts.
We devise a curriculum-learning framework across the training stages, which helps LLMs learn knowledge in an easy-to-hard manner.
arXiv Detail & Related papers (2024-06-28T11:52:53Z) - Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever [48.5585921817745]
Large Language Models (LLMs) are used to automate the knowledge tagging task.
We show strong zero- and few-shot performance on math question knowledge tagging tasks.
By proposing a reinforcement learning-based demonstration retriever, we successfully exploit the great potential of different-sized LLMs.
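The demonstration retriever selects few-shot examples that are most relevant to the question being tagged. The paper trains its retriever with reinforcement learning; as a simplified stand-in, the sketch below retrieves demonstrations by plain token overlap, with all data and field names invented for illustration:

```python
def overlap_score(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two questions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve_demonstrations(query: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Pick the k pool questions most similar to the query as few-shot demos."""
    return sorted(pool, key=lambda d: overlap_score(query, d["question"]),
                  reverse=True)[:k]

# Hypothetical labeled pool of math questions
pool = [
    {"question": "solve x + 3 = 7", "tag": "linear-equation"},
    {"question": "area of a circle with radius 2", "tag": "geometry"},
    {"question": "solve 2x - 5 = 9", "tag": "linear-equation"},
]
demos = retrieve_demonstrations("solve 4x + 1 = 9", pool)
print([d["tag"] for d in demos])  # ['linear-equation', 'linear-equation']
```

The retrieved demonstrations would then be prepended to the LLM's tagging prompt; a learned retriever replaces the overlap heuristic with a policy optimized for downstream tagging accuracy.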
arXiv Detail & Related papers (2024-06-19T23:30:01Z) - CSEPrompts: A Benchmark of Introductory Computer Science Prompts [11.665831944836118]
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs).
Commercial applications have made this technology available to the general public, thus making it possible to use LLMs to produce high-quality texts for academic and professional purposes.
Schools and universities are aware of the increasing use of AI-generated content by students and they have been researching the impact of this new technology and its potential misuse.
arXiv Detail & Related papers (2024-04-03T07:55:57Z) - Automate Knowledge Concept Tagging on Math Questions with LLMs [48.5585921817745]
Knowledge concept tagging for questions plays a crucial role in contemporary intelligent educational applications.
Traditionally, these annotations have been conducted manually with help from pedagogical experts.
In this paper, we explore automating the tagging task using Large Language Models (LLMs).
arXiv Detail & Related papers (2024-03-26T00:09:38Z) - Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z) - DIALIGHT: Lightweight Multilingual Development and Evaluation of
Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z) - CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large
Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z) - Leveraging Large Language Model and Story-Based Gamification in
Intelligent Tutoring System to Scaffold Introductory Programming Courses: A
Design-Based Research Study [6.773393436953262]
This study explores how large language models and gamification can scaffold coding learning and increase Chinese students' sense of belonging in introductory programming courses.
arXiv Detail & Related papers (2023-02-25T04:07:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.