CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course
- URL: http://arxiv.org/abs/2210.14494v1
- Date: Wed, 26 Oct 2022 05:40:34 GMT
- Title: CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course
- Authors: Changyoon Lee, Yeon Seonwoo, Alice Oh
- Abstract summary: CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python.
Each question is accompanied by the student's code and the portion of the code relevant to answering the question.
- Score: 13.61096948994569
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce CS1QA, a dataset for code-based question answering in the
programming education domain. CS1QA consists of 9,237 question-answer pairs
gathered from chat logs in an introductory programming class using Python, and
17,698 unannotated chat entries with code. Each question is accompanied by the
student's code and the portion of the code relevant to answering the question.
We carefully design the annotation process to construct CS1QA, and analyze the
collected dataset in detail. The tasks for CS1QA are to predict the question
type, to identify the relevant code snippet given the question and the code,
and to retrieve an answer from the annotated corpus. Results for the
experiments on several
baseline models are reported and thoroughly analyzed. The tasks for CS1QA
challenge models to understand both the code and natural language. This unique
dataset can be used as a benchmark for source code comprehension and question
answering in the educational setting.
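To make the data shape concrete, here is a minimal Python sketch of what one CS1QA-style example might look like; the field names, the question-type label, and the span encoding are our assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical sketch of a single CS1QA example; field names and labels are
# illustrative assumptions, not the dataset's actual schema.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CS1QAExample:
    question: str               # student's question from the chat log
    answer: str                 # TA's answer from the chat log
    code: str                   # full student code attached to the question
    question_type: str          # task 1: type label to predict
    code_span: Tuple[int, int]  # task 2: line range relevant to the question

example = CS1QAExample(
    question="Why does my loop never stop?",
    answer="The while condition stays True because i is never incremented.",
    code="i = 0\nwhile i < 10:\n    print(i)\n",
    question_type="logical",
    code_span=(2, 3),
)
```

Task 3, answer retrieval, would then search the annotated corpus for the stored answer whose question best matches a new student question.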
Related papers
- PCoQA: Persian Conversational Question Answering Dataset [12.07607688189035]
The PCoQA dataset comprises information-seeking dialogs containing a total of 9,026 contextually driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
arXiv Detail & Related papers (2023-12-07T15:29:34Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to the execution results of those parses.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
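To make "execution results" concrete, the following self-contained snippet (using the rdflib library) pairs a question with a hand-written SPARQL parse and executes it over a toy graph; the question, graph, and query are our own illustration, not taken from SPICE.

```python
# Toy illustration of a question annotated with a SPARQL parse whose
# execution result serves as the system answer (not SPICE's actual format).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.CS1, EX.language, Literal("Python")))

# Question: "Which language does the CS1 course use?"
sparql = """
SELECT ?lang
WHERE { <http://example.org/CS1> <http://example.org/language> ?lang }
"""
for row in g.query(sparql):
    print(row.lang)  # prints "Python": the answer is the execution result
```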
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- CodeQueries: A Dataset of Semantic Queries over Code [7.0864879068510005]
We contribute a labeled dataset, called CodeQueries, of semantic queries over Python code.
Compared to existing datasets, the queries in CodeQueries are about code semantics, the context is file-level, and the answers are code spans.
We evaluate a large language model (GPT3.5-Turbo) in zero-shot and few-shot settings on a subset of CodeQueries.
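A zero-shot evaluation of this kind might be set up roughly as below; the prompt wording and the example file are our assumptions, not the paper's actual template.

```python
# Rough sketch of a zero-shot prompt for span extraction over code, in the
# spirit of the CodeQueries setup; wording and example are assumptions.
query = "Which variables are read before they are assigned?"

file_contents = '''\
def f():
    print(total)  # `total` is read here...
    total = 0     # ...but assigned only here
'''

prompt = (
    "You are given a Python file and a semantic query about it.\n"
    f"Query: {query}\n"
    f"File:\n{file_contents}\n"
    "Answer with the exact code span(s) that satisfy the query."
)
# `prompt` would then be sent to the language model under evaluation.
```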
arXiv Detail & Related papers (2022-09-17T17:09:30Z)
- Write a Line: Tests with Answer Templates and String Completion Hints for Self-Learning in a CS1 Course [0.0]
This paper reports the results of using regular-expression-based questions with string completion hints in a CS1 course for 4 years with 497 students.
The evaluation results show that Perl-compatible regular expressions provide good precision and recall (more than 99%) when used for questions requiring writing a single line of code.
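A minimal sketch of such a question, with Python's re module standing in for a Perl-compatible regex engine; the template and hint below are our own example, not one of the paper's questions.

```python
# One single-line-answer question with a regex answer template and a string
# completion hint (illustrative; Python's re stands in for PCRE here).
import re

# Accept e.g. "for i in range(10):" with any loop variable and spacing.
TEMPLATE = r"for\s+\w+\s+in\s+range\(\s*10\s*\)\s*:"
HINT = "for _ in range(10):"  # completion hint shown to the student

def check(student_line: str) -> bool:
    return re.fullmatch(TEMPLATE, student_line.strip()) is not None

assert check("for i in range(10):")
assert not check("while i < 10:")
```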
arXiv Detail & Related papers (2022-04-19T17:53:35Z)
- Solving Linear Algebra by Program Synthesis [1.0660480034605238]
We solve MIT's Linear Algebra course 18.06 and Columbia University's Computational Linear Algebra course COMS3251 with perfect accuracy by interactive program synthesis.
This surprisingly strong result is achieved by turning the course questions into programming tasks and then running the programs to produce the correct answers.
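As a toy example of the approach (ours, not one of the paper's actual course questions): the question "find the least-squares solution of Ax = b" becomes a short program whose output is the answer.

```python
# Toy illustration of turning a linear-algebra question into a program
# (our example, not an actual 18.06 or COMS3251 question).
import numpy as np

# Question: "Find the least-squares solution of Ax = b."
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Running the program produces the correct answer.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)
```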
arXiv Detail & Related papers (2021-11-16T01:16:43Z)
- CodeQA: A Question Answering Dataset for Source Code Comprehension [82.63394952538292]
Given a code snippet and a question, the task is to generate a textual answer.
CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs.
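The input-output shape is easy to picture with a made-up pair (not drawn from CodeQA):

```python
# Made-up CodeQA-style example: the model reads the snippet and the question
# and must generate the free-form textual answer.
snippet = """\
def dedup(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]
"""
question = "What does this function return?"
reference_answer = "A list with duplicates removed, preserving original order."
```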
arXiv Detail & Related papers (2021-09-17T06:06:38Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)