CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course
- URL: http://arxiv.org/abs/2210.14494v1
- Date: Wed, 26 Oct 2022 05:40:34 GMT
- Title: CS1QA: A Dataset for Assisting Code-based Question Answering in an Introductory Programming Course
- Authors: Changyoon Lee, Yeon Seonwoo, Alice Oh
- Abstract summary: CS1QA consists of 9,237 question-answer pairs gathered from chat logs in an introductory programming class using Python.
Each question is accompanied by the student's code and the portion of the code relevant to answering the question.
- Score: 13.61096948994569
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We introduce CS1QA, a dataset for code-based question answering in the
programming education domain. CS1QA consists of 9,237 question-answer pairs
gathered from chat logs in an introductory programming class using Python, and
17,698 unannotated chat entries with code. Each question is accompanied by the
student's code and the portion of the code relevant to answering the question.
We carefully design the annotation process to construct CS1QA, and analyze the
collected dataset in detail. The tasks for CS1QA are to predict the question
type, to identify the relevant code snippet given the question and the code,
and to retrieve an answer from the annotated corpus. Results for the
experiments on several
baseline models are reported and thoroughly analyzed. The tasks for CS1QA
challenge models to understand both the code and natural language. This unique
dataset can be used as a benchmark for source code comprehension and question
answering in the educational setting.
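To make the data shape concrete, here is a minimal Python sketch of what one CS1QA-style example might look like; the field names, the question-type label, and the span encoding are our assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical sketch of a single CS1QA example; field names and labels are
# illustrative assumptions, not the dataset's actual schema.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CS1QAExample:
    question: str               # student's question from the chat log
    answer: str                 # TA's answer from the chat log
    code: str                   # full student code attached to the question
    question_type: str          # task 1: type label to predict
    code_span: Tuple[int, int]  # task 2: line range relevant to the question

example = CS1QAExample(
    question="Why does my loop never stop?",
    answer="The while condition stays True because i is never incremented.",
    code="i = 0\nwhile i < 10:\n    print(i)\n",
    question_type="logical",
    code_span=(2, 3),
)
```

Task 3, answer retrieval, would then search the annotated corpus for the stored answer whose question best matches a new student question.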
Related papers
- PCoQA: Persian Conversational Question Answering Dataset [12.07607688189035]
The PCoQA dataset comprises information-seeking dialogs containing a total of 9,026 contextually driven questions.
PCoQA is designed to present novel challenges compared to previous question answering datasets.
This paper not only presents the comprehensive PCoQA dataset but also reports the performance of various benchmark models.
arXiv Detail & Related papers (2023-12-07T15:29:34Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to the execution results of those parses.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
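To make "execution results" concrete, the following self-contained snippet (using the rdflib library) pairs a question with a hand-written SPARQL parse and executes it over a toy graph; the question, graph, and query are our own illustration, not taken from SPICE.

```python
# Toy illustration of a question annotated with a SPARQL parse whose
# execution result serves as the system answer (not SPICE's actual format).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.CS1, EX.language, Literal("Python")))

# Question: "Which language does the CS1 course use?"
sparql = """
SELECT ?lang
WHERE { <http://example.org/CS1> <http://example.org/language> ?lang }
"""
for row in g.query(sparql):
    print(row.lang)  # prints "Python": the answer is the execution result
```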
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- CodeQueries: A Dataset of Semantic Queries over Code [7.0864879068510005]
We contribute a labeled dataset, called CodeQueries, of semantic queries over Python code.
Compared to existing datasets, the queries in CodeQueries are about code semantics, the context is file-level, and the answers are code spans.
We evaluate a large language model (GPT3.5-Turbo) in zero-shot and few-shot settings on a subset of CodeQueries.
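A zero-shot evaluation of this kind might be set up roughly as below; the prompt wording and the example file are our assumptions, not the paper's actual template.

```python
# Rough sketch of a zero-shot prompt for span extraction over code, in the
# spirit of the CodeQueries setup; wording and example are assumptions.
query = "Which variables are read before they are assigned?"

file_contents = '''\
def f():
    print(total)  # `total` is read here...
    total = 0     # ...but assigned only here
'''

prompt = (
    "You are given a Python file and a semantic query about it.\n"
    f"Query: {query}\n"
    f"File:\n{file_contents}\n"
    "Answer with the exact code span(s) that satisfy the query."
)
# `prompt` would then be sent to the language model under evaluation.
```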
arXiv Detail & Related papers (2022-09-17T17:09:30Z)
- Write a Line: Tests with Answer Templates and String Completion Hints for Self-Learning in a CS1 Course [0.0]
This paper reports the results of using regular-expression-based questions with string completion hints in a CS1 course for 4 years with 497 students.
The evaluation results show that Perl-compatible regular expressions provide good precision and recall (more than 99%) when used for questions requiring writing a single line of code.
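A minimal sketch of such a question, with Python's re module standing in for a Perl-compatible regex engine; the template and hint below are our own example, not one of the paper's questions.

```python
# One single-line-answer question with a regex answer template and a string
# completion hint (illustrative; Python's re stands in for PCRE here).
import re

# Accept e.g. "for i in range(10):" with any loop variable and spacing.
TEMPLATE = r"for\s+\w+\s+in\s+range\(\s*10\s*\)\s*:"
HINT = "for _ in range(10):"  # completion hint shown to the student

def check(student_line: str) -> bool:
    return re.fullmatch(TEMPLATE, student_line.strip()) is not None

assert check("for i in range(10):")
assert not check("while i < 10:")
```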
arXiv Detail & Related papers (2022-04-19T17:53:35Z)
- Solving Linear Algebra by Program Synthesis [1.0660480034605238]
We solve MIT's Linear Algebra course 18.06 and Columbia University's Computational Linear Algebra course COMS3251 with perfect accuracy by interactive program synthesis.
This surprisingly strong result is achieved by turning the course questions into programming tasks and then running the programs to produce the correct answers.
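As a toy example of the approach (ours, not one of the paper's actual course questions): the question "find the least-squares solution of Ax = b" becomes a short program whose output is the answer.

```python
# Toy illustration of turning a linear-algebra question into a program
# (our example, not an actual 18.06 or COMS3251 question).
import numpy as np

# Question: "Find the least-squares solution of Ax = b."
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Running the program produces the correct answer.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x)
```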
arXiv Detail & Related papers (2021-11-16T01:16:43Z)
- CodeQA: A Question Answering Dataset for Source Code Comprehension [82.63394952538292]
Given a code snippet and a question, the task is to generate a textual answer.
CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs.
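The input-output shape is easy to picture with a made-up pair (not drawn from CodeQA):

```python
# Made-up CodeQA-style example: the model reads the snippet and the question
# and must generate the free-form textual answer.
snippet = """\
def dedup(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]
"""
question = "What does this function return?"
reference_answer = "A list with duplicates removed, preserving original order."
```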
arXiv Detail & Related papers (2021-09-17T06:06:38Z)
- Few-Shot Complex Knowledge Base Question Answering via Meta Reinforcement Learning [55.08037694027792]
Complex question-answering (CQA) involves answering complex natural-language questions on a knowledge base (KB).
The conventional neural program induction (NPI) approach exhibits uneven performance when the questions have different types.
This paper proposes a meta-reinforcement learning approach to program induction in CQA to tackle the potential distributional bias in questions.
arXiv Detail & Related papers (2020-10-29T18:34:55Z)
- Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)