Linking open-source code commits and MOOC grades to evaluate massive
online open peer review
- URL: http://arxiv.org/abs/2104.12555v1
- Date: Thu, 15 Apr 2021 18:27:01 GMT
- Title: Linking open-source code commits and MOOC grades to evaluate massive
online open peer review
- Authors: Siruo Wang, Leah R. Jager, Kai Kammers, Aboozar Hadavand, Jeffrey T.
Leek
- Abstract summary: We link data from public code repositories on GitHub and course grades for a large massive-online open course to study the dynamics of massive scale peer review.
We find three distinct clusters of repeated peer-review submissions and use these clusters to study how grades change in response to changes in code submissions.
Our exploration also leads to an important observation that massive scale peer-review scores are highly variable, increase, on average, with repeated submissions, and changes in scores are not closely tied to the code changes that form the basis for the re-submissions.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Massive Open Online Courses (MOOCs) have been used by students as a low-cost
and low-touch educational credential in a variety of fields. Understanding the
grading mechanisms behind these course assignments is important for evaluating
MOOC credentials. A common approach to grading free-response assignments is
massive scale peer-review, especially used for assignments that are not easy to
grade programmatically. It is difficult to assess these approaches since the
responses typically require human evaluation. Here we link data from public
code repositories on GitHub and course grades for a large massive-online open
course to study the dynamics of massive scale peer review. This has important
implications for understanding the dynamics of difficult-to-grade assignments.
Since the research was not hypothesis-driven, we described the results in an
exploratory framework. We find three distinct clusters of repeated peer-review
submissions and use these clusters to study how grades change in response to
changes in code submissions. Our exploration also leads to an important
observation that massive scale peer-review scores are highly variable,
increase, on average, with repeated submissions, and changes in scores are not
closely tied to the code changes that form the basis for the re-submissions.
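The abstract describes two analysis steps: linking public GitHub commit histories to course grades, and clustering repeated peer-review submissions to track how scores change. The sketch below is a minimal illustration of what such a linkage and a k=3 clustering of score trajectories could look like in pandas and scikit-learn; the toy tables, column names, as-of join, and k-means choice are assumptions made here for illustration, not the authors' published pipeline.

```python
# Minimal sketch (not the authors' code): link hypothetical GitHub commit
# records to hypothetical peer-review grades, then cluster resubmission
# score trajectories. All data and column names below are made up.
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical commit log scraped from public GitHub repositories.
commits = pd.DataFrame({
    "learner_id": [1, 1, 2, 2, 3],
    "assignment": ["a1"] * 5,
    "commit_time": pd.to_datetime(
        ["2020-01-02", "2020-01-08", "2020-01-03", "2020-01-03", "2020-01-05"]),
    "files_changed": [4, 1, 3, 0, 2],
})

# Hypothetical peer-review grade book with repeated submissions.
grades = pd.DataFrame({
    "learner_id": [1, 1, 2, 2, 3],
    "assignment": ["a1"] * 5,
    "attempt": [1, 2, 1, 2, 1],
    "submit_time": pd.to_datetime(
        ["2020-01-03", "2020-01-09", "2020-01-04", "2020-01-06", "2020-01-06"]),
    "peer_score": [62, 85, 70, 72, 90],
})

# Link each graded submission to the most recent prior commit
# (an as-of join on time, within learner and assignment).
linked = pd.merge_asof(
    grades.sort_values("submit_time"),
    commits.sort_values("commit_time"),
    left_on="submit_time", right_on="commit_time",
    by=["learner_id", "assignment"],
)

# How did the score move between attempts, and did the code actually change?
linked = linked.sort_values(["learner_id", "attempt"])
linked["score_change"] = linked.groupby("learner_id")["peer_score"].diff()
linked["code_changed"] = linked["files_changed"].gt(0)

# Cluster per-learner score trajectories (first vs. last attempt); k=3
# mirrors the three clusters reported in the abstract.
traj = linked.groupby("learner_id")["peer_score"].agg(first="first", last="last")
traj["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(traj)
print(traj)
```

In this toy data, learner 2's second graded attempt follows a commit that changed no files yet still receives a different score, the kind of case behind the abstract's observation that score changes are not closely tied to code changes.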
Related papers
- AgentReview: Exploring Peer Review Dynamics with LLM Agents [13.826819101545926]
We introduce AgentReview, the first large language model (LLM) based peer review simulation framework.
Our study reveals significant insights, including a notable 37.1% variation in paper decisions due to reviewers' biases.
arXiv Detail & Related papers (2024-06-18T15:22:12Z) - Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Integrating Rankings into Quantized Scores in Peer Review [61.27794774537103]
In peer review, reviewers are usually asked to provide scores for the papers.
To mitigate this issue, conferences have started to ask reviewers to additionally provide a ranking of the papers they have reviewed.
There is no standard procedure for using this ranking information, and Area Chairs may use it in different ways.
We take a principled approach to integrate the ranking information into the scores.
arXiv Detail & Related papers (2022-04-05T19:39:13Z) - Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z) - Polarity in the Classroom: A Case Study Leveraging Peer Sentiment Toward
Scalable Assessment [4.588028371034406]
Accurately grading open-ended assignments in large or massive open online courses (MOOCs) is non-trivial.
In this work, we detail the process by which we create our domain-dependent lexicon and aspect-informed review form.
We end by analyzing validity and discussing conclusions from our corpus of over 6800 peer reviews from nine courses.
arXiv Detail & Related papers (2021-08-02T15:45:11Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - The Influence of Domain-Based Preprocessing on Subject-Specific
Clustering [55.41644538483948]
The sudden shift to predominantly online teaching at universities has increased the workload for academics.
One way to deal with this problem is to cluster these questions depending on their topic.
In this paper, we explore the realms of tagging data sets, focusing on identifying code excerpts and providing empirical results.
arXiv Detail & Related papers (2020-11-16T17:47:19Z) - Key Phrase Classification in Complex Assignments [5.067828201066184]
We show that the task of classifying key phrases is ambiguous even at a human level, producing a Cohen's kappa of 0.77 on a new data set.
Both pretrained language models and simple TF-IDF SVM classifiers produce similar results, with the former scoring, on average, 0.6 F1 higher than the latter.
arXiv Detail & Related papers (2020-03-16T04:25:37Z) - Systematic Review of Approaches to Improve Peer Assessment at Scale [5.067828201066184]
This review focuses on three facets of Peer Assessment (PA): auto-grading and peer assessment tools (we look only at how peer review/auto-grading is carried out), strategies to handle rogue reviews, and peer-review improvement using natural language processing.
arXiv Detail & Related papers (2020-01-27T15:59:24Z)