Online Peer-Assessment Datasets
- URL: http://arxiv.org/abs/1912.13050v1
- Date: Mon, 30 Dec 2019 18:48:55 GMT
- Title: Online Peer-Assessment Datasets
- Authors: Michael Mogessie Ashenafi
- Abstract summary: Peer-assessment experiments were conducted among first and second year students at the University of Trento.
The experiments spanned an entire semester and were conducted in five computer science courses between 2013 and 2016.
The datasets are reported as parsable data structures that, with intermediate processing, can be moulded into NLP or ML-ready datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Peer-assessment experiments were conducted among first and second year
students at the University of Trento. The experiments spanned an entire
semester and were conducted in five computer science courses between 2013 and
2016. Peer-assessment tasks included question and answer submission as well as
answer evaluation tasks. The peer-assessment datasets are complemented by the
final scores of participating students for each course. Teachers were involved
in filtering out questions submitted by students on a weekly basis. Selected
questions were then used in subsequent peer-assessment tasks. However, expert
ratings are not included in the dataset. A major reason for this decision was
that peer-assessment tasks were designed with minimal teacher supervision in
mind. Arguments in favour of this approach are presented. The datasets are
designed in a manner that would allow their utilization in a variety of
experiments. They are reported as parsable data structures that, with
intermediate processing, can be moulded into NLP or ML-ready datasets.
Potential applications of interest include performance prediction and text
similarity tasks.
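Since the datasets are distributed as parsable data structures rather than finished tables, some intermediate processing is needed before NLP or ML use. A minimal Python sketch, assuming a purely hypothetical record layout (the field names below are illustrative, not the actual schema of the release):

```python
import json
import statistics

# Hypothetical record layout; the real field names depend on the released schema.
sample = '''
[{"question_id": "q1",
  "question_text": "What does the 'static' keyword mean in Java?",
  "answers": [
    {"student_id": "s17", "answer_text": "It binds a member to the class itself.",
     "peer_ratings": [4, 5, 3]},
    {"student_id": "s42", "answer_text": "It makes a variable constant.",
     "peer_ratings": [2, 1, 2]}]}]
'''

def to_ml_rows(records):
    """Flatten nested question/answer records into one row per answer,
    with the mean peer rating as a regression target."""
    rows = []
    for q in records:
        for a in q["answers"]:
            rows.append({
                "question_id": q["question_id"],
                "student_id": a["student_id"],
                "text": a["answer_text"],
                "mean_peer_rating": statistics.mean(a["peer_ratings"]),
            })
    return rows

for row in to_ml_rows(json.loads(sample)):
    print(row)
```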
Related papers
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories.
Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting.
Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
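As an illustration of what instruction-based fine-tuning data for this task could look like, here is a minimal Python sketch; the category names and prompt wording are invented for illustration and are not LazyReview's actual annotation scheme:

```python
# Turn annotated review sentences into prompt/completion pairs for
# instruction tuning. Labels and wording here are placeholders.
examples = [
    {"sentence": "The novelty is limited.",
     "label": "vague criticism without evidence"},
    {"sentence": "Table 3 contradicts the claim in Section 5.",
     "label": "not lazy thinking"},
]

def to_instruction_example(ex):
    prompt = (
        "You are checking peer-review comments for lazy thinking.\n"
        f"Sentence: {ex['sentence']}\n"
        "Which fine-grained lazy-thinking category applies, if any?"
    )
    return {"prompt": prompt, "completion": ex["label"]}

for ex in examples:
    print(to_instruction_example(ex))
```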
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- PeerQA: A Scientific Question Answering Dataset from Peer Reviews [51.95579001315713]
We present PeerQA, a real-world, scientific, document-level Question Answering dataset.
The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP.
We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks.
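The summary does not name the three tasks, so as a generic illustration only, here is a minimal TF-IDF sketch of the kind of evidence-retrieval baseline commonly used for document-level QA (the paragraphs and question are placeholders, and this is not PeerQA's actual baseline system):

```python
# Rank article paragraphs against a question by TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

paragraphs = [
    "We train the model on the full set of question-answer pairs.",
    "Ablations show the retrieval component drives most of the gains.",
    "Related work on document-level QA is surveyed in Section 2.",
]
question = "Which component contributes most to performance?"

vectorizer = TfidfVectorizer().fit(paragraphs + [question])
scores = cosine_similarity(
    vectorizer.transform([question]), vectorizer.transform(paragraphs)
)[0]

# Print paragraphs from most to least relevant.
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {paragraphs[idx]}")
```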
arXiv Detail & Related papers (2025-02-19T12:24:46Z)
- "Did my figure do justice to the answer?": Towards Multimodal Short Answer Grading with Feedback (MMSAF) [36.74896284581596]
We propose the Multimodal Short Answer Grading with Feedback problem along with a dataset of 2197 data points.
Our evaluations of existing Large Language Models (LLMs) on this dataset yielded an overall accuracy of 55% on the Level of Correctness labels.
As per human experts, Pixtral aligned more closely with human judgement and values for biology, and ChatGPT for physics and chemistry.
arXiv Detail & Related papers (2024-12-27T17:33:39Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
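A minimal sketch of the self-training idea, with both the ASQP model and the scorer stubbed out (the threshold and stand-in functions are illustrative, not the paper's implementation):

```python
# Self-training loop: the model labels unlabelled reviews, a scorer rates how
# well each pseudo-label matches its review, and only high-scoring pairs are
# added to the training set.
def predict_quads(review):               # stand-in for the ASQP model
    return "(service, quality, great, positive)"

def score_match(review, pseudo_label):   # stand-in for the trained scorer
    return 0.9 if "great" in review else 0.3

labeled = [("The pasta was great", "(pasta, quality, great, positive)")]
unlabeled = ["The service was great", "Parking was impossible"]
THRESHOLD = 0.8  # keep only confident pseudo-labels

for review in unlabeled:
    pseudo = predict_quads(review)
    if score_match(review, pseudo) >= THRESHOLD:
        labeled.append((review, pseudo))  # expand the training data

print(labeled)  # the ASQP model would then be retrained on the expanded set
```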
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset [0.0]
This paper investigates the potential of the newest Large Language Models (LLMs) to grade short-answer questions in formative assessments.
It introduces a novel dataset of short answer reading comprehension questions, drawn from a set of reading assessments conducted with over 150 students in Ghana.
The paper empirically evaluates how well various configurations of generative LLMs grade student short answer responses compared to expert human raters.
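Agreement between LLM and expert grades in such studies is often quantified with quadratic weighted kappa; a minimal sketch with fabricated grade vectors (the metric choice is an assumption, not necessarily the paper's):

```python
# Quadratic weighted kappa penalizes large grade disagreements more heavily
# than small ones, which suits ordinal grading scales.
from sklearn.metrics import cohen_kappa_score

human_grades = [2, 0, 1, 2, 1, 0, 2, 1]  # expert rater, 0-2 scale
llm_grades   = [2, 0, 1, 1, 1, 0, 2, 2]  # LLM under one prompt configuration

kappa = cohen_kappa_score(human_grades, llm_grades, weights="quadratic")
print(f"Quadratic weighted kappa: {kappa:.3f}")
```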
arXiv Detail & Related papers (2023-10-26T17:05:40Z)
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z)
- Amortised Design Optimization for Item Response Theory [5.076871870091048]
In education, Item Response Theory (IRT) is used to infer student abilities and characteristics of test items from student responses.
Because adaptively selecting informative test items with optimal experimental design is computationally costly, we propose incorporating amortised experimental design into IRT.
The computational cost is shifted to a precomputing phase by training a Deep Reinforcement Learning (DRL) agent with synthetic data.
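The summary does not say which IRT variant is used; as a reference point, the common two-parameter logistic (2PL) model gives the probability of a correct response from student ability theta and item discrimination a and difficulty b:

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT model:
    P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A student slightly above average ability on a moderately hard item.
print(f"{irt_2pl(theta=0.5, a=1.2, b=0.3):.3f}")
```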
arXiv Detail & Related papers (2023-07-19T10:42:56Z)
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the light-weight active learner which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- A Predictive Model for Student Performance in Classrooms Using Student Interactions With an eTextbook [0.0]
This paper proposes a new model for predicting student performance based on an analysis of how students interact with an interactive online eTextbook.
To build the proposed model, we evaluated the most popular classification and regression algorithms on data from a data structures and algorithms course.
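A minimal sketch of the model-comparison step, scoring two popular classifiers by cross-validation on synthetic stand-ins for interaction features (the features and label are fabricated; the paper's actual algorithm set is broader):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 4))                 # e.g. time on page, exercises tried, ...
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # pass/fail stand-in label

# Compare candidate models by 5-fold cross-validated accuracy.
for model in (LogisticRegression(), RandomForestClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```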
arXiv Detail & Related papers (2022-02-16T11:59:53Z)
- Extracting candidate factors affecting long-term trends of student abilities across subjects [0.0]
Long-term student achievement data provide useful information for investigating which types of student skills impact future achievement trends across subjects.
We propose a novel approach to extract candidate factors affecting long-term trends across subjects from long-term data.
arXiv Detail & Related papers (2021-03-11T04:13:58Z)
- Peer-inspired Student Performance Prediction in Interactive Online Question Pools with Graph Neural Network [56.62345811216183]
We propose a novel approach using Graph Neural Networks (GNNs) to achieve better student performance prediction in interactive online question pools.
Specifically, we model the relationship between students and questions using student interactions to construct the student-interaction-question network.
We evaluate the effectiveness of our approach on a real-world dataset consisting of 104,113 mouse trajectories generated in the problem-solving process of over 4000 students on 1631 questions.
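A minimal sketch of the underlying construction: a bipartite student-question edge list plus one round of mean aggregation, the simplest form of GNN message passing (features and interactions are fabricated, and a real model would learn these updates):

```python
import numpy as np

# interactions: (student_index, question_index) edges of the bipartite graph
interactions = [(0, 0), (0, 1), (1, 1), (2, 0), (2, 2)]
student_feats = np.array([[0.9, 0.1], [0.4, 0.8], [0.2, 0.5]])

num_questions = 3
question_embs = np.zeros((num_questions, 2))
counts = np.zeros(num_questions)

# Each question aggregates the features of students who interacted with it.
for s, q in interactions:
    question_embs[q] += student_feats[s]
    counts[q] += 1

question_embs /= counts[:, None]  # mean over neighbouring students
print(question_embs)
```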
arXiv Detail & Related papers (2020-08-04T14:55:32Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
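A minimal sketch of instance weighting with a hand-rolled weighted cross-entropy; the per-instance weights below are fabricated, whereas the paper derives its supervision signals from the data itself:

```python
import numpy as np

# Down-weight examples an auxiliary signal marks as unreliable rather than
# discarding them outright.
probs   = np.array([0.9, 0.6, 0.2])  # model probability of the true response
weights = np.array([1.0, 0.8, 0.3])  # supervision weight per instance

weighted_loss = -(weights * np.log(probs)).sum() / weights.sum()
print(f"Weighted cross-entropy: {weighted_loss:.3f}")
```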
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
- R2DE: a NLP approach to estimating IRT parameters of newly generated questions [3.364554138758565]
R2DE is a model capable of assessing newly generated multiple-choice questions by looking at the text of the question.
In particular, it can estimate the difficulty and the discrimination of each question.
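A minimal sketch of the idea of estimating difficulty from question text alone, using TF-IDF features and a linear regressor (the questions and targets are fabricated, and R2DE's actual pipeline is more elaborate):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

questions = [
    "What is 2 + 2?",
    "State and prove the spectral theorem for compact operators.",
    "Name the capital of France.",
    "Derive the posterior for a Gaussian with unknown mean and variance.",
]
difficulty = [-2.0, 1.8, -1.5, 1.5]  # IRT-style difficulty targets

# Fit text features -> difficulty, then score an unseen question.
model = make_pipeline(TfidfVectorizer(), Ridge()).fit(questions, difficulty)
print(model.predict(["Prove that every bounded operator has a spectrum."]))
```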
arXiv Detail & Related papers (2020-01-21T14:31:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.