Improving Peer Assessment with Graph Convolutional Networks
- URL: http://arxiv.org/abs/2111.04466v1
- Date: Thu, 4 Nov 2021 03:43:09 GMT
- Title: Improving Peer Assessment with Graph Convolutional Networks
- Authors: Alireza A. Namanloo, Julie Thorpe, Amirali Salehi-Abari
- Abstract summary: Peer assessments might not be as accurate as expert evaluations, which can render peer assessment systems unreliable.
We first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups.
We introduce a graph convolutional network which can learn assessment patterns and user behaviors to more accurately predict expert evaluations.
- Score: 2.105564340986074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Peer assessment systems are emerging in many social and multi-agent settings,
such as peer grading in large (online) classes, peer review in conferences,
peer art evaluation, etc. However, peer assessments might not be as accurate as
expert evaluations, thus rendering these systems unreliable. The reliability of
peer assessment systems is influenced by various factors such as assessment
ability of peers, their strategic assessment behaviors, and the peer assessment
setup (e.g., peers evaluating group work or individual work of others). In this
work, we first model peer assessment as multi-relational weighted networks that
can express a variety of peer assessment setups, plus capture conflicts of
interest and strategic behaviors. Leveraging our peer assessment network model,
we introduce a graph convolutional network which can learn assessment patterns
and user behaviors to more accurately predict expert evaluations. Our extensive
experiments on real and synthetic datasets demonstrate the efficacy of our
proposed approach, which outperforms existing peer assessment methods.
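To make the network model concrete, below is a minimal numpy sketch, not the authors' implementation, of how peer assessments could be encoded as a multi-relational weighted graph and passed through one R-GCN-style relational graph-convolution layer (a common choice for multi-relational graphs; the paper's exact architecture may differ). The node layout, relation names, edge weights, and hidden size are illustrative assumptions.
```python
# Toy sketch (not the paper's code): peer assessment as a multi-relational
# weighted graph, followed by one relational graph-convolution step.
import numpy as np

n_nodes = 5                      # assume nodes 0-2 are peer assessors, 3-4 are submissions
relations = ["assesses", "conflict_of_interest"]

# One weighted adjacency matrix per relation; A[r][i, j] = weight of edge i -> j.
A = {r: np.zeros((n_nodes, n_nodes)) for r in relations}
A["assesses"][0, 3] = 0.8        # assessor 0 scores submission 3 as 0.8
A["assesses"][1, 3] = 0.6
A["assesses"][2, 4] = 0.9
A["conflict_of_interest"][1, 4] = 1.0   # assessor 1 is an author of submission 4

X = np.eye(n_nodes)              # trivial one-hot node features
hidden = 8
rng = np.random.default_rng(0)
W = {r: rng.normal(scale=0.1, size=(n_nodes, hidden)) for r in relations}  # per-relation weights
W_self = rng.normal(scale=0.1, size=(n_nodes, hidden))                     # self-loop weights

def rgcn_layer(X, A, W, W_self):
    """One R-GCN-style layer: self term plus degree-normalized, relation-specific messages."""
    H = X @ W_self
    for r, A_r in A.items():
        deg = A_r.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0                     # avoid division by zero for isolated nodes
        H = H + (A_r / deg) @ X @ W[r]
    return np.maximum(H, 0.0)                   # ReLU

H = rgcn_layer(X, A, W, W_self)
print(H.shape)  # (5, 8): node embeddings; a prediction head would regress these onto expert scores
```
In a full model one would typically stack several such layers and train the resulting embeddings against the available expert evaluations, as the abstract describes.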
Related papers
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- Evaluation in Neural Style Transfer: A Review [0.7614628596146599]
We provide an in-depth analysis of existing evaluation techniques, identify the inconsistencies and limitations of current evaluation methods, and give recommendations for standardized evaluation practices.
We believe that the development of a robust evaluation framework will not only enable more meaningful and fairer comparisons but will also enhance the comprehension and interpretation of research findings in the field.
arXiv Detail & Related papers (2024-01-30T15:45:30Z)
- Evaluating Agents using Social Choice Theory [21.26784305333596]
We argue that many general evaluation problems can be viewed through the lens of voting theory.
Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation.
These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation.
arXiv Detail & Related papers (2023-12-05T20:40:37Z)
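As a toy illustration of treating each task as a voter, the snippet below aggregates per-task ordinal rankings of agents with a simple Borda count; the agent names, rankings, and the choice of Borda scoring are assumptions for illustration rather than details taken from the paper.
```python
# Hypothetical illustration: each task ranks the agents (best first), and the
# per-task rankings are aggregated with a Borda count into one overall ordering.
from collections import defaultdict

task_rankings = {                      # assumed toy data, best agent first
    "task_A": ["agent2", "agent1", "agent3"],
    "task_B": ["agent1", "agent2", "agent3"],
    "task_C": ["agent2", "agent3", "agent1"],
}

scores = defaultdict(int)
for ranking in task_rankings.values():
    n = len(ranking)
    for position, agent in enumerate(ranking):
        scores[agent] += n - 1 - position      # Borda points: n-1 for first place, 0 for last

overall = sorted(scores, key=scores.get, reverse=True)
print(overall)        # ['agent2', 'agent1', 'agent3']
print(dict(scores))   # {'agent2': 5, 'agent1': 3, 'agent3': 1}
```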
- Strategic Evaluation: Subjects, Evaluators, and Society [1.1606619391009658]
We argue that the design of an evaluation itself can be understood as furthering goals held by the evaluator.
We put forward a model that represents the process of evaluation using three interacting agents.
Treating evaluators as themselves strategic actors allows us to recast the scrutiny directed at decision subjects.
arXiv Detail & Related papers (2023-10-05T16:33:08Z)
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
- Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z)
- The Price of Strategyproofing Peer Assessment [30.51994705981846]
Strategic behavior is a fundamental problem in a variety of real-world applications that require some form of peer assessment.
Since an individual's own work is in competition with the submissions they are evaluating, they may provide dishonest evaluations to increase the relative standing of their own submission.
This issue is typically addressed by partitioning the individuals and assigning them to evaluate the work of only those from different subsets.
arXiv Detail & Related papers (2022-01-25T21:16:33Z)
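A minimal sketch of the partitioning idea described in this entry, under assumed data: authors are split into k disjoint groups, and each submission is assigned only reviewers from groups other than its author's, so nobody evaluates work that competes directly with their own group's submissions. The group count, review load, and names are illustrative, not taken from the paper.
```python
# Minimal sketch (assumed setup, not the paper's algorithm): partition authors into
# k disjoint groups and assign each submission only to reviewers from other groups.
import random

def partition_assign(authors, k=2, reviews_per_submission=2, seed=0):
    rng = random.Random(seed)
    shuffled = authors[:]
    rng.shuffle(shuffled)
    group_of = {a: i % k for i, a in enumerate(shuffled)}    # round-robin into k groups

    assignment = {}
    for author in authors:                                   # assume one submission per author
        eligible = [r for r in authors if group_of[r] != group_of[author]]
        assignment[author] = rng.sample(eligible, reviews_per_submission)
    return group_of, assignment

authors = ["ana", "bo", "chen", "dita", "eli", "fay"]
groups, assignment = partition_assign(authors)
for author, reviewers in assignment.items():
    print(f"{author}'s submission -> reviewed by {reviewers}")
```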
- Ranking Scientific Papers Using Preference Learning [48.78161994501516]
We cast final decision-making on submissions as a paper ranking problem based on peer review texts and reviewer scores.
We introduce a novel, multi-faceted generic evaluation framework for making final decisions based on peer reviews.
arXiv Detail & Related papers (2021-09-02T19:41:47Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment [61.24399136715106]
We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions.
Our focus is on designing methods for detection of such manipulations.
Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering.
arXiv Detail & Related papers (2020-10-08T15:08:40Z)
- Wisdom of collaborators: a peer-review approach to performance appraisal [0.0]
We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual reputations and otherwise non-quantifiable individual impact.
PRS is based on pairwise comparisons of employees.
We show high robustness of the algorithm on simulations and empirically validate it for a genetic testing company on more than one thousand employees.
arXiv Detail & Related papers (2019-12-30T09:23:51Z)
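The abstract does not spell out how PRS is computed from the pairwise comparisons, so the sketch below shows a generic alternative: a Bradley-Terry strength estimate fitted to assumed comparison outcomes with the standard MM updates. The names, outcomes, and the choice of Bradley-Terry are illustrative and are not the paper's definition of PRS.
```python
# Generic sketch (not the paper's PRS): Bradley-Terry strengths estimated from
# pairwise comparison outcomes via the standard MM updates.
from collections import Counter
from itertools import chain

comparisons = [                     # assumed outcomes: (winner, loser) per comparison
    ("ana", "bo"), ("ana", "chen"), ("bo", "chen"),
    ("chen", "ana"), ("ana", "bo"), ("bo", "dita"), ("dita", "chen"),
]

people = sorted(set(chain.from_iterable(comparisons)))
wins = Counter(w for w, _ in comparisons)
pair_counts = Counter(frozenset(c) for c in comparisons)    # comparisons per unordered pair

p = {x: 1.0 for x in people}        # initial strengths
for _ in range(100):                # MM iterations
    new_p = {}
    for i in people:
        denom = sum(pair_counts[frozenset((i, j))] / (p[i] + p[j])
                    for j in people if j != i and frozenset((i, j)) in pair_counts)
        new_p[i] = wins[i] / denom if denom > 0 else p[i]
    total = sum(new_p.values())
    p = {x: v / total for x, v in new_p.items()}            # normalize for identifiability

for x in sorted(p, key=p.get, reverse=True):
    print(f"{x}: {p[x]:.3f}")       # higher strength = stronger overall peer standing
```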
This list is automatically generated from the titles and abstracts of the papers on this site.