Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
- URL: http://arxiv.org/abs/2010.04041v1
- Date: Thu, 8 Oct 2020 15:08:40 GMT
- Title: Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
- Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh
- Abstract summary: We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions.
Our focus is on designing methods for detection of such manipulations.
Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering.
- Score: 61.24399136715106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the issue of strategic behaviour in various peer-assessment
tasks, including peer grading of exams or homeworks and peer review in hiring
or promotions. When a peer-assessment task is competitive (e.g., when students
are graded on a curve), agents may be incentivized to misreport evaluations in
order to improve their own final standing. Our focus is on designing methods
for detection of such manipulations. Specifically, we consider a setting in
which agents evaluate a subset of their peers and output rankings that are
later aggregated to form a final ordering. In this paper, we investigate a
statistical framework for this problem and design a principled test for
detecting strategic behaviour. We prove that our test has strong false alarm
guarantees and evaluate its detection ability in practical settings. For this,
we design and execute an experiment that elicits strategic behaviour from
subjects and release a dataset of patterns of strategic behaviour that may be
of independent interest. We then use the collected data to conduct a series of
real and semi-synthetic evaluations that demonstrate a strong detection power
of our test.
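The abstract states the test and its false-alarm guarantee but not its construction. As a generic illustration of how such a detector can be calibrated, the sketch below aggregates reported rankings with Borda scores and compares an agent's own final standing against a permutation null. All names here (`final_ranks`, `advantage_pvalue`), the Borda aggregation, and the statistic itself are illustrative simplifications, not the authors' actual test.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_ranks(reports, n):
    """Borda-aggregate reported rankings into a final ordering and return
    each agent's final rank (0 = best). reports[i] lists the peers that
    agent i evaluated, best first."""
    score = np.zeros(n)
    for ranked in reports.values():
        k = len(ranked)
        for pos, item in enumerate(ranked):
            score[item] += k - 1 - pos
    order = np.argsort(-score)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)
    return rank

def advantage_pvalue(reports, n, agent, n_null=2000):
    """Permutation p-value for one agent: is their own final standing under
    the actual report unusually good compared to what a uniformly random
    ranking of the same peers would have produced?"""
    observed = final_ranks(reports, n)[agent]          # smaller = better
    hits = 0
    for _ in range(n_null):
        alt = dict(reports)
        alt[agent] = list(rng.permutation(reports[agent]))
        hits += final_ranks(alt, n)[agent] <= observed
    return (1 + hits) / (1 + n_null)                   # small p -> suspicious

# Toy usage: 4 agents, each ranking the other three.
reports = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
print(advantage_pvalue(reports, n=4, agent=0))
```

With the +1 correction, this p-value is valid whenever the agent's report is exchangeable with a random permutation under the null; controlled false-alarm behaviour of this general flavour is what the abstract claims the paper proves rigorously for its own statistic.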
Related papers
- Strategic Evaluation: Subjects, Evaluators, and Society [1.1606619391009658]
We argue that the design of an evaluation itself can be understood as furthering goals held by the evaluator.
We put forward a model that represents the process of evaluation using three interacting agents.
Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects.
arXiv Detail & Related papers (2023-10-05T16:33:08Z)
- From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs.
Existing practice in robustness evaluation may suffer from incomplete evaluation, impractical evaluation protocols, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z)
- A Dataset on Malicious Paper Bidding in Peer Review [84.68308372858755]
Malicious reviewers strategically bid in order to unethically manipulate the paper assignment.
A critical impediment to creating and evaluating methods that mitigate this issue is the lack of publicly available data on malicious paper bidding.
We release a novel dataset, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously.
arXiv Detail & Related papers (2022-06-24T20:23:33Z)
- A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit, OpenBackdoor, to foster the implementation and evaluation of textual backdoor learning.
We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
- The Price of Strategyproofing Peer Assessment [30.51994705981846]
Strategic behavior is a fundamental problem in a variety of real-world applications that require some form of peer assessment.
Since an individual's own work is in competition with the submissions they are evaluating, they may provide dishonest evaluations to increase the relative standing of their own submission.
This issue is typically addressed by partitioning the individuals into subsets and assigning each to evaluate only the work of those in other subsets (see the sketch after this entry).
arXiv Detail & Related papers (2022-01-25T21:16:33Z)
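As a rough illustration of that partitioning scheme, the sketch below (hypothetical helper names, not code from this paper) splits agents into k groups and lets each agent review only submissions from other groups:

```python
import random

def partition_assignment(agents, k=2, reviews_per_agent=3, seed=0):
    """Split agents into k groups; each agent reviews only submissions
    from other groups (all names here are hypothetical)."""
    rng = random.Random(seed)
    shuffled = list(agents)
    rng.shuffle(shuffled)
    groups = [shuffled[i::k] for i in range(k)]
    assignment = {}
    for g, group in enumerate(groups):
        # The review pool excludes the agent's own group entirely.
        pool = [a for h, grp in enumerate(groups) if h != g for a in grp]
        for agent in group:
            assignment[agent] = rng.sample(pool, min(reviews_per_agent, len(pool)))
    return groups, assignment

groups, assignment = partition_assignment(range(10), k=2)
print(groups)      # two disjoint groups of five agents
print(assignment)  # each agent reviews only the other group's members
```

Because an agent's report only touches scores in other groups, it cannot move their standing relative to their own group; turning this into a fully strategyproof final ranking still requires a careful rule for merging per-group results, which is the kind of accuracy trade-off this paper's title refers to.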
- Improving Peer Assessment with Graph Convolutional Networks [2.105564340986074]
Peer assessment may be less accurate than expert evaluation, which can render these systems unreliable.
We first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups.
We introduce a graph convolutional network that can learn assessment patterns and user behaviors to more accurately predict expert evaluations (a minimal propagation step is sketched after this entry).
arXiv Detail & Related papers (2021-11-04T03:43:09Z)
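For intuition only, here is a minimal single graph-convolution step in NumPy with the standard symmetric normalization; this is a generic GCN sketch, not this paper's multi-relational weighted model, and all names are illustrative:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One generic GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    D = np.diag(d_inv_sqrt)                           # symmetric normalization
    return np.maximum(D @ A_hat @ D @ H @ W, 0.0)

# Toy usage: 4 nodes (e.g., students/submissions), 3 features, 2 hidden units.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
print(gcn_layer(A, H, W).shape)  # -> (4, 2)
```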
- Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot [71.28884625011987]
Melting Pot is a MARL evaluation suite that uses reinforcement learning to reduce the human labor required to create novel test scenarios.
We have created over 80 unique test scenarios covering a broad range of research topics.
We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
arXiv Detail & Related papers (2021-07-14T17:22:14Z)
- Evaluating the Robustness of Collaborative Agents [25.578427956101603]
We take inspiration from the practice of unit testing in software engineering.
We apply this methodology to build a suite of unit tests for the Overcooked-AI environment.
arXiv Detail & Related papers (2021-01-14T09:02:45Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)