Wisdom of collaborators: a peer-review approach to performance appraisal
- URL: http://arxiv.org/abs/1912.12861v1
- Date: Mon, 30 Dec 2019 09:23:51 GMT
- Title: Wisdom of collaborators: a peer-review approach to performance appraisal
- Authors: Sofia Dokuka, Ivan Zaikin, Kate Furman, Maksim Tsvetovat and Alex
Furman
- Abstract summary: We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual reputations and non-quantifiable individual impact.
PRS is based on pairwise comparisons of employees.
We show the high robustness of the algorithm in simulations and empirically validate it at a genetic testing company on more than one thousand employees.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Individual performance and reputation within a company are major factors that
influence wage distribution, promotion and firing. Due to the complexity and
collaborative nature of contemporary business processes, the evaluation of
individual impact in the majority of organizations is an ambiguous and
non-trivial task. Existing performance appraisal approaches are often affected
by individuals' biased judgements, and organizations are dissatisfied with the
results of such evaluations. We assert that employees can provide an accurate
measurement of their peers' performance in a complex collaborative environment.
We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual
reputations and non-quantifiable individual impact. PRS is based on pairwise
comparisons of employees. We show the high robustness of the algorithm in
simulations and empirically validate it at a genetic testing company on more
than one thousand employees, using peer reviews collected over the course of
three years.
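The abstract states only that PRS is built from pairwise comparisons of employees; it does not give the algorithm itself. As a rough illustration of how a reputation score can be derived from such comparisons, the sketch below fits a standard Bradley-Terry model to a list of (preferred, other) pairs. The function name, the demo data, and the choice of Bradley-Terry are illustrative assumptions, not the paper's PRS definition.

```python
# Illustrative sketch only: a Bradley-Terry-style score fitted to pairwise
# peer comparisons. This is NOT the paper's PRS, which the abstract does not
# specify; it only shows the kind of input/output such a metric involves.
from collections import defaultdict


def pairwise_scores(comparisons, iterations=100, tol=1e-9):
    """comparisons: list of (preferred_employee, other_employee) pairs."""
    employees = {e for pair in comparisons for e in pair}
    wins = defaultdict(float)          # total wins per employee
    pair_counts = defaultdict(float)   # comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0

    scores = {e: 1.0 for e in employees}
    for _ in range(iterations):
        new_scores = {}
        for e in employees:
            # standard minorization-maximization update for Bradley-Terry
            denom = sum(
                n / (scores[e] + scores[other])
                for pair, n in pair_counts.items()
                if e in pair
                for other in pair - {e}
            )
            new_scores[e] = wins[e] / denom if denom > 0 else scores[e]
        # normalize so the mean score stays at 1.0 across iterations
        total = sum(new_scores.values())
        new_scores = {e: s * len(employees) / total for e, s in new_scores.items()}
        if max(abs(new_scores[e] - scores[e]) for e in employees) < tol:
            scores = new_scores
            break
        scores = new_scores
    return scores


if __name__ == "__main__":
    demo = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "bob")]
    print(pairwise_scores(demo))  # higher score = stronger peer preference
```

The output is a score per employee that is larger for people preferred more often in comparisons, which is the general shape of a pairwise-comparison reputation metric.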
Related papers
- SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation [75.56845750400116]
Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems.
We develop SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models.
Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior together with cross-validation-free tuning via Stein's unbiased risk estimate (SURE)
arXiv Detail & Related papers (2024-11-14T17:53:35Z) - Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse [9.542503507653494]
Chain-of-thought (CoT) has become a widely used strategy for working with large language and multimodal models.
We identify characteristics of tasks where CoT reduces performance by drawing inspiration from cognitive psychology.
We find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance when using inference-time reasoning.
arXiv Detail & Related papers (2024-10-27T18:30:41Z) - (De)Noise: Moderating the Inconsistency Between Human Decision-Makers [15.291993233528526]
We study whether algorithmic decision aids can be used to moderate the degree of inconsistency in human decision-making in the context of real estate appraisal.
We find that both (i) asking respondents to review their estimates in a series of algorithmically chosen pairwise comparisons and (ii) providing respondents with traditional machine advice are effective strategies for influencing human responses.
arXiv Detail & Related papers (2024-07-15T20:24:36Z) - Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment [22.540544209683592]
We focus on cognitive biases associated with a multi-criteria assessment in crowdsourcing.
Crowdworkers who rate targets with multiple different criteria simultaneously may provide biased responses due to prominence of some criteria or global impressions of the evaluation targets.
We propose two specific model structures for Bayesian opinion aggregation models that consider inter-criteria relations.
arXiv Detail & Related papers (2024-07-10T16:00:23Z) - 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System [71.96888731208838]
We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance.
We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z) - Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators [48.54465599914978]
Large Language Models (LLMs) have demonstrated promising capabilities in assessing the quality of generated natural language.
LLMs still exhibit biases in evaluation and often struggle to generate coherent evaluations that align with human assessments.
We introduce Pairwise-preference Search (PairS), an uncertainty-guided search method that employs LLMs to conduct pairwise comparisons and efficiently ranks candidate texts.
arXiv Detail & Related papers (2024-03-25T17:11:28Z) - Collaborative Evaluation: Exploring the Synergy of Large Language Models
and Humans for Open-ended Generation Evaluation [71.76872586182981]
Large language models (LLMs) have emerged as a scalable and cost-effective alternative to human evaluations.
We propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts.
arXiv Detail & Related papers (2023-10-30T17:04:35Z) - Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z) - Improving Peer Assessment with Graph Convolutional Networks [2.105564340986074]
Peer assessment might not be as accurate as expert evaluations, thus rendering these systems unreliable.
We first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups.
We introduce a graph convolutional network which can learn assessment patterns and user behaviors to more accurately predict expert evaluations.
arXiv Detail & Related papers (2021-11-04T03:43:09Z) - Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment [61.24399136715106]
We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions.
Our focus is on designing methods for detection of such manipulations.
Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering (a generic aggregation sketch is given after this list).
arXiv Detail & Related papers (2020-10-08T15:08:40Z) - The cost of coordination can exceed the benefit of collaboration in
performing complex tasks [0.0]
Dyads gradually improve in performance but do not experience a collective benefit compared to individuals in most situations.
Having an additional expert in the dyad who is adequately trained improves accuracy.
Findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.
arXiv Detail & Related papers (2020-09-23T10:18:26Z)
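The "Catch Me if I Can" entry above describes a setting in which each agent ranks only a subset of peers and the partial rankings are aggregated into one final ordering. The sketch below shows one generic way to perform that aggregation, an average-Borda heuristic; it is not the detection method proposed in that paper, and the data and names are illustrative assumptions.

```python
# Generic illustration of aggregating partial peer rankings into one ordering
# with an average-Borda heuristic. Not the paper's method; it only shows the
# aggregation setting its summary describes.
from collections import defaultdict


def aggregate_rankings(rankings):
    """rankings: list of lists, each an agent's ranking of a subset of peers
    (best first). Returns peers sorted by average Borda points."""
    points = defaultdict(float)
    appearances = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, peer in enumerate(ranking):
            points[peer] += n - 1 - position   # top of a list of n earns n-1 points
            appearances[peer] += 1
    # average over the rankings a peer appears in, since each agent
    # only ranks a subset of peers
    avg = {peer: points[peer] / appearances[peer] for peer in points}
    return sorted(avg, key=avg.get, reverse=True)


if __name__ == "__main__":
    partial_rankings = [["ann", "bo", "cem"], ["bo", "dee"], ["cem", "ann", "dee"]]
    print(aggregate_rankings(partial_rankings))  # best-to-worst consensus ordering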