Wisdom of collaborators: a peer-review approach to performance appraisal
- URL: http://arxiv.org/abs/1912.12861v1
- Date: Mon, 30 Dec 2019 09:23:51 GMT
- Title: Wisdom of collaborators: a peer-review approach to performance appraisal
- Authors: Sofia Dokuka, Ivan Zaikin, Kate Furman, Maksim Tsvetovat and Alex Furman
- Abstract summary: We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual reputation and non-quantifiable individual impact.
PRS is based on pairwise comparisons of employees.
We show the high robustness of the algorithm in simulations and empirically validate it at a genetic testing company with more than one thousand employees.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Individual performance and reputation within a company are major factors that
influence wage distribution, promotion and firing. Due to the complexity and
collaborative nature of contemporary business processes, evaluating individual
impact in the majority of organizations is an ambiguous and non-trivial task.
Existing performance appraisal approaches are often affected by individuals'
biased judgements, and organizations are dissatisfied with the results of these
evaluations. We assert that employees can provide accurate measurements of their
peers' performance in a complex collaborative environment. We propose a novel
metric, the Peer Rank Score (PRS), that evaluates individual reputation and
non-quantifiable individual impact. PRS is based on pairwise comparisons of
employees. We show the high robustness of the algorithm in simulations and
empirically validate it at a genetic testing company, using peer reviews of more
than one thousand employees collected over the course of three years.
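The abstract states only that PRS is built from pairwise comparisons of employees; the aggregation rule itself is not given here. The sketch below is a minimal illustration, assuming a Bradley-Terry model fitted with the standard minorization-maximization (MM) updates; the function name, the input format of (winner, loser) pairs, and the example data are hypothetical and are not taken from the paper.

```python
"""Toy peer ranking from pairwise comparisons.

Minimal sketch only: the abstract does not specify the PRS aggregation rule,
so a Bradley-Terry model fitted with MM updates is assumed here purely for
illustration.
"""
from collections import defaultdict


def peer_rank_scores(comparisons, n_iter=200, tol=1e-9):
    """comparisons: iterable of (winner, loser) pairs from peer reviews."""
    wins = defaultdict(int)          # total wins per employee
    pair_counts = defaultdict(int)   # comparisons per unordered pair
    employees = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        employees.update((winner, loser))

    # MM update for Bradley-Terry strengths:
    #   p_i <- wins_i / sum_j n_ij / (p_i + p_j), then renormalize.
    # Employees with zero wins converge to a score of zero.
    p = {e: 1.0 for e in employees}
    for _ in range(n_iter):
        new_p = {}
        for i in employees:
            denom = sum(
                pair_counts[frozenset((i, j))] / (p[i] + p[j])
                for j in employees
                if j != i and frozenset((i, j)) in pair_counts
            )
            new_p[i] = wins[i] / denom if denom > 0 else p[i]
        total = sum(new_p.values())
        new_p = {e: v / total for e, v in new_p.items()}
        converged = max(abs(new_p[e] - p[e]) for e in employees) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Hypothetical peer reviews: each pair means the first employee was
    # rated above the second in one pairwise comparison.
    reviews = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
               ("carol", "bob"), ("alice", "bob")]
    for name, score in sorted(peer_rank_scores(reviews).items(),
                              key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Under this assumed model the returned strengths can simply be sorted to produce a peer ranking; the actual PRS may aggregate the comparisons differently.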
Related papers
- Employee Turnover Prediction: A Cross-component Attention Transformer with Consideration of Competitor Influence and Contagious Effect [12.879229546467117]
We propose a novel deep learning approach based on job embeddedness theory to predict the turnover of individual employees across different firms.
Our developed method demonstrates superior performance over several state-of-the-art benchmark methods.
arXiv Detail & Related papers (2025-01-31T22:25:39Z)
- HREF: Human Response-Guided Evaluation of Instruction Following in Language Models [61.273153125847166]
We develop a new evaluation benchmark, Human Response-Guided Evaluation of Instruction Following (HREF)
In addition to providing reliable evaluation, HREF emphasizes individual task performance and is free from contamination.
We study the impact of key design choices in HREF, including the size of the evaluation set, the judge model, the baseline model, and the prompt template.
arXiv Detail & Related papers (2024-12-20T03:26:47Z)
- SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation [75.56845750400116]
Disaggregated evaluation -- estimation of performance of a machine learning model on different subpopulations -- is a core task when assessing performance and group-fairness of AI systems.
We develop SureMap that has high estimation accuracy for both multi-task and single-task disaggregated evaluations of blackbox models.
Our method combines maximum a posteriori (MAP) estimation under a well-chosen prior with cross-validation-free tuning via Stein's unbiased risk estimate (SURE); a toy sketch of this MAP-plus-SURE idea is given after this list.
arXiv Detail & Related papers (2024-11-14T17:53:35Z)
- Direct Judgement Preference Optimization [66.83088028268318]
We train large language models (LLMs) as generative judges to evaluate and critique other models' outputs.
We employ three approaches to collect the preference pairs for different use cases, each aimed at improving our generative judge from a different perspective.
Our model robustly counters inherent biases such as position and length bias, flexibly adapts to any evaluation protocol specified by practitioners, and provides helpful language feedback for improving downstream generator models.
arXiv Detail & Related papers (2024-09-23T02:08:20Z)
- Mitigating Cognitive Biases in Multi-Criteria Crowd Assessment [22.540544209683592]
We focus on cognitive biases associated with multi-criteria assessment in crowdsourcing.
Crowdworkers who rate targets on multiple criteria simultaneously may provide biased responses due to the prominence of some criteria or their global impressions of the evaluation targets.
We propose two specific model structures for Bayesian opinion aggregation models that consider inter-criteria relations.
arXiv Detail & Related papers (2024-07-10T16:00:23Z)
- 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System [71.96888731208838]
We argue that comprehensive evaluation, together with accumulating experience from evaluation feedback, is an effective approach to improving system performance.
We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z)
- Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z)
- Improving Peer Assessment with Graph Convolutional Networks [2.105564340986074]
Peer assessments might not be as accurate as expert evaluations, which can render peer-assessment systems unreliable.
We first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups.
We introduce a graph convolutional network which can learn assessment patterns and user behaviors to more accurately predict expert evaluations.
arXiv Detail & Related papers (2021-11-04T03:43:09Z)
- Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment [61.24399136715106]
We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions.
Our focus is on designing methods for detection of such manipulations.
Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering.
arXiv Detail & Related papers (2020-10-08T15:08:40Z)
- The cost of coordination can exceed the benefit of collaboration in performing complex tasks [0.0]
Dyads gradually improve in performance but, in most situations, do not experience a collective benefit compared to individuals.
Having an additional expert in the dyad who is adequately trained improves accuracy.
Findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.
arXiv Detail & Related papers (2020-09-23T10:18:26Z)
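The SureMap entry above describes combining maximum a posteriori estimation under a well-chosen prior with cross-validation-free tuning via Stein's unbiased risk estimate. As a rough, single-task illustration of that general recipe (not the SureMap algorithm itself, which targets multi-task disaggregated evaluation of blackbox models), the sketch below shrinks noisy per-group means toward a precision-weighted grand mean under an assumed Gaussian prior, choosing the prior variance by minimizing SURE over a grid. All function names, the grid, and the example numbers are hypothetical.

```python
"""Toy single-task illustration of MAP shrinkage tuned by SURE.

Assumptions (not from the SureMap paper): observed group means
y_g ~ N(theta_g, sigma_g^2) with known noise variances, a Gaussian prior
theta_g ~ N(m, tau^2), and the grand mean m treated as fixed when writing
down Stein's unbiased risk estimate.
"""
import numpy as np


def map_estimate(y, sigma2, m, tau2):
    # Posterior mean under the N(m, tau^2) prior: shrink each y_g toward m.
    return m + (tau2 / (tau2 + sigma2)) * (y - m)


def sure(y, sigma2, m, tau2):
    # Stein's unbiased risk estimate of sum_g E[(theta_hat_g - theta_g)^2]
    # for the linear shrinkage rule above (heteroscedastic Gaussian case):
    #   (theta_hat_g - y_g)^2 - sigma_g^2 + 2 * sigma_g^2 * d(theta_hat_g)/d(y_g)
    shrink = sigma2 / (tau2 + sigma2)
    return np.sum((shrink * (y - m)) ** 2 - sigma2
                  + 2.0 * sigma2 * tau2 / (tau2 + sigma2))


def sure_tuned_means(y, sigma2, grid=None):
    """Pick tau^2 on a grid by minimizing SURE, then return the MAP estimates."""
    m = np.average(y, weights=1.0 / sigma2)        # precision-weighted grand mean
    if grid is None:
        grid = np.geomspace(1e-4, 1e2, 200)        # arbitrary illustrative grid
    tau2 = grid[int(np.argmin([sure(y, sigma2, m, t) for t in grid]))]
    return map_estimate(y, sigma2, m, tau2), tau2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = np.array([0.70, 0.72, 0.68, 0.71])   # hypothetical subgroup metrics
    sigma2 = np.array([0.01, 0.04, 0.02, 0.09])       # per-group noise variances
    observed = rng.normal(true_means, np.sqrt(sigma2))
    estimates, tau2 = sure_tuned_means(observed, sigma2)
    print("observed:", np.round(observed, 3))
    print("shrunk  :", np.round(estimates, 3), " chosen tau^2 =", round(float(tau2), 4))
```

In this toy, groups with noisier observations are pulled more strongly toward the grand mean, which is the behaviour that SURE-tuned shrinkage is meant to exploit.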