Wisdom of collaborators: a peer-review approach to performance appraisal
- URL: http://arxiv.org/abs/1912.12861v1
- Date: Mon, 30 Dec 2019 09:23:51 GMT
- Title: Wisdom of collaborators: a peer-review approach to performance appraisal
- Authors: Sofia Dokuka, Ivan Zaikin, Kate Furman, Maksim Tsvetovat and Alex
Furman
- Abstract summary: We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual reputations and the non-quantifiable individual impact.
PRS is based on pairwise comparisons of employees.
We show that the algorithm is highly robust in simulations and empirically validate it at a genetic testing company on more than one thousand employees.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Individual performance and reputation within a company are major factors that
influence wage distribution, promotion and firing. Due to the complexity and
collaborative nature of contemporary business processes, the evaluation of
individual impact in the majority of organizations is an ambiguous and
non-trivial task. Existing performance appraisal approaches are often affected
by individuals' biased judgements, and organizations are dissatisfied with the
results of evaluations. We assert that employees can provide accurate
measurement of their peer performance in a complex collaborative environment.
We propose a novel metric, the Peer Rank Score (PRS), that evaluates individual
reputations and the non-quantifiable individual impact. PRS is based on
pairwise comparisons of employees. We show that the algorithm is highly robust
in simulations and empirically validate it at a genetic testing company on more
than one thousand employees, using peer reviews collected over three years.
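The abstract describes PRS only as being based on pairwise comparisons of employees, without specifying the aggregation rule. As an illustrative stand-in (not the authors' actual PRS algorithm), a minimal sketch of turning pairwise outcomes into reputation scores using the Bradley-Terry model with iterative MM updates:

```python
from collections import defaultdict

def peer_rank_scores(comparisons, n_iter=100, eps=1e-9):
    """Score individuals from pairwise comparisons via the
    Bradley-Terry model (iterative minorize-maximize updates).
    `comparisons` is a list of (winner, loser) pairs. This is an
    illustrative sketch only; the actual PRS metric is not
    specified in the abstract.
    """
    wins = defaultdict(float)   # wins[p] = comparisons p won
    pairs = defaultdict(float)  # pairs[(a, b)] = times a, b compared
    people = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pairs[tuple(sorted((winner, loser)))] += 1.0
        people.update((winner, loser))

    score = {p: 1.0 for p in people}
    for _ in range(n_iter):
        new = {}
        for p in people:
            # Sum over every pairing involving p.
            denom = 0.0
            for (a, b), n in pairs.items():
                if p in (a, b):
                    other = b if p == a else a
                    denom += n / (score[p] + score[other])
            new[p] = wins[p] / (denom + eps)
        # Normalize so scores sum to the number of people.
        total = sum(new.values())
        score = {p: v * len(people) / total for p, v in new.items()}
    return score
```

On a toy input where alice beats bob and carol, and bob beats carol, the scores recover the ordering alice > bob > carol.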
Related papers
- (De)Noise: Moderating the Inconsistency Between Human Decision-Makers [15.291993233528526]
We study whether algorithmic decision aids can be used to moderate the degree of inconsistency in human decision-making in the context of real estate appraisal.
We find that both (i) asking respondents to review their estimates in a series of algorithmically chosen pairwise comparisons and (ii) providing respondents with traditional machine advice are effective strategies for influencing human responses.
arXiv Detail & Related papers (2024-07-15T20:24:36Z) - 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System [71.96888731208838]
We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance.
We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z) - Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators [48.54465599914978]
Large Language Models (LLMs) have demonstrated promising capabilities in assessing the quality of generated natural language.
LLMs still exhibit biases in evaluation and often struggle to generate coherent evaluations that align with human assessments.
We introduce Pairwise-preference Search (PairS), an uncertainty-guided search method that employs LLMs to conduct pairwise comparisons and efficiently ranks candidate texts.
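The PairS summary above mentions ranking candidate texts via pairwise comparisons. A minimal sketch of the underlying idea, with a pluggable judge function standing in for the LLM comparator (the uncertainty-guided search that PairS adds on top is not reproduced here):

```python
from functools import cmp_to_key

def rank_candidates(texts, prefer):
    """Rank candidate texts from best to worst using only pairwise
    comparisons. `prefer(a, b)` is a pluggable judge (in PairS, an
    LLM prompt) returning True if `a` is preferred over `b`.
    Comparison-based sorting needs O(n log n) judge calls rather
    than all n*(n-1)/2 pairs. Illustrative sketch, not the PairS
    algorithm itself.
    """
    def cmp(a, b):
        return -1 if prefer(a, b) else 1
    return sorted(texts, key=cmp_to_key(cmp))

# Hypothetical judge for demonstration: prefer the longer text.
longer = lambda a, b: len(a) > len(b)
```

With the hypothetical length-based judge, `rank_candidates(["bb", "a", "ccc"], longer)` orders the texts from longest to shortest.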
arXiv Detail & Related papers (2024-03-25T17:11:28Z) - Individualized Policy Evaluation and Learning under Clustered Network
Interference [4.560284382063488]
We consider the problem of evaluating and learning an optimal individualized treatment rule under clustered network interference.
We propose an estimator that can be used to evaluate the empirical performance of an ITR.
We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies.
arXiv Detail & Related papers (2023-11-04T17:58:24Z) - Collaborative Evaluation: Exploring the Synergy of Large Language Models
and Humans for Open-ended Generation Evaluation [71.76872586182981]
Large language models (LLMs) have emerged as a scalable and cost-effective alternative to human evaluations.
We propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts.
arXiv Detail & Related papers (2023-10-30T17:04:35Z) - ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z) - Measuring the Effect of Influential Messages on Varying Personas [67.1149173905004]
We present a new task, Response Forecasting on Personas for News Media, to estimate the response a persona might have upon seeing a news message.
The proposed task not only introduces personalization in the modeling but also predicts the sentiment polarity and intensity of each response.
This enables more accurate and comprehensive inference on the mental state of the persona.
arXiv Detail & Related papers (2023-05-25T21:01:00Z) - Homophily and Incentive Effects in Use of Algorithms [17.55279695774825]
We present a crowdsourcing vignette study designed to assess the impacts of two plausible factors on AI-informed decision-making.
First, we examine homophily -- do people defer more to models that tend to agree with them?
Second, we consider incentives -- how do people incorporate a (known) cost structure in the hybrid decision-making setting?
arXiv Detail & Related papers (2022-05-19T17:11:04Z) - Improving Peer Assessment with Graph Convolutional Networks [2.105564340986074]
Peer assessment might not be as accurate as expert evaluation, which can render these systems unreliable.
We first model peer assessment as multi-relational weighted networks that can express a variety of peer assessment setups.
We introduce a graph convolutional network which can learn assessment patterns and user behaviors to more accurately predict expert evaluations.
arXiv Detail & Related papers (2021-11-04T03:43:09Z) - Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment [61.24399136715106]
We consider the issue of strategic behaviour in various peer-assessment tasks, including peer grading of exams or homeworks and peer review in hiring or promotions.
Our focus is on designing methods for detection of such manipulations.
Specifically, we consider a setting in which agents evaluate a subset of their peers and output rankings that are later aggregated to form a final ordering.
arXiv Detail & Related papers (2020-10-08T15:08:40Z) - The cost of coordination can exceed the benefit of collaboration in
performing complex tasks [0.0]
Dyads gradually improve in performance but do not experience a collective benefit over individuals in most situations.
Having an additional expert in the dyad who is adequately trained improves accuracy.
Findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.
arXiv Detail & Related papers (2020-09-23T10:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.