AI for human assessment: What do professional assessors need?
- URL: http://arxiv.org/abs/2204.08471v1
- Date: Mon, 18 Apr 2022 03:35:37 GMT
- Title: AI for human assessment: What do professional assessors need?
- Authors: Riku Arakawa, Hiromu Yakura
- Abstract summary: This case study aims to help professional assessors make decisions in human assessment, in which they conduct interviews with assessees and evaluate their suitability for certain job roles.
A computational system that can extract nonverbal cues of assessees would be beneficial to assessors in supporting their decision making.
We developed such a system based on an unsupervised anomaly detection algorithm using multimodal behavioral features such as facial keypoints, pose, head pose, and gaze.
- Score: 33.88509725285237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present our case study that aims to help professional assessors make
decisions in human assessment, in which they conduct interviews with assessees
and evaluate their suitability for certain job roles. Our workshop with two
industrial assessors revealed that a computational system that can extract
nonverbal cues of assessees from interview videos would be beneficial to
assessors in terms of supporting their decision making. In response, we
developed such a system based on an unsupervised anomaly detection algorithm
using multimodal behavioral features such as facial keypoints, pose, head pose,
and gaze. Moreover, we enabled the system to output how much each feature
contributed to the outlierness of the detected cues with the purpose of
enhancing its interpretability. We then conducted a preliminary study to
examine the validity of the system's output by using 20 actual assessment
interview videos and involving the two assessors. The results suggested the
advantages of using unsupervised anomaly detection in an interpretable manner
by illustrating the informativeness of its outputs for assessors. Our approach,
which builds on top of the idea of separation of observation and interpretation
in human-AI teaming, will facilitate human decision making in highly contextual
domains, such as human assessment, while maintaining human trust in the system.
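As a concrete illustration of the pipeline described above, the following is a minimal sketch, not the authors' implementation: it scores per-frame multimodal features with a simple robust z-score anomaly detector and reports how much each feature contributes to a frame's outlierness. The feature grouping, the detector choice, and all names are illustrative assumptions.

```python
import numpy as np

FEATURE_NAMES = ["facial_keypoints", "pose", "head_pose", "gaze"]  # assumed per-frame feature grouping

def detect_outlier_cues(features: np.ndarray, top_k: int = 5):
    """features: (n_frames, n_features) matrix of behavioral measurements."""
    # Robust per-feature standardization (median / MAD) so a few extreme
    # frames do not dominate the estimate of "typical" behavior.
    median = np.median(features, axis=0)
    mad = np.median(np.abs(features - median), axis=0) + 1e-8
    z = (features - median) / (1.4826 * mad)

    # Outlierness of a frame = sum of squared robust z-scores across features.
    outlierness = (z ** 2).sum(axis=1)

    # Per-feature contribution: each feature's share of the frame's score,
    # which is what an assessor would inspect to interpret a detected cue.
    contributions = (z ** 2) / outlierness[:, None]

    top_frames = np.argsort(outlierness)[::-1][:top_k]
    return top_frames, outlierness, contributions

# Usage with synthetic data standing in for features extracted from a video.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X[42, 1] += 6.0  # inject an anomalous "pose" frame
frames, scores, contrib = detect_outlier_cues(X)
top = frames[0]
print(f"frame {top}: outlierness={scores[top]:.1f}, "
      f"dominant feature={FEATURE_NAMES[int(np.argmax(contrib[top]))]}")
```

Any unsupervised detector could stand in for the robust z-score here; the per-feature decomposition is the part that mirrors the interpretability goal the abstract describes.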
Related papers
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversational setting has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Evaluating General-Purpose AI with Psychometrics [43.85432514910491]
We discuss the need for a comprehensive and accurate evaluation of general-purpose AI systems such as large language models.
Current evaluation methodology, mostly based on benchmarks of specific tasks, falls short of adequately assessing these versatile AI systems.
To tackle these challenges, we suggest transitioning from task-oriented evaluation to construct-oriented evaluation.
arXiv Detail & Related papers (2023-10-25T05:38:38Z) - Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
When prompted, large language models (LLMs) claim that they can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z) - Inverse Online Learning: Understanding Non-Stationary and Reactionary
Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By viewing the decision-making processes underlying a set of observed trajectories as online learning, we cast the policy inference problem as the inverse of that online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z) - Opportunities of a Machine Learning-based Decision Support System for
Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practice relies mainly on a therapist's experience, and assessment is performed infrequently due to the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z) - What's a Good Prediction? Challenges in evaluating an agent's knowledge [0.9281671380673306]
We show a conflict between the accuracy and the usefulness of general knowledge.
We propose an alternative evaluation approach that arises naturally in the online continual learning setting.
This paper contributes a first look at evaluating predictions through their use.
arXiv Detail & Related papers (2020-01-23T21:44:43Z) - Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating
Explainable AI Systems [14.940404609343432]
We evaluate two currently common techniques for evaluating XAI systems.
We show that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks.
Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.
arXiv Detail & Related papers (2020-01-22T22:14:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.