External Stability Auditing to Test the Validity of Personality
Prediction in AI Hiring
- URL: http://arxiv.org/abs/2201.09151v1
- Date: Sun, 23 Jan 2022 00:44:56 GMT
- Title: External Stability Auditing to Test the Validity of Personality
Prediction in AI Hiring
- Authors: Alene K. Rhea, Kelsey Markey, Lauren D'Arinzo, Hilke Schellmann, Mona
Sloane, Paul Squires, Julia Stoyanovich
- Abstract summary: We develop a methodology for an external audit of stability of predictions made by algorithmic personality tests.
We instantiate this methodology in an audit of two systems, Humantic AI and Crystal.
We find that both systems show substantial instability with respect to key facets of measurement.
- Score: 4.837064018590988
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automated hiring systems are among the fastest-developing of all high-stakes
AI systems. Among these are algorithmic personality tests that use insights
from psychometric testing, and promise to surface personality traits indicative
of future success based on job seekers' resumes or social media profiles. We
interrogate the validity of such systems using stability of the outputs they
produce, noting that reliability is a necessary, but not a sufficient,
condition for validity. Our approach is to (a) develop a methodology for an
external audit of stability of predictions made by algorithmic personality
tests, and (b) instantiate this methodology in an audit of two systems,
Humantic AI and Crystal. Crucially, rather than challenging or affirming the
assumptions made in psychometric testing -- that personality is a meaningful
and measurable construct, and that personality traits are indicative of future
success on the job -- we frame our methodology around testing the underlying
assumptions made by the vendors of the algorithmic personality tests
themselves.
In our audit of Humantic AI and Crystal, we find that both systems show
substantial instability with respect to key facets of measurement, and so
cannot be considered valid testing instruments. For example, Crystal frequently
computes different personality scores if the same resume is given in PDF vs. in
raw text format, violating the assumption that the output of an algorithmic
personality test is stable across job-irrelevant variations in the input. Among
other notable findings is evidence of persistent -- and often incorrect -- data
linkage by Humantic AI.
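The core stability check described above can be sketched as a paired comparison: the same resume submitted in two job-irrelevant variants (e.g., PDF vs. raw text) should yield (near-)identical trait scores. The trait names, scores, and helper function below are illustrative assumptions for this sketch; they are not the actual APIs or scales of Humantic AI or Crystal.

```python
def audit_format_stability(scores_a, scores_b, tolerance=0.0):
    """Compare per-trait scores for two variants of the same resume.

    Returns a dict mapping each trait whose scores differ by more than
    `tolerance` to its absolute score delta; an empty dict means the
    system was stable on this input pair.
    """
    unstable = {}
    for trait in scores_a.keys() & scores_b.keys():  # traits scored in both runs
        delta = abs(scores_a[trait] - scores_b[trait])
        if delta > tolerance:
            unstable[trait] = delta
    return unstable

# Hypothetical DISC-style scores for one candidate's resume in two formats.
pdf_scores = {"dominance": 62, "influence": 48, "steadiness": 55, "conscientiousness": 71}
txt_scores = {"dominance": 62, "influence": 35, "steadiness": 55, "conscientiousness": 80}

print(audit_format_stability(pdf_scores, txt_scores))
```

In an audit of the kind described, this comparison would be repeated across many resumes and across every job-irrelevant variation of interest (file format, resubmission over time, source profile), with any nonzero delta counting as evidence against the vendor's own stability assumption.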
Related papers
- Test Generation Strategies for Building Failure Models and Explaining
Spurious Failures [4.995172162560306]
Test inputs fail not only when the system under test is faulty but also when the inputs are invalid or unrealistic.
We propose to build failure models for inferring interpretable rules on test inputs that cause spurious failures.
We show that our proposed surrogate-assisted approach generates failure models with an average accuracy of 83%.
arXiv Detail & Related papers (2023-12-09T18:36:15Z)
- Towards Reliable AI: Adequacy Metrics for Ensuring the Quality of
System-level Testing of Autonomous Vehicles [5.634825161148484]
We introduce a set of black-box test adequacy metrics called "Test suite Instance Space Adequacy" (TISA) metrics.
The TISA metrics offer a way to assess both the diversity and coverage of the test suite and the range of bugs detected during testing.
We evaluate the efficacy of the TISA metrics by examining their correlation with the number of bugs detected in system-level simulation testing of AVs.
arXiv Detail & Related papers (2023-11-14T10:16:05Z)
- Functional trustworthiness of AI systems by statistically valid testing [7.717286312400472]
The authors are concerned about the safety, health, and rights of European citizens due to inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act.
We observe that both the current draft of the EU AI Act and the accompanying standardization efforts in CEN/CENELEC have adopted the position that real functional guarantees of AI systems would supposedly be unrealistic and too complex anyway.
arXiv Detail & Related papers (2023-10-04T11:07:52Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards
Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to infer the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Using Sampling to Estimate and Improve Performance of Automated Scoring
Systems with Guarantees [63.62448343531963]
We propose a combination of the existing paradigms, intelligently sampling responses to be scored by humans.
We observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% on average) with a relatively small human budget.
arXiv Detail & Related papers (2021-11-17T05:00:51Z)
- Towards Human-Like Automated Test Generation: Perspectives from
Cognition and Problem Solving [13.541347853480705]
We propose a framework based on cognitive science to identify cognitive processes of testers.
Our goal is to be able to mimic how humans create test cases and thus to design more human-like automated test generation systems.
arXiv Detail & Related papers (2021-03-08T13:43:55Z)
- An Uncertainty-based Human-in-the-loop System for Industrial Tool Wear
Analysis [68.8204255655161]
We show that uncertainty measures based on Monte-Carlo dropout in the context of a human-in-the-loop system increase the system's transparency and performance.
A simulation study demonstrates that the uncertainty-based human-in-the-loop system increases performance for different levels of human involvement.
arXiv Detail & Related papers (2020-07-14T15:47:37Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Jointly Predicting Job Performance, Personality, Cognitive Ability,
Affect, and Well-Being [42.67003631848889]
We create a benchmark for predictive analysis of individuals from a perspective that integrates physical and physiological behavior, psychological states and traits, and job performance.
We design data mining techniques as benchmarks and use real, noisy, and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized, well-validated tests.
arXiv Detail & Related papers (2020-06-10T14:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.