Are Metrics Enough? Guidelines for Communicating and Visualizing
Predictive Models to Subject Matter Experts
- URL: http://arxiv.org/abs/2205.05749v2
- Date: Mon, 27 Mar 2023 21:07:47 GMT
- Title: Are Metrics Enough? Guidelines for Communicating and Visualizing
Predictive Models to Subject Matter Experts
- Authors: Ashley Suh, Gabriel Appleby, Erik W. Anderson, Luca Finelli, Remco
Chang, Dylan Cashman
- Abstract summary: We describe an iterative study conducted with both subject matter experts and data scientists to understand the gaps in communication.
We derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model.
- Score: 7.768301998812552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Presenting a predictive model's performance is a communication bottleneck
that threatens collaborations between data scientists and subject matter
experts. Accuracy and error metrics alone fail to tell the whole story of a
model - its risks, strengths, and limitations - making it difficult for subject
matter experts to feel confident in their decision to use a model. As a result,
models may fail in unexpected ways or go entirely unused, as subject matter
experts disregard poorly presented models in favor of familiar, yet arguably
substandard methods. In this paper, we describe an iterative study conducted
with both subject matter experts and data scientists to understand the gaps in
communication between these two groups. We find that, while the two groups
share common goals of understanding the data and predictions of the model,
friction can stem from unfamiliar terms, metrics, and visualizations - limiting
the transfer of knowledge to subject matter experts (SMEs) and discouraging
clarifying questions from being asked during presentations. Based on our
findings, we derive a set of
communication guidelines that use visualization as a common medium for
communicating the strengths and weaknesses of a model. We provide a
demonstration of our guidelines in a regression modeling scenario and elicit
feedback on their use from subject matter experts. From our demonstration,
subject matter experts were more comfortable discussing a model's performance,
more aware of the trade-offs for the presented model, and better equipped to
assess the model's risks - ultimately informing and contextualizing the model's
use beyond text and numbers.
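As a rough, illustrative sketch of the visualization-first reporting the guidelines argue for (not the authors' actual demonstration or tooling), the snippet below fits a toy linear regression with scikit-learn and presents its performance as a predicted-vs-actual scatter plot plus a residual histogram instead of a lone error metric; the libraries, data, and all names are assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation): show a regression model's
# performance visually rather than as a single accuracy/error number.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for a real modeling scenario.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = 2.5 * X[:, 0] + rng.normal(0, 2.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Predicted vs. actual: lets a non-expert see at a glance where the
# model tracks reality and where it drifts.
ax1.scatter(y_test, y_pred, alpha=0.6)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax1.plot(lims, lims, linestyle="--", color="gray", label="perfect prediction")
ax1.set_xlabel("Actual value")
ax1.set_ylabel("Predicted value")
ax1.legend()

# Residual distribution: communicates typical error size and skew more
# concretely than a single RMSE value.
ax2.hist(y_test - y_pred, bins=30)
ax2.set_xlabel("Residual (actual - predicted)")
ax2.set_ylabel("Count")

fig.suptitle("Model performance shown visually, not just as a metric")
fig.tight_layout()
plt.show()
```

A view like this gives a subject matter expert a concrete sense of where predictions are reliable and how large typical errors are, which is the kind of context a single summary metric cannot convey.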
Related papers
- Context versus Prior Knowledge in Language Models [49.17879668110546]
Language models often need to integrate prior knowledge learned during pretraining and new information presented in context.
We propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity.
arXiv Detail & Related papers (2024-04-06T13:46:53Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Interactive Model Cards: A Human-Centered Approach to Model Documentation [20.880991026743498]
Deep learning models for natural language processing are increasingly adopted and deployed by analysts without formal training in NLP or machine learning.
The documentation intended to convey the model's details and appropriate use is tailored primarily to individuals with ML or NLP expertise.
We conduct a design inquiry into interactive model cards, which augment traditionally static model cards with affordances for exploring model documentation and interacting with the models themselves.
arXiv Detail & Related papers (2022-05-05T19:19:28Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Representations of epistemic uncertainty and awareness in data-driven strategies [0.0]
We present a theoretical model for uncertainty in knowledge representation and its transfer mediated by agents.
We look at inequivalent knowledge representations in terms of inferences, preference relations, and information measures.
We discuss some implications of the proposed model for data-driven strategies.
arXiv Detail & Related papers (2021-10-21T21:18:21Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.