Visual Validation versus Visual Estimation: A Study on the Average Value
in Scatterplots
- URL: http://arxiv.org/abs/2307.09330v3
- Date: Tue, 2 Jan 2024 08:29:04 GMT
- Title: Visual Validation versus Visual Estimation: A Study on the Average Value
in Scatterplots
- Authors: Daniel Braun, Ashley Suh, Remco Chang, Michael Gleicher, Tatiana von
Landesberger
- Abstract summary: We investigate the ability of individuals to visually validate statistical models in terms of their fit to the data.
It is unknown how well people are able to visually validate models, and how their performance compares to visual and computational estimation.
- Score: 11.15435671066952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the ability of individuals to visually validate statistical
models in terms of their fit to the data. While visual model estimation has
been studied extensively, visual model validation remains under-investigated.
It is unknown how well people are able to visually validate models, and how
their performance compares to visual and computational estimation. As a
starting point, we conducted a study across two populations (crowdsourced and
volunteers). Participants had to both visually estimate (i.e., draw) and
visually validate (i.e., accept or reject) the frequently studied model of
averages. Across both populations, the level of accuracy of the models that
were considered valid was lower than the accuracy of the estimated models. We
find that participants' validation and estimation were unbiased. Moreover,
their natural critical point between accepting and rejecting a given mean value
is close to the boundary of its 95% confidence interval, indicating that the
visually perceived confidence interval corresponds to a common statistical
standard. Our work contributes to the understanding of visual model validation
and opens new research opportunities.
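To make the 95% confidence interval criterion concrete, here is a minimal sketch (illustrative Python, not the authors' analysis code) that computes the confidence interval of the mean of a scatterplot's y-values and accepts or rejects a candidate average line depending on whether it lies inside that interval.

```python
import numpy as np
from scipy import stats

def validate_candidate_mean(y, candidate, confidence=0.95):
    """Accept a candidate average if it lies within the confidence
    interval of the sample mean of the scatterplot's y-values."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mean = y.mean()
    sem = y.std(ddof=1) / np.sqrt(n)                 # standard error of the mean
    half_width = stats.t.ppf(0.5 + confidence / 2, df=n - 1) * sem
    lower, upper = mean - half_width, mean + half_width
    return lower <= candidate <= upper, (lower, upper)

# Example: 50 simulated points and a candidate line slightly above the true mean.
rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=50)
accepted, ci = validate_candidate_mean(y, candidate=10.4)
print(f"95% CI of the mean: ({ci[0]:.2f}, {ci[1]:.2f}), accept: {accepted}")
```

Under this reading, a participant whose critical point sits at the interval boundary accepts exactly those candidate averages that a conventional statistical test would not reject.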
Related papers
- Trust Your Gut: Comparing Human and Machine Inference from Noisy Visualizations [7.305342793164905]
We investigate scenarios where human intuition might surpass idealized statistical rationality.
Our findings suggest that analyst gut reactions to visualizations may provide an advantage, even when departing from rationality.
arXiv Detail & Related papers (2024-07-23T22:39:57Z)
- Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots [10.692984164096574]
The level of accuracy for visual estimation of slope is higher than for visual validation of slope.
We found bias toward slopes that are "too steep" in both cases.
In the second experiment, we investigated whether incorporating common designs for regression visualization would improve visual validation.
arXiv Detail & Related papers (2024-07-16T11:41:24Z)
- Evaluating Perceptual Distance Models by Fitting Binomial Distributions to Two-Alternative Forced Choice Data [47.18802526899955]
Crowd-sourced perceptual datasets have emerged, with no images shared between triplets, making ranking infeasible.
We statistically model the underlying decision-making process during 2AFC experiments using a binomial distribution.
We calculate meaningful and well-founded metrics for the distance model, beyond the mere prediction accuracy as percentage agreement.
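As a rough sketch of that binomial modelling (illustrative only: the logistic link from distance differences to choice probabilities is an assumption made here, not necessarily the paper's mapping), a distance model can be scored by the binomial log-likelihood of the observed 2AFC choice counts rather than by percentage agreement:

```python
import numpy as np
from scipy import stats

def binomial_log_likelihood(d_first, d_second, k_first, n_trials, scale=1.0):
    """Score a perceptual distance model on 2AFC data under a binomial model.

    d_first, d_second : model distances from the reference to each alternative
    k_first           : number of observers who chose the first alternative
    n_trials          : number of observers who judged each triplet
    scale             : slope of the assumed logistic link (hypothetical choice)
    """
    gap = np.asarray(d_second, float) - np.asarray(d_first, float)
    p_first = 1.0 / (1.0 + np.exp(-scale * gap))     # assumed psychometric link
    p_first = np.clip(p_first, 1e-9, 1 - 1e-9)
    return stats.binom.logpmf(k_first, n_trials, p_first).sum()

# Example with made-up counts: 3 triplets, 20 observers each.
ll = binomial_log_likelihood(d_first=[0.2, 0.8, 0.5], d_second=[0.6, 0.3, 0.5],
                             k_first=[17, 4, 11], n_trials=20)
print(f"binomial log-likelihood: {ll:.2f}")
```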
arXiv Detail & Related papers (2024-03-15T15:21:04Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
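A minimal sketch of this idea (hypothetical code, not the paper's implementation; predict stands in for any classifier over two-part inputs): replace one part of each input with the corresponding part of another example and count how often the prediction changes.

```python
from typing import Callable, List, Tuple

def attentiveness_score(examples: List[Tuple[str, str]],
                        predict: Callable[[str, str], str]) -> float:
    """Fraction of counterfactual swaps that flip the model's prediction.

    Each example's second part is replaced with the second part of another
    example; an attentive model should usually change its output.
    """
    changed, total = 0, 0
    for i, (part_a, part_b) in enumerate(examples):
        original = predict(part_a, part_b)
        swapped_b = examples[(i + 1) % len(examples)][1]   # counterfactual part
        if swapped_b == part_b:
            continue
        total += 1
        changed += int(predict(part_a, swapped_b) != original)
    return changed / total if total else 0.0
```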
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting [1.8416014644193064]
This study uses Monte Carlo simulations to quantify the interactions between the employed cross-validation method and the discriminative power of features.
The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used.
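For readers unfamiliar with the distinction, the snippet below shows nested cross-validation with scikit-learn (an illustrative setup, not the paper's simulation code): an inner loop tunes hyperparameters while an outer loop provides the performance estimate, so no sample is used to both select and evaluate the same model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Synthetic data standing in for a real speech/language/hearing dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # performance estimation

tuned_svc = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner_cv)
scores = cross_val_score(tuned_svc, X, y, cv=outer_cv)
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```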
arXiv Detail & Related papers (2023-08-22T05:14:42Z)
- Bootstrapping the Cross-Validation Estimate [3.5159221757909656]
Cross-validation is a widely used technique for evaluating the performance of prediction models.
It is essential to accurately quantify the uncertainty associated with the estimate.
This paper proposes a fast bootstrap method that quickly estimates the standard error of the cross-validation estimate.
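The quantity being estimated can be illustrated with a plain (slow) bootstrap that resamples the data, recomputes the cross-validation estimate, and takes the standard deviation across replicates; the paper's contribution is a faster scheme, so the sketch below only conveys the idea, not the proposed method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def bootstrap_cv_standard_error(X, y, n_boot=100, seed=0):
    """Naive bootstrap of a cross-validation estimate: resample rows with
    replacement, rerun CV, and report the spread of the resulting scores."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # bootstrap resample
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[idx], y[idx], cv=5).mean()
        estimates.append(score)
    return np.std(estimates, ddof=1)

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
print(f"bootstrap SE of the CV accuracy: {bootstrap_cv_standard_error(X, y):.4f}")
```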
arXiv Detail & Related papers (2023-07-01T07:50:54Z)
- Effective Robustness against Natural Distribution Shifts for Models with Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
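In its simplest single-baseline form, effective robustness is the model's out-of-distribution accuracy minus the OOD accuracy predicted from its in-distribution accuracy by a fit over baseline models; the sketch below (with made-up numbers) shows that basic notion, whereas the paper's metric extends it to models trained on different data.

```python
import numpy as np

def effective_robustness(id_acc_baselines, ood_acc_baselines,
                         id_acc_model, ood_acc_model):
    """OOD accuracy of a model minus the OOD accuracy predicted from its ID
    accuracy by a linear fit over a set of baseline models."""
    slope, intercept = np.polyfit(id_acc_baselines, ood_acc_baselines, deg=1)
    return ood_acc_model - (slope * id_acc_model + intercept)

# Hypothetical baseline models establishing the ID-vs-OOD trend.
baseline_id = np.array([0.70, 0.75, 0.80, 0.85])
baseline_ood = np.array([0.50, 0.56, 0.62, 0.68])
print(f"effective robustness: "
      f"{effective_robustness(baseline_id, baseline_ood, 0.80, 0.70):+.3f}")
```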
arXiv Detail & Related papers (2023-02-02T19:28:41Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models [86.79402670904338]
We evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions.
We have observed that image distortions have a relationship with the performance gap of the model across different subgroups.
arXiv Detail & Related papers (2021-08-14T16:49:05Z)
- Plinko: A Theory-Free Behavioral Measure of Priors for Statistical Learning and Mental Model Updating [62.997667081978825]
We present three experiments using "Plinko", a behavioral task in which participants estimate distributions of ball drops over all available outcomes.
We show that participant priors cluster around prototypical probability distributions and that prior cluster membership may indicate learning ability.
We verify that individual participant priors are reliable representations and that learning is not impeded when faced with a physically implausible ball drop distribution.
arXiv Detail & Related papers (2021-07-23T22:27:30Z)