Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots
- URL: http://arxiv.org/abs/2407.11625v2
- Date: Fri, 6 Sep 2024 07:56:21 GMT
- Title: Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots
- Authors: Daniel Braun, Remco Chang, Michael Gleicher, Tatiana von Landesberger
- Abstract summary: The level of accuracy for visual estimation of slope is higher than for visual validation of slope.
We found bias toward slopes that are "too steep" in both cases.
In the second experiment, we investigated whether incorporating common designs for regression visualization would improve visual validation.
- Score: 10.692984164096574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This led to the novel insight that participants naturally assessed regression using orthogonal distances between the points and the line (i.e., ODR regression) rather than the common vertical distances (OLS regression). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, the results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.
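To make the OLS/ODR distinction concrete, here is a minimal sketch (not code from the paper; the synthetic data, seed, and noise level are assumptions chosen for illustration) contrasting an OLS fit, which minimizes vertical distances, with an ODR/total-least-squares fit, which minimizes orthogonal distances. Since the ODR slope is never shallower in magnitude than the OLS slope, a viewer who implicitly judges orthogonal distances will tend to accept lines that are "too steep" relative to the OLS fit.

```python
import numpy as np

# Hypothetical illustration: OLS slope (vertical residuals) vs. ODR/TLS slope
# (orthogonal residuals) on the same synthetic scatterplot.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)  # true slope 0.5, noise in y only

xc, yc = x - x.mean(), y - y.mean()

# OLS: minimize vertical residuals -> slope = cov(x, y) / var(x)
slope_ols = np.dot(xc, yc) / np.dot(xc, xc)

# ODR / total least squares: minimize orthogonal residuals -> direction of the
# first principal component of the centered (x, y) point cloud.
_, _, vt = np.linalg.svd(np.column_stack([xc, yc]), full_matrices=False)
slope_odr = vt[0, 1] / vt[0, 0]

# With noise only in y, the ODR slope comes out steeper than the OLS slope,
# matching the direction of the "too steep" bias described in the abstract.
print(f"OLS slope: {slope_ols:.3f}   ODR slope: {slope_odr:.3f}")
```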
Related papers
- Automated Assessment of Residual Plots with Computer Vision Models [5.835976576278297]
Plotting residuals is a recommended procedure to diagnose deviations from linear model assumptions.
The presence of structure in residual plots can be tested using the lineup protocol to do visual inference.
This work presents a solution by providing a computer vision model to automate the assessment of residual plots.
arXiv Detail & Related papers (2024-11-01T19:51:44Z) - Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent [8.0225129190882]
We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD).
We prove that GCV is generically inconsistent as an estimator of the prediction risk of early-stopped GD, even for a well-specified linear model with isotropic features.
Our theory requires only mild assumptions on the data distribution and does not require the underlying regression function to be linear. (A generic sketch of GCV and LOOCV for a linear smoother appears after this list.)
arXiv Detail & Related papers (2024-02-26T18:07:27Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning [68.76846801719095]
We show exactly when and where double descent occurs, and that its location is not inherently tied to the interpolation threshold p=n.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Graph Out-of-Distribution Generalization with Controllable Data Augmentation [51.17476258673232]
Graph Neural Networks (GNNs) have demonstrated extraordinary performance in classifying graph properties.
Due to selection bias in training and testing data, distribution deviation is widespread.
We propose OOD calibration to measure the distribution deviation of virtual samples.
arXiv Detail & Related papers (2023-08-16T13:10:27Z) - Visual Validation versus Visual Estimation: A Study on the Average Value in Scatterplots [11.15435671066952]
We investigate the ability of individuals to visually validate statistical models in terms of their fit to the data.
It is unknown how well people are able to visually validate models, and how their performance compares to visual and computational estimation.
arXiv Detail & Related papers (2023-07-18T15:13:15Z) - Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to extract genuinely based on contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z) - Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset.
We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method.
We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
arXiv Detail & Related papers (2022-06-07T20:47:07Z) - Recovering the Unbiased Scene Graphs from the Biased Ones [99.24441932582195]
We show that due to the missing labels, scene graph generation (SGG) can be viewed as a "Learning from Positive and Unlabeled data" (PU learning) problem.
We propose Dynamic Label Frequency Estimation (DLFE) to take advantage of training-time data augmentation and average over multiple training iterations to introduce more valid examples.
Extensive experiments show that DLFE is more effective in estimating label frequencies than a naive variant of the traditional estimate, and DLFE significantly alleviates the long tail.
arXiv Detail & Related papers (2021-07-05T16:10:41Z) - Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z) - Identifying Statistical Bias in Dataset Replication [102.92137353938388]
We study a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy.
After correcting for the identified statistical bias, only an estimated $3.6\% \pm 1.5\%$ of the original $11.7\% \pm 1.0\%$ accuracy drop remains unaccounted for.
arXiv Detail & Related papers (2020-05-19T17:48:32Z)
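As a generic reference for the cross-validation entry above, here is a minimal sketch of GCV and exact LOOCV for a linear smoother. It is an assumption-laden illustration, not the early-stopped GD analysis of that paper: ridge regression is used only because its hat matrix is explicit, and the data, ridge penalty, and noise scale are made up for the example.

```python
import numpy as np

# Hypothetical setup: synthetic linear-regression data and a ridge smoother.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=0.5, size=n)

lam = 1.0  # ridge penalty (assumed value for illustration)
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix, y_hat = S y
resid = y - S @ y

# Generalized cross-validation: mean squared residual inflated by the
# effective degrees of freedom tr(S)/n.
gcv = np.mean(resid**2) / (1 - np.trace(S) / n) ** 2

# Exact LOOCV via the leverage shortcut that holds for linear smoothers.
loocv = np.mean((resid / (1 - np.diag(S))) ** 2)

print(f"GCV estimate of risk:   {gcv:.4f}")
print(f"LOOCV estimate of risk: {loocv:.4f}")
```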