Why we should respect analysis results as data
- URL: http://arxiv.org/abs/2204.09959v1
- Date: Thu, 21 Apr 2022 08:34:07 GMT
- Title: Why we should respect analysis results as data
- Authors: Joana M Barros, Lukas A Widmer, Mark Baillie, Simon Wandel
- Abstract summary: It is commonly overlooked that analyzing clinical study data also produces data in the form of results.
Although integrating and putting findings into context is a cornerstone of scientific work, analysis results are often neglected as a data source.
We propose a solution to "calculate once, use many times" by combining analysis results standards with a common data model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The development and approval of new treatments generates large volumes of
results, such as summaries of efficacy and safety. However, it is commonly
overlooked that analyzing clinical study data also produces data in the form of
results. For example, descriptive statistics and model predictions are data.
Although integrating and putting findings into context is a cornerstone of
scientific work, analysis results are often neglected as a data source. Results
end up stored as "data products" such as PDF documents that are not machine
readable or amenable to future analysis. We propose a solution to "calculate
once, use many times" by combining analysis results standards with a common
data model. This analysis results data model re-frames the target of analyses
from static representations of the results (e.g., tables and figures) to a data
model with applications in various contexts, including knowledge discovery.
Further, we provide a working proof of concept detailing how to approach
analyses standardization and construct a schema to store and query analysis
results.
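The idea of storing analysis results in a queryable schema can be illustrated with a minimal sketch. It assumes a hypothetical long-format table with one row per computed statistic, held in an in-memory SQLite database. The table name, column names, study identifiers, and values below are all made up for illustration; this is not the schema or data from the paper's proof of concept.

```python
import sqlite3

# Hypothetical long-format schema (an assumption, not the authors' model):
# one row per computed statistic, so results stay machine readable.
SCHEMA = """
CREATE TABLE analysis_results (
    study_id  TEXT NOT NULL,
    analysis  TEXT NOT NULL,
    endpoint  TEXT NOT NULL,
    arm       TEXT NOT NULL,
    statistic TEXT NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (study_id, analysis, endpoint, arm, statistic)
)
"""


def build_store(rows):
    """Create an in-memory results store and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO analysis_results VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def query_statistic(conn, endpoint, statistic):
    """Return (study_id, arm, value) for one statistic across all studies."""
    cur = conn.execute(
        "SELECT study_id, arm, value FROM analysis_results "
        "WHERE endpoint = ? AND statistic = ? "
        "ORDER BY study_id, arm",
        (endpoint, statistic),
    )
    return cur.fetchall()


# Made-up illustrative values, not real study results.
EXAMPLE_ROWS = [
    ("STUDY-A", "descriptive", "systolic_bp", "placebo", "mean", 140.0),
    ("STUDY-A", "descriptive", "systolic_bp", "treatment", "mean", 132.5),
    ("STUDY-A", "descriptive", "systolic_bp", "treatment", "sd", 11.0),
    ("STUDY-B", "descriptive", "systolic_bp", "treatment", "mean", 130.0),
]

conn = build_store(EXAMPLE_ROWS)
means = query_statistic(conn, "systolic_bp", "mean")
```

Because each statistic is a row rather than a cell in a rendered table, the same stored results can be reused across studies ("calculate once, use many times") without recomputing them from the underlying data.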
Related papers
- Bayesian Federated Inference for Survival Models [0.0]
In cancer research, overall survival and progression free survival are often analyzed with the Cox model.
Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistic difficulties.
Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed.
(2024-04-26T15:05:26Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
(2024-03-03T04:21:21Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
(2023-12-21T08:50:41Z)
- Using causal inference to avoid fallouts in data-driven parametric analysis: a case study in the architecture, engineering, and construction industry [0.7566148383213173]
The decision-making process in real-world implementations has been affected by a growing reliance on data-driven models.
We investigated the synergetic pattern between the data-driven methods, empirical domain knowledge, and first-principles simulations.
(2023-09-11T13:54:58Z)
- Utility Assessment of Synthetic Data Generation Methods [0.0]
We investigate whether different methods of generating fully synthetic data vary in their utility a priori.
We find some methods to perform better than others across the board.
We do get promising findings for classification tasks when using synthetic data for training machine learning models.
(2022-11-23T11:09:52Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
(2022-03-29T04:54:06Z)
- Similarities and Differences between Machine Learning and Traditional Advanced Statistical Modeling in Healthcare Analytics [0.6999740786886537]
Machine learning and statistical modeling are complementary, based on similar mathematical principles.
Good analysts and data scientists should be well versed in both techniques and their proper application.
(2022-01-07T14:36:46Z)
- Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
(2021-04-20T17:16:41Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
(2021-04-17T21:34:10Z)
- Categorical exploratory data analysis on goodness-of-fit issues [0.6091702876917279]
We propose to utilize the data analysis paradigm called Categorical Exploratory Data Analysis (CEDA).
CEDA brings out where and how the data fit or deviate from the model shape via several important distributional aspects.
We provide graphic displays to illustrate the advantages of using CEDA as one primary way of data analysis in Data Science education.
(2020-11-19T06:11:06Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
(2020-06-02T16:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.