rTisane: Externalizing conceptual models for data analysis increases
engagement with domain knowledge and improves statistical model quality
- URL: http://arxiv.org/abs/2310.16262v1
- Date: Wed, 25 Oct 2023 00:32:52 GMT
- Title: rTisane: Externalizing conceptual models for data analysis increases
engagement with domain knowledge and improves statistical model quality
- Authors: Eunice Jun, Edward Misback, Jeffrey Heer, René Just
- Abstract summary: Statistical models should accurately reflect analysts' domain knowledge about variables and their relationships.
Recent tools let analysts express these assumptions and use them to produce a corresponding statistical model.
It remains unclear what analysts want to express and how externalization impacts statistical model quality.
- Score: 11.156807472212165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Statistical models should accurately reflect analysts' domain knowledge about
variables and their relationships. While recent tools let analysts express
these assumptions and use them to produce a corresponding statistical model, it
remains unclear what analysts want to express and how externalization impacts
statistical model quality. This paper addresses these gaps. We first conduct an
exploratory study of analysts using a domain-specific language (DSL) to express
conceptual models. We observe a preference for detailing how variables relate
and a desire to allow, and then later resolve, ambiguity in their conceptual
models. We leverage these findings to develop rTisane, a DSL for expressing
conceptual models augmented with an interactive disambiguation process. In a
controlled evaluation, we find that rTisane's DSL helps analysts engage more
deeply with and accurately externalize their assumptions. rTisane also leads to
statistical models that match analysts' assumptions, maintain analysis intent,
and better fit the data.
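To make the workflow concrete, here is a minimal, hypothetical sketch of externalizing a conceptual model and deriving a statistical model formula from it. rTisane itself is an R DSL; this Python analogue and every name in it (ConceptualModel, assume_causes, assume_associates, disambiguate) are illustrative assumptions, not the tool's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptualModel:
    dv: str
    causes: list = field(default_factory=list)     # directed (cause, effect) pairs
    ambiguous: list = field(default_factory=list)  # undirected "associates with" pairs

    def assume_causes(self, cause, effect):
        self.causes.append((cause, effect))

    def assume_associates(self, a, b):
        # Ambiguity is allowed while authoring and resolved later.
        self.ambiguous.append((a, b))

    def disambiguate(self, resolver):
        # resolver turns each undirected pair into a directed (cause, effect) pair,
        # standing in for rTisane's interactive disambiguation step.
        self.causes += [resolver(pair) for pair in self.ambiguous]
        self.ambiguous = []

    def to_formula(self):
        # Derive a statistical model: regress the DV on its asserted causes.
        ivs = sorted({c for c, e in self.causes if e == self.dv})
        return f"{self.dv} ~ {' + '.join(ivs)}"

cm = ConceptualModel(dv="pounds_lost")
cm.assume_causes("motivation", "pounds_lost")
cm.assume_associates("age", "pounds_lost")
cm.disambiguate(lambda pair: (pair[0], pair[1]))  # resolve: age causes pounds_lost
print(cm.to_formula())  # pounds_lost ~ age + motivation
```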
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - XForecast: Evaluating Natural Language Explanations for Time Series Forecasting [72.57427992446698]
Time series forecasting aids decision-making, especially for stakeholders who rely on accurate predictions.
Traditional explainable AI (XAI) methods, which highlight feature or temporal importance, often require expert knowledge.
Evaluating forecast natural language explanations (NLEs) is difficult due to the complex causal relationships in time series data.
arXiv Detail & Related papers (2024-10-18T05:16:39Z) - Reliability and Interpretability in Science and Deep Learning [0.0]
This article focuses on the comparison between traditional scientific models and Deep Neural Network (DNN) models.
It argues that the high complexity of DNN models hinders the estimation of their reliability and their prospects for long-term progress.
It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone.
arXiv Detail & Related papers (2024-01-14T20:14:07Z) - Incorporating Domain Knowledge in Deep Neural Networks for Discrete
Choice Models [0.5801044612920815]
This paper proposes a framework that expands the potential of data-driven approaches for discrete choice models (DCMs).
It adds pseudo data samples that encode required relationships and a loss term that measures how well the model satisfies them.
A case study demonstrates the potential of this framework for discrete choice analysis.
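As a hedged illustration of that idea (a minimal sketch of my own, not the paper's implementation): draw pseudo samples across the input domain and add a loss term that is zero when an assumed relationship holds and positive when the network violates it. The assumed domain knowledge here, that utility does not increase with price, is an invented example.

```python
import torch

def monotonicity_penalty(model, pseudo_x, price_idx, delta=0.01):
    """Penalize cases where raising the price raises the predicted utility."""
    bumped = pseudo_x.clone()
    bumped[:, price_idx] += delta
    increase = model(bumped) - model(pseudo_x)
    return torch.relu(increase).mean()  # zero when the relationship holds

# Training step: data loss plus a weighted domain-knowledge loss.
model = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))
x, y = torch.randn(32, 4), torch.randn(32, 1)
pseudo_x = torch.rand(64, 4)  # pseudo samples spanning the input domain
loss = torch.nn.functional.mse_loss(model(x), y) \
       + 1.0 * monotonicity_penalty(model, pseudo_x, price_idx=0)
loss.backward()
```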
arXiv Detail & Related papers (2023-05-30T12:53:55Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse on both automated and human evaluation than an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - A Prescriptive Learning Analytics Framework: Beyond Predictive Modelling
and onto Explainable AI with Prescriptive Analytics and ChatGPT [0.0]
This study proposes a novel framework that unifies transparent machine learning with techniques for prescriptive analytics.
The work demonstrates the framework in practice, using predictive models to identify learners at risk of programme non-completion.
arXiv Detail & Related papers (2022-08-31T00:57:17Z) - A Visual Analytics System for Improving Attention-based Traffic
Forecasting Models [25.975369237248316]
We develop a visual analytics system, AttnAnalyzer, that enables users to explore how deep learning models make predictions.
The system incorporates dynamic time warping (DTW) and Granger causality tests for spatio-temporal dependency analysis.
We present three case studies of how AttnAnalyzer can effectively explore model behaviors and improve model performance.
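For a concrete sense of those two analyses, here is a small standalone sketch, assuming a plain NumPy DTW implementation and the statsmodels Granger test rather than AttnAnalyzer's own code: it computes a DTW distance between two series and tests whether one Granger-causes the other.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

def dtw_distance(x, y):
    """Classic O(n*m) dynamic time warping distance between two 1D series."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
a = np.sin(np.linspace(0, 6, 200)) + 0.1 * rng.standard_normal(200)
b = np.roll(a, 5)  # b is a delayed copy of a

print("DTW distance:", dtw_distance(a, b))

# statsmodels convention: tests whether the second column Granger-causes the first.
grangercausalitytests(np.column_stack([b, a]), maxlag=6)
```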
arXiv Detail & Related papers (2022-08-08T18:15:40Z) - Measuring Causal Effects of Data Statistics on Language Model's
'Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Using Shape Metrics to Describe 2D Data Points [0.0]
We propose to use shape metrics to describe 2D data to help make analyses more explainable and interpretable.
This is particularly important in medical applications, where the 'right to explainability' is crucial.
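The kind of descriptor involved might look like the following sketch; the specific metrics (hull area, perimeter, circularity, aspect ratio) are my own illustrative choices, not necessarily the paper's.

```python
import numpy as np
from scipy.spatial import ConvexHull

def shape_metrics(points):
    hull = ConvexHull(points)  # in 2D: .volume is area, .area is perimeter
    area, perimeter = hull.volume, hull.area
    # Circularity: 1.0 for a perfect disc, smaller for elongated/irregular shapes.
    circularity = 4 * np.pi * area / perimeter ** 2
    # Aspect ratio from the principal axes of the point cloud.
    eigvals = np.linalg.eigvalsh(np.cov(points.T))
    aspect = np.sqrt(eigvals[-1] / eigvals[0])
    return {"area": area, "perimeter": perimeter,
            "circularity": circularity, "aspect_ratio": aspect}

rng = np.random.default_rng(1)
blob = rng.standard_normal((300, 2)) * [3.0, 1.0]  # elongated cluster
print(shape_metrics(blob))
```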
arXiv Detail & Related papers (2022-01-27T23:28:42Z) - A comprehensive comparative evaluation and analysis of Distributional
Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z)