A Visual Analytics Approach to Building Logistic Regression Models and
its Application to Health Records
- URL: http://arxiv.org/abs/2201.08429v1
- Date: Thu, 20 Jan 2022 19:53:41 GMT
- Title: A Visual Analytics Approach to Building Logistic Regression Models and
its Application to Health Records
- Authors: Erasmo Artur and Rosane Minghim
- Abstract summary: We present an open unified approach for generating, evaluating, and applying regression models in high-dimensional data sets.
The approach is based on exposing a broad correlation panorama for attributes, by which the user can select relevant attributes to build and evaluate prediction models.
We demonstrate effectiveness and efficiency of UCReg through the application of our framework to the analysis of Covid-19 and other synthetic and real health records data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multidimensional data analysis has become increasingly important in many
fields, mainly due to current vast data availability and the increasing demand
to extract knowledge from it. In most applications, the role of the final user
is crucial to build proper machine learning models and to explain the patterns
found in data. In this paper, we present an open unified approach for
generating, evaluating, and applying regression models in high-dimensional data
sets within a user-guided process. The approach is based on exposing a broad
correlation panorama for attributes, by which the user can select relevant
attributes to build and evaluate prediction models for one or more contexts. We
name the approach UCReg (User-Centered Regression). We demonstrate
effectiveness and efficiency of UCReg through the application of our framework
to the analysis of Covid-19 and other synthetic and real health records data.
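The workflow described in the abstract (inspect attribute correlations, pick relevant attributes, then build and evaluate a prediction model) can be illustrated with a minimal sketch. This is not the UCReg implementation. The function names, the synthetic data, the top-2 selection rule standing in for the user's interactive choice, and the plain gradient-descent fit are all assumptions made for illustration.

```python
import numpy as np

def correlation_with_target(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have undefined correlation; report them as 0.
    denom = np.where(denom == 0, np.inf, denom)
    return np.abs(Xc.T @ yc / denom)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Logistic regression fitted by batch gradient descent (illustrative)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of the log-loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Class labels obtained by thresholding the probability at 0.5."""
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

# Hypothetical synthetic records: only attributes 0 and 3 drive the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 3]
y = (logits + rng.normal(scale=0.5, size=500) > 0).astype(float)

# Step 1: correlation overview of every attribute against the target.
corr = correlation_with_target(X, y)

# Step 2: stand-in for the user's selection: keep the two strongest attributes.
selected = np.sort(np.argsort(corr)[-2:])

# Step 3: build and evaluate a model on the selected attributes only.
w, b = fit_logistic(X[:, selected], y)
accuracy = (predict(X[:, selected], w, b) == y).mean()
```

In an interactive setting the `selected` indices would come from the analyst's choice in the correlation view rather than from a fixed top-k rule, and evaluation would use held-out data rather than the training set.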
Related papers
- A Statistical Framework for Data-dependent Retrieval-Augmented Models [46.781026675083254]
Modern ML systems increasingly augment input instances with additional relevant information to enhance final prediction.
We study such models with two components: 1) a retriever to identify the relevant information out of a large corpus via a data-dependent metric; and 2) a predictor that consumes the input instances along with the retrieved information to make the final predictions.
arXiv Detail & Related papers (2024-08-27T20:51:06Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive.
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Dataset Regeneration for Sequential Recommendation [69.93516846106701]
We propose a data-centric paradigm for developing an ideal training dataset using a model-agnostic dataset regeneration framework called DR4SR.
To demonstrate the effectiveness of the data-centric paradigm, we integrate our framework with various model-centric methods and observe significant performance improvements across four widely adopted datasets.
arXiv Detail & Related papers (2024-05-28T03:45:34Z)
- IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight [4.010646933005848]
IGANN Sparse is a novel machine learning model from the family of generalized additive models.
It promotes sparsity through a non-linear feature selection process during training.
This ensures interpretability through improved model sparsity without sacrificing predictive performance.
arXiv Detail & Related papers (2024-03-17T22:44:36Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judged our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding [29.07617945233152]
Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance.
This approach faces significant challenges, including the laborious and costly requirement for additional metadata.
We introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding.
Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design.
arXiv Detail & Related papers (2024-01-12T09:17:32Z)
- TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
- Multidimensional Item Response Theory in the Style of Collaborative Filtering [0.8057006406834467]
This paper presents a machine learning approach to multidimensional item response theory (MIRT).
Inspired by collaborative filtering, we define a general class of models that includes many MIRT models.
We discuss the use of penalized joint maximum likelihood (JML) to estimate individual models and cross-validation to select the best performing model.
arXiv Detail & Related papers (2023-01-03T00:56:27Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Topology-based Clusterwise Regression for User Segmentation and Demand Forecasting [63.78344280962136]
Using a public and a novel proprietary data set of commercial data, this research shows that the proposed system enables analysts to both cluster their user base and plan demand at a granular level.
This work seeks to introduce TDA-based clustering of time series and clusterwise regression with matrix factorization methods as viable tools for the practitioner.
arXiv Detail & Related papers (2020-09-08T12:10:10Z)
- Predicting Multidimensional Data via Tensor Learning [0.0]
We develop a model that retains the intrinsic multidimensional structure of the dataset.
To estimate the model parameters, an Alternating Least Squares algorithm is developed.
The proposed model is able to outperform benchmark models present in the forecasting literature.
arXiv Detail & Related papers (2020-02-11T11:57:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.