LCS-DIVE: An Automated Rule-based Machine Learning Visualization
Pipeline for Characterizing Complex Associations in Classification
- URL: http://arxiv.org/abs/2104.12844v1
- Date: Mon, 26 Apr 2021 19:47:03 GMT
- Title: LCS-DIVE: An Automated Rule-based Machine Learning Visualization
Pipeline for Characterizing Complex Associations in Classification
- Authors: Robert Zhang, Rachael Stolzenberg-Solomon, Shannon M. Lynch, Ryan J.
Urbanowicz
- Abstract summary: This work introduces the LCS Discovery and Visualization Environment (LCS-DIVE), an automated LCS model interpretation pipeline for complex biomedical classification.
LCS-DIVE conducts modeling using a new scikit-learn implementation of ExSTraCS, an LCS designed to overcome the noise and scalability challenges of biomedical data mining.
It leverages feature-tracking scores and/or rules to automatically guide characterization of (1) feature importance, (2) underlying additive, epistatic, and/or heterogeneous patterns of association, and (3) model-driven heterogeneous instance subgroups via clustering, visualization generation, and cluster interrogation.
- Score: 0.7226144684379191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) research has yielded powerful tools for training
accurate prediction models despite complex multivariate associations (e.g.
interactions and heterogeneity). In fields such as medicine, improved
interpretability of ML modeling is required for knowledge discovery,
accountability, and fairness. Rule-based ML approaches such as Learning
Classifier Systems (LCSs) strike a balance between predictive performance and
interpretability in complex, noisy domains. This work introduces the LCS
Discovery and Visualization Environment (LCS-DIVE), an automated LCS model
interpretation pipeline for complex biomedical classification. LCS-DIVE
conducts modeling using a new scikit-learn implementation of ExSTraCS, an LCS
designed to overcome the noise and scalability challenges of biomedical data
mining, yielding human-readable IF:THEN rules as well as feature-tracking
scores for each
training sample. LCS-DIVE leverages feature-tracking scores and/or rules to
automatically guide characterization of (1) feature importance, (2) underlying
additive, epistatic, and/or heterogeneous patterns of association, and (3)
model-driven heterogeneous instance subgroups via clustering, visualization
generation, and cluster interrogation. LCS-DIVE was evaluated over a diverse
set of simulated genetic and benchmark datasets encoding a variety of complex
multivariate associations, demonstrating its ability to differentiate between
them, and was then applied to characterize associations within a real-world
study of pancreatic cancer.
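As a concrete illustration of the modeling step described above, the sketch below trains a scikit-learn-style ExSTraCS classifier on a toy dataset with a simple epistatic pattern. The package and import names (skExSTraCS, ExSTraCS) and the hyperparameter names (learning_iterations, N) are assumptions patterned on a standard scikit-learn estimator interface, not a verified API, and the downstream LCS-DIVE clustering and visualization steps are not shown.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from skExSTraCS import ExSTraCS  # assumed import path for the scikit-learn ExSTraCS

    # Toy binary-classification data standing in for a genotype/feature matrix.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 20)).astype(float)  # 20 genotype-like features
    y = (X[:, 0] * X[:, 1] > 1).astype(int)               # a simple epistatic pattern

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Hyperparameter names are illustrative (training iterations, rule population size).
    model = ExSTraCS(learning_iterations=5000, N=500)
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))
    # LCS-DIVE would then consume the evolved IF:THEN rules and the per-instance
    # feature-tracking scores to characterize feature importance, patterns of
    # association, and heterogeneous instance subgroups.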
Related papers
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- An Ensemble Approach to Question Classification: Integrating Electra Transformer, GloVe, and LSTM [0.0]
This study presents an innovative ensemble approach for question classification, combining the strengths of Electra, GloVe, and LSTM models.
Rigorously tested on the well-regarded TREC dataset, the model demonstrates how the integration of these disparate technologies can lead to superior results.
arXiv Detail & Related papers (2023-08-13T18:14:10Z)
- Extension of Transformational Machine Learning: Classification Problems [0.0]
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery.
TML, a meta-learning algorithm, excels in exploiting common attributes across various domains.
The drug discovery process, which is complex and time-consuming, can benefit greatly from the enhanced prediction accuracy.
arXiv Detail & Related papers (2023-08-07T07:34:18Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
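As a toy illustration of the idea behind differentiable epidemic simulation (not GradABM itself, which is agent-based), the sketch below runs a smooth SIR-style update in PyTorch and backpropagates a data-fitting loss to the transmission and recovery rates; every name and number is an illustrative placeholder.

    import torch

    beta = torch.tensor(0.3, requires_grad=True)   # transmission rate (to be fit)
    gamma = torch.tensor(0.1, requires_grad=True)  # recovery rate (to be fit)

    def simulate(beta, gamma, days=30, n=1_000_000.0, i0=100.0):
        """Differentiable SIR-style rollout returning daily new infections."""
        s, i = n - i0, torch.tensor(i0)
        new_cases = []
        for _ in range(days):
            inf = beta * s * i / n
            rec = gamma * i
            s, i = s - inf, i + inf - rec
            new_cases.append(inf)
        return torch.stack(new_cases)

    observed = simulate(torch.tensor(0.25), torch.tensor(0.1)).detach()  # synthetic "data"

    opt = torch.optim.Adam([beta, gamma], lr=0.01)
    for _ in range(200):
        opt.zero_grad()
        loss = torch.mean((simulate(beta, gamma) - observed) ** 2)
        loss.backward()
        opt.step()
    print(float(beta), float(gamma))  # moves toward the data-generating rates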
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others under certain adversarial attacks on the input sequences.
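A minimal, purely illustrative way to inject sequencing-style noise into a nucleotide sequence is sketched below (random substitutions and short indels); the rates and their split are arbitrary placeholders, not the paper's calibrated Illumina or PacBio error profiles.

    import random

    def perturb_sequence(seq, sub_rate=0.005, indel_rate=0.001, seed=0):
        """Return a copy of seq with random substitutions, insertions, and deletions."""
        rng = random.Random(seed)
        bases = "ACGT"
        out = []
        for base in seq:
            r = rng.random()
            if r < indel_rate / 2:
                continue                                  # deletion: drop this base
            if r < indel_rate:
                out.append(rng.choice(bases))             # insertion before this base
            if rng.random() < sub_rate:
                base = rng.choice([b for b in bases if b != base])  # substitution
            out.append(base)
        return "".join(out)

    print(perturb_sequence("ATGCGTACGTTAGC" * 10))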
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
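A single conditional affine coupling layer, the building block mentioned above, can be sketched in PyTorch as below; the layer sizes and the way the class-semantic condition is concatenated are assumptions for illustration, not GSMFlow's exact architecture.

    import torch
    import torch.nn as nn

    class ConditionalAffineCoupling(nn.Module):
        """Half the features are rescaled and shifted using a scale/shift
        predicted from the other half concatenated with a condition vector."""
        def __init__(self, dim, cond_dim, hidden=128):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.half)),
            )

        def forward(self, x, cond):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
            s = torch.tanh(s)                    # keep scales bounded for stability
            y2 = x2 * torch.exp(s) + t
            return torch.cat([x1, y2], dim=1), s.sum(dim=1)  # output, log|det J|

        def inverse(self, y, cond):
            y1, y2 = y[:, :self.half], y[:, self.half:]
            s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
            x2 = (y2 - t) * torch.exp(-torch.tanh(s))
            return torch.cat([y1, x2], dim=1)

    layer = ConditionalAffineCoupling(dim=64, cond_dim=16)
    x, cond = torch.randn(8, 64), torch.randn(8, 16)
    z, log_det = layer(x, cond)
    print(torch.allclose(x, layer.inverse(z, cond), atol=1e-5))  # invertibility check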
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
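The CL procedure described above can be sketched directly with scikit-learn: cluster the representations with K-means, then measure how well a KNN classifier trained on one split predicts those cluster labels on a held-out split. The number of clusters, neighbors, and the split below are illustrative choices, not the paper's exact protocol.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    def cluster_learnability(reps, n_clusters=10, n_neighbors=5, seed=0):
        """Cluster Learnability sketch: KNN accuracy at predicting K-means labels."""
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reps)
        X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=seed)
        knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
        return knn.score(X_te, y_te)

    reps = np.random.default_rng(0).normal(size=(1000, 64))  # stand-in for learned representations
    print("CL score:", cluster_learnability(reps))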
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Development of Interpretable Machine Learning Models to Detect Arrhythmia based on ECG Data [0.0]
This thesis builds Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifiers based on state-of-the-art models.
Both global and local interpretability methods are exploited to understand the interaction between dependent and independent variables.
It was found that Grad-CAM was the most effective interpretability technique for explaining the predictions of the proposed CNN and LSTM models.
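The Grad-CAM idea can be illustrated on a toy 1D CNN for ECG-like signals: feature-map channels in the last convolutional block are weighted by the gradient of the predicted class score and combined into a saliency curve over time. The model, input length, and layer choices below are placeholders, not the thesis's architectures.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyECGNet(nn.Module):
        """Placeholder 1D CNN: conv features -> global average pooling -> linear head."""
        def __init__(self, n_classes=5):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
                nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(),
            )
            self.head = nn.Linear(16, n_classes)

        def forward(self, x):
            a = self.features(x)                  # (batch, channels, time)
            return self.head(a.mean(dim=-1)), a   # logits and feature maps

    model = TinyECGNet().eval()
    signal = torch.randn(1, 1, 187)               # one ECG-like segment

    logits, activations = model(signal)
    activations.retain_grad()                      # keep gradients on the feature maps
    logits[0, logits[0].argmax()].backward()       # backprop the top class score

    weights = activations.grad.mean(dim=-1, keepdim=True)         # per-channel importance
    cam = F.relu((weights * activations).sum(dim=1)).squeeze(0)   # saliency over time
    cam = cam / (cam.max() + 1e-8)
    print(cam.shape)                               # one importance value per time step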
arXiv Detail & Related papers (2022-05-05T17:29:33Z)
- BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale [0.0]
We introduce a machine-learning framework for benchmarking representations of chemical systems against datasets of materials and molecules.
The guiding principle is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes.
The resulting models are intended as baselines that can inform future method development.
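In the spirit of that guiding principle (illustration only, not BenchML's actual API), a descriptor can be scored by how well a simple regularized linear model predicts a target property from it; the descriptor matrix, property values, and hyperparameter grid below are synthetic placeholders.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 128))                          # placeholder descriptor matrix
    y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)    # placeholder property values

    # Limit model complexity to a simple regression scheme so the score reflects
    # the descriptor itself rather than the learner.
    baseline = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 7)}, cv=5)
    scores = cross_val_score(baseline, X, y, cv=5, scoring="r2")
    print("descriptor baseline R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))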
arXiv Detail & Related papers (2021-12-04T09:07:16Z)
- A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification.
This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms.
We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
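The pipeline stages listed above can be sketched with scikit-learn as below; the imputer, scaler, feature-selection step, and single classifier are illustrative stand-ins, not the paper's nine-algorithm comparison or its bias assessments.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 50))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary outcome
    X[rng.random(X.shape) < 0.05] = np.nan       # sprinkle missing values

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),         # data cleaning/transformation
        ("scale", StandardScaler()),
        ("select", SelectKBest(mutual_info_classif, k=10)),   # feature selection
        ("model", LogisticRegression(max_iter=1000)),         # one of many candidate learners
    ])
    print("cross-validated accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())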
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)