LCS-DIVE: An Automated Rule-based Machine Learning Visualization
Pipeline for Characterizing Complex Associations in Classification
- URL: http://arxiv.org/abs/2104.12844v1
- Date: Mon, 26 Apr 2021 19:47:03 GMT
- Title: LCS-DIVE: An Automated Rule-based Machine Learning Visualization
Pipeline for Characterizing Complex Associations in Classification
- Authors: Robert Zhang, Rachael Stolzenberg-Solomon, Shannon M. Lynch, Ryan J.
Urbanowicz
- Abstract summary: This work introduces the LCS Discovery and Visualization Environment (LCS-DIVE), an automated LCS model interpretation pipeline for complex biomedical classification.
LCS-DIVE conducts modeling using a new scikit-learn implementation of ExSTraCS, an LCS designed to overcome the noise and scalability challenges of biomedical data mining.
It leverages feature-tracking scores and/or rules to automatically guide characterization of (1) feature importance, (2) underlying additive, epistatic, and/or heterogeneous patterns of association, and (3) model-driven heterogeneous instance subgroups via clustering, visualization generation, and cluster interrogation.
- Score: 0.7226144684379191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) research has yielded powerful tools for training
accurate prediction models despite complex multivariate associations (e.g.
interactions and heterogeneity). In fields such as medicine, improved
interpretability of ML modeling is required for knowledge discovery,
accountability, and fairness. Rule-based ML approaches such as Learning
Classifier Systems (LCSs) strike a balance between predictive performance and
interpretability in complex, noisy domains. This work introduces the LCS
Discovery and Visualization Environment (LCS-DIVE), an automated LCS model
interpretation pipeline for complex biomedical classification. LCS-DIVE
conducts modeling using a new scikit-learn implementation of ExSTraCS, an LCS
designed to overcome the noise and scalability challenges of biomedical data
mining, yielding human-readable IF:THEN rules as well as feature-tracking
scores for each
training sample. LCS-DIVE leverages feature-tracking scores and/or rules to
automatically guide characterization of (1) feature importance, (2) underlying
additive, epistatic, and/or heterogeneous patterns of association, and (3)
model-driven heterogeneous instance subgroups via clustering, visualization
generation, and cluster interrogation. LCS-DIVE was evaluated over a diverse
set of simulated genetic and benchmark datasets encoding a variety of complex
multivariate associations, demonstrating its ability to differentiate between
them, and was then applied to characterize associations within a real-world
study of pancreatic cancer.
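As a concrete illustration of the modeling step described above, the sketch below trains a scikit-learn-style ExSTraCS classifier on a toy dataset with a simple epistatic pattern. The package and import names (skExSTraCS, ExSTraCS) and the hyperparameter names (learning_iterations, N) are assumptions patterned on a standard scikit-learn estimator interface, not a verified API, and the downstream LCS-DIVE clustering and visualization steps are not shown.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from skExSTraCS import ExSTraCS  # assumed import path for the scikit-learn ExSTraCS

    # Toy binary-classification data standing in for a genotype/feature matrix.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 20)).astype(float)  # 20 genotype-like features
    y = (X[:, 0] * X[:, 1] > 1).astype(int)               # a simple epistatic pattern

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Hyperparameter names are illustrative (training iterations, rule population size).
    model = ExSTraCS(learning_iterations=5000, N=500)
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))
    # LCS-DIVE would then consume the evolved IF:THEN rules and the per-instance
    # feature-tracking scores to characterize feature importance, patterns of
    # association, and heterogeneous instance subgroups.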
Related papers
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- An Ensemble Approach to Question Classification: Integrating Electra Transformer, GloVe, and LSTM [0.0]
This study presents an innovative ensemble approach for question classification, combining the strengths of Electra, GloVe, and LSTM models.
Rigorously tested on the well-regarded TREC dataset, the model demonstrates how the integration of these disparate technologies can lead to superior results.
arXiv Detail & Related papers (2023-08-13T18:14:10Z)
- Extension of Transformational Machine Learning: Classification Problems [0.0]
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery.
TML, a meta-learning algorithm, excels in exploiting common attributes across various domains.
The drug discovery process, which is complex and time-consuming, can benefit greatly from the enhanced prediction accuracy.
arXiv Detail & Related papers (2023-08-07T07:34:18Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
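As a toy illustration of the idea behind differentiable epidemic simulation (not GradABM itself, which is agent-based), the sketch below runs a smooth SIR-style update in PyTorch and backpropagates a data-fitting loss to the transmission and recovery rates; every name and number is an illustrative placeholder.

    import torch

    beta = torch.tensor(0.3, requires_grad=True)   # transmission rate (to be fit)
    gamma = torch.tensor(0.1, requires_grad=True)  # recovery rate (to be fit)

    def simulate(beta, gamma, days=30, n=1_000_000.0, i0=100.0):
        """Differentiable SIR-style rollout returning daily new infections."""
        s, i = n - i0, torch.tensor(i0)
        new_cases = []
        for _ in range(days):
            inf = beta * s * i / n
            rec = gamma * i
            s, i = s - inf, i + inf - rec
            new_cases.append(inf)
        return torch.stack(new_cases)

    observed = simulate(torch.tensor(0.25), torch.tensor(0.1)).detach()  # synthetic "data"

    opt = torch.optim.Adam([beta, gamma], lr=0.01)
    for _ in range(200):
        opt.zero_grad()
        loss = torch.mean((simulate(beta, gamma) - observed) ** 2)
        loss.backward()
        opt.step()
    print(float(beta), float(gamma))  # moves toward the data-generating rates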
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification [109.81283748940696]
We introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio.
We show that, for specific embedding methods, some simulation-based approaches are more robust (and accurate) than others under certain adversarial attacks on the input sequences.
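A minimal, purely illustrative way to inject sequencing-style noise into a nucleotide sequence is sketched below (random substitutions and short indels); the rates and their split are arbitrary placeholders, not the paper's calibrated Illumina or PacBio error profiles.

    import random

    def perturb_sequence(seq, sub_rate=0.005, indel_rate=0.001, seed=0):
        """Return a copy of seq with random substitutions, insertions, and deletions."""
        rng = random.Random(seed)
        bases = "ACGT"
        out = []
        for base in seq:
            r = rng.random()
            if r < indel_rate / 2:
                continue                                  # deletion: drop this base
            if r < indel_rate:
                out.append(rng.choice(bases))             # insertion before this base
            if rng.random() < sub_rate:
                base = rng.choice([b for b in bases if b != base])  # substitution
            out.append(base)
        return "".join(out)

    print(perturb_sequence("ATGCGTACGTTAGC" * 10))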
arXiv Detail & Related papers (2022-07-18T19:16:56Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
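A single conditional affine coupling layer, the building block mentioned above, can be sketched in PyTorch as below; the layer sizes and the way the class-semantic condition is concatenated are assumptions for illustration, not GSMFlow's exact architecture.

    import torch
    import torch.nn as nn

    class ConditionalAffineCoupling(nn.Module):
        """Half the features are rescaled and shifted using a scale/shift
        predicted from the other half concatenated with a condition vector."""
        def __init__(self, dim, cond_dim, hidden=128):
            super().__init__()
            self.half = dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (dim - self.half)),
            )

        def forward(self, x, cond):
            x1, x2 = x[:, :self.half], x[:, self.half:]
            s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
            s = torch.tanh(s)                    # keep scales bounded for stability
            y2 = x2 * torch.exp(s) + t
            return torch.cat([x1, y2], dim=1), s.sum(dim=1)  # output, log|det J|

        def inverse(self, y, cond):
            y1, y2 = y[:, :self.half], y[:, self.half:]
            s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
            x2 = (y2 - t) * torch.exp(-torch.tanh(s))
            return torch.cat([y1, x2], dim=1)

    layer = ConditionalAffineCoupling(dim=64, cond_dim=16)
    x, cond = torch.randn(8, 64), torch.randn(8, 16)
    z, log_det = layer(x, cond)
    print(torch.allclose(x, layer.inverse(z, cond), atol=1e-5))  # invertibility check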
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
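The CL procedure described above can be sketched directly with scikit-learn: cluster the representations with K-means, then measure how well a KNN classifier trained on one split predicts those cluster labels on a held-out split. The number of clusters, neighbors, and the split below are illustrative choices, not the paper's exact protocol.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import train_test_split

    def cluster_learnability(reps, n_clusters=10, n_neighbors=5, seed=0):
        """Cluster Learnability sketch: KNN accuracy at predicting K-means labels."""
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reps)
        X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, random_state=seed)
        knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
        return knn.score(X_te, y_te)

    reps = np.random.default_rng(0).normal(size=(1000, 64))  # stand-in for learned representations
    print("CL score:", cluster_learnability(reps))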
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Development of Interpretable Machine Learning Models to Detect Arrhythmia based on ECG Data [0.0]
This thesis builds Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) classifiers based on state-of-the-art models.
Both global and local interpretability methods are exploited to understand the interaction between dependent and independent variables.
It was found that Grad-CAM was the most effective interpretability technique for explaining the predictions of the proposed CNN and LSTM models.
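The Grad-CAM idea can be illustrated on a toy 1D CNN for ECG-like signals: feature-map channels in the last convolutional block are weighted by the gradient of the predicted class score and combined into a saliency curve over time. The model, input length, and layer choices below are placeholders, not the thesis's architectures.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyECGNet(nn.Module):
        """Placeholder 1D CNN: conv features -> global average pooling -> linear head."""
        def __init__(self, n_classes=5):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
                nn.Conv1d(8, 16, kernel_size=7, padding=3), nn.ReLU(),
            )
            self.head = nn.Linear(16, n_classes)

        def forward(self, x):
            a = self.features(x)                  # (batch, channels, time)
            return self.head(a.mean(dim=-1)), a   # logits and feature maps

    model = TinyECGNet().eval()
    signal = torch.randn(1, 1, 187)               # one ECG-like segment

    logits, activations = model(signal)
    activations.retain_grad()                      # keep gradients on the feature maps
    logits[0, logits[0].argmax()].backward()       # backprop the top class score

    weights = activations.grad.mean(dim=-1, keepdim=True)         # per-channel importance
    cam = F.relu((weights * activations).sum(dim=1)).squeeze(0)   # saliency over time
    cam = cam / (cam.max() + 1e-8)
    print(cam.shape)                               # one importance value per time step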
arXiv Detail & Related papers (2022-05-05T17:29:33Z)
- BenchML: an extensible pipelining framework for benchmarking representations of materials and molecules at scale [0.0]
We introduce a machine-learning framework for benchmarking representations of chemical systems against datasets of materials and molecules.
The guiding principle is to evaluate raw descriptor performance by limiting model complexity to simple regression schemes.
The resulting models are intended as baselines that can inform future method development.
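In the spirit of that guiding principle (illustration only, not BenchML's actual API), a descriptor can be scored by how well a simple regularized linear model predicts a target property from it; the descriptor matrix, property values, and hyperparameter grid below are synthetic placeholders.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 128))                          # placeholder descriptor matrix
    y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=300)    # placeholder property values

    # Limit model complexity to a simple regression scheme so the score reflects
    # the descriptor itself rather than the learner.
    baseline = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 7)}, cv=5)
    scores = cross_val_score(baseline, X, y, cv=5, scoring="r2")
    print("descriptor baseline R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))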
arXiv Detail & Related papers (2021-12-04T09:07:16Z)
- A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification.
This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms.
We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
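The pipeline stages listed above can be sketched with scikit-learn as below; the imputer, scaler, feature-selection step, and single classifier are illustrative stand-ins, not the paper's nine-algorithm comparison or its bias assessments.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 50))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy binary outcome
    X[rng.random(X.shape) < 0.05] = np.nan       # sprinkle missing values

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),         # data cleaning/transformation
        ("scale", StandardScaler()),
        ("select", SelectKBest(mutual_info_classif, k=10)),   # feature selection
        ("model", LogisticRegression(max_iter=1000)),         # one of many candidate learners
    ])
    print("cross-validated accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())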
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)