Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data
- URL: http://arxiv.org/abs/2305.18410v1
- Date: Sun, 28 May 2023 17:07:46 GMT
- Title: Understanding Breast Cancer Survival: Using Causality and Language Models on Multi-omics Data
- Authors: Mugariya Farooq, Shahad Hardan, Aigerim Zhumbhayeva, Yujia Zheng, Preslav Nakov, Kun Zhang
- Abstract summary: We exploit causal discovery algorithms to investigate how perturbations in the genome can affect the survival of patients diagnosed with breast cancer.
Our findings reveal important factors related to the vital status of patients using causal discovery algorithms.
Results are validated through language models trained on biomedical literature.
- Score: 23.850817918011863
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The need for more usable and explainable machine learning models in
healthcare increases the importance of developing and utilizing causal
discovery algorithms, which aim to discover causal relations by analyzing
observational data. Explainable approaches aid clinicians and biologists in
predicting the prognosis of diseases and suggesting proper treatments. However,
very little research has been conducted at the crossroads between causal
discovery, genomics, and breast cancer, and we aim to bridge this gap.
Moreover, evaluation of causal discovery methods on real data is in general
notoriously difficult because ground-truth causal relations are usually
unknown, and accordingly, in this paper, we also propose to address the
evaluation problem with large language models. In particular, we exploit
suitable causal discovery algorithms to investigate how various perturbations
in the genome can affect the survival of patients diagnosed with breast cancer.
We used three main causal discovery algorithms: PC, Greedy Equivalence Search
(GES), and a Generalized Precision Matrix-based one. We experiment with a
subset of The Cancer Genome Atlas, which contains information about mutations,
copy number variations, protein levels, and gene expressions for 705 breast
cancer patients. Our findings reveal important factors related to the vital
status of patients using causal discovery algorithms. However, the reliability
of these results remains a concern in the medical domain. Accordingly, as
another contribution of the work, the results are validated through language
models trained on biomedical literature, such as BlueBERT and other large
language models trained on medical corpora. Our results advocate the proper
utilization of causal discovery algorithms and language models for revealing
reliable causal relations for clinical applications.
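The abstract names PC as one of the three causal discovery algorithms but does not say which implementation or conditional-independence test was used. As a hedged illustration only, not the authors' code, the sketch below shows the skeleton phase of a simple, order-dependent PC variant using a Fisher-z partial-correlation test, which assumes roughly linear-Gaussian data. The function names `fisher_z_independent` and `pc_skeleton` and the `alpha` default are our own assumptions; edge orientation, GES, and the precision-matrix-based method are not shown.

```python
import itertools
import math

import numpy as np


def fisher_z_independent(corr, i, j, cond, n, alpha=0.05):
    """Return True if X_i and X_j look independent given X_cond (Fisher-z test).

    Assumption for illustration: data are approximately linear-Gaussian.
    """
    idx = [i, j] + list(cond)
    sub = corr[np.ix_(idx, idx)]
    prec = np.linalg.pinv(sub)
    # Partial correlation of i and j given cond, read off the precision matrix
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # avoid log(0) at |r| = 1
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(stat / math.sqrt(2))))
    return p_value > alpha


def pc_skeleton(data, alpha=0.05):
    """Skeleton phase of PC: start fully connected, delete edges that test as
    conditionally independent given growing subsets of current neighbours."""
    n, d = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {v: set(range(d)) - {v} for v in range(d)}
    sepsets = {}
    level = 0  # size of the conditioning set
    while any(len(adj[v]) - 1 >= level for v in range(d)):
        for i in range(d):
            for j in list(adj[i]):
                others = adj[i] - {j}
                if len(others) < level:
                    continue
                for cond in itertools.combinations(sorted(others), level):
                    if fisher_z_independent(corr, i, j, cond, n, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        level += 1
    edges = {frozenset((i, j)) for i in adj for j in adj[i]}
    return edges, sepsets
```

On simulated chain data X0 → X1 → X2, the X0–X2 edge should typically be removed once X1 is conditioned on, with {1} stored as its separating set; the direct edges X0–X1 and X1–X2 should remain. On real multi-omics data such as TCGA, the Gaussian assumption behind the Fisher-z test may not hold for mutation or copy-number variables, which is one reason a different independence test or score might be preferred.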
Related papers
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
(2024-05-27T13:26:34Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
(2024-04-26T16:39:50Z)
- Applying Large Language Models for Causal Structure Learning in Non-Small Cell Lung Cancer [8.248361703850774]
Causal discovery is becoming a key part in medical AI research.
In this paper, we investigate applying Large Language Models to the problem of determining the directionality of edges in causal discovery.
Our result shows that LLMs can accurately predict the directionality of edges in causal graphs, outperforming existing state-of-the-art methods.
(2023-11-13T09:31:14Z)
- A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
(2023-06-08T09:40:28Z)
- Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance [49.87607548975686]
The scarcity of labeled data for related diseases poses a major challenge to accurate diagnosis.
We propose a novel deep reinforcement learning framework, which introduces prior knowledge to direct the learning of diagnostic agents.
Our approach's performance was demonstrated on the well-known NIH ChestX-ray14 and CheXpert datasets.
(2023-06-02T01:46:31Z)
- Machine Learning Approach for Cancer Entities Association and Classification [0.0]
The study uses two non-trivial Natural Language Processing (NLP) functions, named entity recognition and text classification, to discover knowledge from biomedical literature.
Named Entity Recognition (NER) recognizes and extracts the predefined entities related to cancer from unstructured text with the support of a user-friendly interface and built-in dictionaries.
Text classification helps to explore the insights into the text and simplifies data categorization, querying, and article screening.
(2023-05-30T07:36:12Z)
- Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach [1.8933952173153485]
We introduce a causal discovery algorithm for causal Bayesian networks based on bootstrap resampling.
We discuss the strengths and limitations of our findings in light of the presence of missing data that may be missing-not-at-random.
(2023-05-17T08:33:32Z)
- Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients [1.2647816797166165]
We report a more reliable and scalable causal discovery method (iMIIC) based on a general mutual information supremum principle.
We showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program.
(2023-03-11T15:18:19Z)
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review [77.34726150561087]
Cancer is the second leading cause of death after cardiovascular diseases.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
(2023-01-28T15:03:03Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
(2020-09-02T02:50:30Z)
- JigSaw: A tool for discovering explanatory high-order interactions from random forests [0.0]
JigSaw was developed to aid in the discovery of patterns that could explain predictions made by the forest.
It was first used to identify patterns in clinical measurements associated with heart disease.
It was then used to find patterns associated with breast cancer using metabolites measured in the blood.
(2020-05-09T01:53:45Z)