Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
- URL: http://arxiv.org/abs/2409.09369v4
- Date: Tue, 11 Feb 2025 14:11:14 GMT
- Title: Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
- Authors: Pei Liu, Luping Ji, Jiaxiang Gou, Bo Fu, Mao Ye
- Abstract summary: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH).
Existing survival analysis approaches have made exciting progress, but they are generally limited to highly-expressive network architectures and coarse-grained patient-level labels.
This paper proposes a new Vision-Language-based SA (VLSA) paradigm to overcome these performance bottlenecks.
- Score: 15.83613460419667
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Histopathology Whole-Slide Images (WSIs) provide an important tool to assess cancer prognosis in computational pathology (CPATH). While existing survival analysis (SA) approaches have made exciting progress, they are generally limited to adopting highly-expressive network architectures and only coarse-grained patient-level labels to learn visual prognostic representations from gigapixel WSIs. Such a learning paradigm suffers from critical performance bottlenecks given the scarce training data and the standard multi-instance learning (MIL) framework currently used in CPATH. To overcome them, this paper proposes, for the first time, a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models. It no longer relies on high-capability networks and shows the advantage of data efficiency. (2) On the vision end, VLSA encodes a textual prognostic prior and then employs it as an auxiliary signal to guide the aggregation of visual prognostic features at instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose i) ordinal survival prompt learning to transform continuous survival labels into textual prompts; and ii) an ordinal incidence function as the prediction target to make SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively by our Shapley-value-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. VLSA could pave a new way for SA in CPATH by offering weakly-supervised MIL an effective means to learn valuable prognostic clues from gigapixel WSIs. Our source code is available at https://github.com/liupei101/VLSA.
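The following is a minimal, self-contained sketch (not the authors' implementation, which is in the linked repository) of the two mechanisms the abstract describes: text-guided aggregation of instance features and prediction of a discrete incidence function from ordinal survival prompts. The embeddings are random placeholders, and the temperature, foundation-model name, and number of time bins are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical inputs: N patch embeddings from one WSI, one textual prognostic
# prior embedding, and K ordinal survival prompt embeddings, all assumed to live
# in the shared space of a pathology VL foundation model (e.g. CONCH).
rng = np.random.default_rng(0)
patch_emb  = l2norm(rng.normal(size=(1000, 512)))   # N x d instance features
prior_emb  = l2norm(rng.normal(size=512))           # textual prognostic prior
prompt_emb = l2norm(rng.normal(size=(4, 512)))      # K ordinal survival prompts

# Vision end: the prognostic prior scores every instance, and the scores are used
# as attention weights to aggregate patch features into a slide-level feature,
# compensating for the weak slide-level supervision in MIL.
attn = softmax(patch_emb @ prior_emb / 0.07)          # N attention weights
slide_emb = l2norm(attn @ patch_emb)                  # aggregated WSI embedding

# VL-based prediction: similarity to the K ordinal prompts yields a discrete
# incidence function over K time bins; its cumulative sum gives risk over time.
incidence = softmax(slide_emb @ prompt_emb.T / 0.07)  # P(event falls in bin k)
survival = 1.0 - np.cumsum(incidence)                 # S(t_k) = 1 - F(t_k)
print(np.round(survival, 3))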
Related papers
- HiLa: Hierarchical Vision-Language Collaboration for Cancer Survival Prediction [55.00788339683146]
We propose HiLa, a novel Hierarchical vision-Language collaboration framework for improved survival prediction.
Specifically, HiLa employs pretrained feature extractors to generate hierarchical visual features from WSIs at both patch and region levels.
This approach enables the comprehensive learning of discriminative visual features corresponding to different survival-related attributes from prompts.
arXiv Detail & Related papers (2025-07-07T02:06:25Z)
- Lifelong Whole Slide Image Analysis: Online Vision-Language Adaptation and Past-to-Present Gradient Distillation [1.1497371646067622]
Whole Slide Images (WSIs) play a crucial role in accurate cancer diagnosis and prognosis.
Given that WSIs are gigapixels in size, they present difficulties in terms of storage, processing, and model training.
We introduce ADaFGrad, a method designed to enhance lifelong learning for whole-slide image (WSI) analysis.
arXiv Detail & Related papers (2025-05-04T04:46:08Z)
- Vision Transformers with Autoencoders and Explainable AI for Cancer Patient Risk Stratification Using Whole Slide Imaging [3.6940298700319065]
PATH-X is a framework that integrates Vision Transformers (ViT) and Autoencoders with SHAP (Shapley Additive Explanations) to enhance model explainability for patient stratification and risk prediction.
A representative image slice is selected from each WSI, and numerical feature embeddings are extracted using Google's pre-trained ViT.
Kaplan-Meier survival analysis is applied to evaluate stratification into two and three risk groups.
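As a rough illustration of the Kaplan-Meier stratification step (not taken from the paper), the sketch below splits synthetic risk scores at the median into two groups and compares their survival curves with a log-rank test using the lifelines package; a three-group split at tertiles works analogously. The risk scores, times, and event indicators are placeholders standing in for model outputs.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(42)
risk  = rng.uniform(size=200)                          # hypothetical predicted risk scores
time  = rng.exponential(scale=1000 * (1.5 - risk))     # follow-up time (days), shorter for higher risk
event = (rng.uniform(size=200) < 0.7).astype(int)      # 1 = death observed, 0 = censored

# Two-group stratification at the median predicted risk.
high = risk >= np.median(risk)
kmf_hi, kmf_lo = KaplanMeierFitter(), KaplanMeierFitter()
kmf_hi.fit(time[high],  event_observed=event[high],  label="high risk")
kmf_lo.fit(time[~high], event_observed=event[~high], label="low risk")

# Log-rank test for separation between the two survival curves.
res = logrank_test(time[high], time[~high],
                   event_observed_A=event[high], event_observed_B=event[~high])
print(kmf_hi.median_survival_time_, kmf_lo.median_survival_time_, res.p_value)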
arXiv Detail & Related papers (2025-04-07T05:48:42Z)
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models [50.587868616659826]
Sparse Autoencoders (SAEs) have been shown to enhance interpretability and steerability in Large Language Models (LLMs).
In this work, we extend the application of SAEs to Vision-Language Models (VLMs), such as CLIP, and introduce a comprehensive framework for evaluating monosemanticity in vision representations.
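For readers unfamiliar with the building block, the sketch below shows a minimal sparse autoencoder of the common ReLU-plus-L1 form applied to stand-in CLIP-style image embeddings; the dictionary size, penalty weight, and random features are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_in=768, d_hidden=8192):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)   # overcomplete dictionary
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x):
        z = torch.relu(self.encoder(x))            # non-negative, sparse codes
        return self.decoder(z), z

sae = SparseAutoencoder()
feats = torch.randn(256, 768)                       # stand-in for CLIP image embeddings
recon, z = sae(feats)
loss = ((recon - feats) ** 2).mean() + 1e-3 * z.abs().mean()  # reconstruction + L1 sparsity
loss.backward()
print(float(loss), float((z > 0).float().mean()))   # loss and fraction of active units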
arXiv Detail & Related papers (2025-04-03T17:58:35Z)
- VLEER: Vision and Language Embeddings for Explainable Whole Slide Image Representation [3.695317701129061]
We introduce Vision and Language Embeddings for Explainable WSI Representation (VLEER), a novel method designed to leverage vision features for WSI representation.
VLEER offers the unique advantage of interpretability, enabling direct human-readable insights into the results.
arXiv Detail & Related papers (2025-02-28T08:49:03Z)
- VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models [66.56298924208319]
Vision-language generative reward models (VL-GenRMs) play a crucial role in aligning and evaluating multimodal AI systems.
Current assessment methods rely on AI-annotated preference labels from traditional tasks.
We introduce VL-RewardBench, a benchmark spanning general multimodal queries, visual hallucination detection, and complex reasoning tasks.
arXiv Detail & Related papers (2024-11-26T14:08:34Z)
- Vision Language Models are In-Context Value Learners [89.29486557646624]
We present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress.
Without any robot- or task-specific training, GVL can predict effective values in-context, zero-shot and few-shot, for more than 300 distinct real-world tasks.
arXiv Detail & Related papers (2024-11-07T09:17:50Z)
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased predictions in the open-vocabulary detection setting.
Our observations lead to a simple yet effective paradigm, named MarvelOVD, that generates significantly better training targets.
Our method outperforms other state-of-the-art approaches by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z)
- An efficient framework based on large foundation model for cervical cytopathology whole slide image screening [13.744580492120749]
We propose an efficient framework for cervical cytopathology WSI classification using only WSI-level labels through unsupervised and weakly supervised learning.
Experiments conducted on the CSD and FNAC 2019 datasets demonstrate that the proposed method enhances the performance of various MIL methods and achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-07-16T08:21:54Z)
- What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases [87.65903426052155]
We perform a large-scale transfer learning experiment aimed at discovering latent vision-language skills from data.
We show that generation tasks suffer from a length bias, suggesting benchmarks should balance tasks with varying output lengths.
We present a new dataset, OLIVE, which simulates user instructions in the wild and presents challenges dissimilar to all datasets we tested.
arXiv Detail & Related papers (2024-04-03T02:40:35Z)
- AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images [12.09957276418002]
We propose a novel adversarial multiple instance learning (AdvMIL) framework.
This framework is based on adversarial time-to-event modeling and integrates the multiple instance learning (MIL) that is essential for WSI representation learning.
Our experiments show that AdvMIL not only brings performance improvements to mainstream WSI survival analysis methods at a relatively low computational cost, but also enables these methods to effectively utilize unlabeled data via semi-supervised learning.
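One plausible reading of the adversarial time-to-event setup, sketched below with placeholder dimensions (this is not the AdvMIL code): an attention-based MIL generator predicts a survival time from a bag of patch features, while a discriminator scores (bag embedding, time) pairs as real or generated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MILGenerator(nn.Module):
    """Attention-based MIL pooling followed by a time-to-event head."""
    def __init__(self, d=512):
        super().__init__()
        self.attn = nn.Linear(d, 1)
        self.head = nn.Linear(d, 1)

    def forward(self, bag):                           # bag: (N, d) patch features
        w = torch.softmax(self.attn(bag), dim=0)      # (N, 1) attention weights
        z = (w * bag).sum(dim=0)                      # pooled bag embedding (d,)
        t_hat = F.softplus(self.head(z))              # predicted time, kept positive
        return z, t_hat

gen = MILGenerator()
disc = nn.Sequential(nn.Linear(512 + 1, 128), nn.ReLU(), nn.Linear(128, 1))

bag = torch.randn(1000, 512)                          # one WSI as a bag of patch features
t_real = torch.tensor([24.0])                         # observed follow-up time (months)

z, t_fake = gen(bag)
real_score = disc(torch.cat([z.detach(), t_real]))           # real (embedding, time) pair
fake_score = disc(torch.cat([z.detach(), t_fake.detach()]))  # generated pair
d_loss = F.softplus(-real_score) + F.softplus(fake_score)    # logistic GAN loss for D
print(float(d_loss))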
arXiv Detail & Related papers (2022-12-13T12:02:05Z)
- Self-Supervised PPG Representation Learning Shows High Inter-Subject Variability [3.8036939971290007]
We propose a Self-Supervised Learning (SSL) method with a pretext task of signal reconstruction to learn an informative generalized PPG representation.
Results show that in a very limited labeled-data setting (10 samples per class or less), using SSL is beneficial.
SSL may pave the way for the broader use of machine learning models on PPG data in label-scarce regimes.
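A minimal sketch of a reconstruction pretext task on PPG windows, assuming a simple 1-D convolutional autoencoder; the architecture and window length are illustrative and not the paper's configuration.

```python
import torch
import torch.nn as nn

class PPGAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=7, stride=2, padding=3, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=7, stride=2, padding=3, output_padding=1),
        )

    def forward(self, x):                    # x: (batch, 1, signal_length)
        z = self.encoder(x)                  # learned PPG representation
        return self.decoder(z), z

model = PPGAutoencoder()
ppg = torch.randn(8, 1, 512)                 # a batch of raw PPG windows (placeholder)
recon, z = model(ppg)
loss = nn.functional.mse_loss(recon, ppg)    # reconstruction pretext objective
loss.backward()
print(tuple(recon.shape), tuple(z.shape), float(loss))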
arXiv Detail & Related papers (2022-12-07T19:02:45Z)
- Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles [83.41551911845157]
This paper focuses on analyzing and improving the commonsense ability of recent popular vision-language (VL) models.
We propose a more scalable strategy, i.e., "Data Augmentation with kNowledge graph linearization for CommonsensE capability" (DANCE).
For better commonsense evaluation, we propose the first retrieval-based commonsense diagnostic benchmark.
arXiv Detail & Related papers (2022-11-29T18:59:59Z)
- Investigating Power laws in Deep Representation Learning [4.996066540156903]
We propose a framework to evaluate the quality of representations in unlabelled datasets.
We estimate the coefficient of the power law, $\alpha$, across three key attributes which influence representation learning.
Notably, $\alpha$ is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations in unlabelled datasets.
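A small sketch of how such a coefficient can be estimated without labels, assuming $\alpha$ is the decay exponent of the eigenspectrum of the representation covariance ($\lambda_i \propto i^{-\alpha}$); the random features below stand in for real network representations.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(5000, 256))          # (samples, dims); stands in for real representations
feats = feats - feats.mean(axis=0)

# Eigenvalues of the feature covariance matrix, sorted in decreasing order.
cov = feats.T @ feats / len(feats)
eig = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Fit lambda_i ~ i^(-alpha) as a straight line in log-log space; -slope is alpha.
ranks = np.arange(1, len(eig) + 1)
alpha = -np.polyfit(np.log(ranks), np.log(eig), deg=1)[0]
print(f"estimated alpha = {alpha:.3f}")       # close to 0 here, since the features are isotropic noise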
arXiv Detail & Related papers (2022-02-11T18:11:32Z)
- Generalized Zero-Shot Learning via VAE-Conditioned Generative Flow [83.27681781274406]
Generalized zero-shot learning aims to recognize both seen and unseen classes by transferring knowledge from semantic descriptions to visual representations.
Recent generative methods formulate GZSL as a missing-data problem and mainly adopt GANs or VAEs to generate visual features for unseen classes.
We propose a conditional version of generative flows for GZSL, i.e., VAE-Conditioned Generative Flow (VAE-cFlow).
arXiv Detail & Related papers (2020-09-01T09:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.