Related papers: Tissue Concepts v2: A Supervised Foundation Model For Whole Slide Images

Tissue Concepts v2: A Supervised Foundation Model For Whole Slide Images

URL: http://arxiv.org/abs/2507.05742v2
Date: Wed, 09 Jul 2025 13:30:33 GMT
Title: Tissue Concepts v2: A Supervised Foundation Model For Whole Slide Images
Authors: Till Nicke, Daniela Schacherer, Jan Raphael Schäfer, Natalia Artysh, Antje Prasse, André Homeyer, Andrea Schenk, Henning Höfener, Johannes Lotz,
Abstract summary: We introduce the extension of our supervised foundation model, Tissue Concepts, to whole slide images, called Tissue Concepts v2 (TCv2)<n>TCv2 uses supervised, end-to-end multitask learning on slide-level labels.<n>The presented model shows superior performance compared to SSL-trained models in cancer subtyping benchmarks and is fully trained on freely available data.
Score: 1.1552659783540218
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Foundation models (FMs) are transforming the field of computational pathology by offering new approaches to analyzing histopathology images. Typically relying on weeks of training on large databases, the creation of FMs is a resource-intensive process in many ways. In this paper, we introduce the extension of our supervised foundation model, Tissue Concepts, to whole slide images, called Tissue Concepts v2 (TCv2), a supervised foundation model for whole slide images to address the issue above. TCv2 uses supervised, end-to-end multitask learning on slide-level labels. Training TCv2 uses a fraction of the training resources compared to self-supervised training. The presented model shows superior performance compared to SSL-trained models in cancer subtyping benchmarks and is fully trained on freely available data. Furthermore, a shared trained attention module provides an additional layer of explainability across different tasks.

Related papers

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing [59.590505989071175]
Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts.<n>We introduce UniVG, a generalist diffusion model capable of supporting a diverse range of image generation tasks with a single set of weights.
arXiv Detail & Related papers (2025-03-16T21:11:25Z)
Few-Shot Medical Image Segmentation with High-Fidelity Prototypes [38.073371773707514]
We propose a novel Detail Self-refined Prototype Network (DSPNet) to construct high-fidelity prototypes representing the object foreground and the background more comprehensively. To construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modelling the multi-modal structures with clustering and then fusing each in a channel-wise manner.
arXiv Detail & Related papers (2024-06-26T05:06:14Z)
Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology [9.556246087301883]
We present a slide-level foundation model for H&E-stained histopathology, PRISM, that builds on Virchow tile embeddings. PRISM produces slide-level embeddings with the ability to generate clinical reports, resulting in several modes of use. Using text prompts, PRISM achieves zero-shot cancer detection and sub-typing performance approaching that of a supervised aggregator model.
arXiv Detail & Related papers (2024-05-16T16:59:12Z)
Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess remarkable ability to accurately classify new, unseen images after being exposed to only a few examples. For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge. We propose an intra-task mutual attention method for few-shot learning, that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets. We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation. Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks. We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception. Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task [47.1857510710807]
We present a new learning framework, dubbed GPT4Image, where the knowledge of the large pre-trained models are extracted to help CNNs and ViTs learn better representations.<n>We conduct extensive experiments to verify the effectiveness of the proposed algorithm on various visual perception tasks.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring [82.84513669453744]
Image-text pretrained models, e.g., CLIP, have shown impressive general multi-modal knowledge learned from large-scale image-text data pairs. We revisit temporal modeling in the context of image-to-video knowledge transferring. We present a simple and effective temporal modeling mechanism extending CLIP model to diverse video tasks.
arXiv Detail & Related papers (2023-01-26T14:12:02Z)
Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images. We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data [9.3935916515127]
We introduce a new vision-supervised pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them.
arXiv Detail & Related papers (2020-01-22T11:35:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.