Distilling foundation models for robust and efficient models in digital pathology
- URL: http://arxiv.org/abs/2501.16239v2
- Date: Tue, 28 Jan 2025 17:09:41 GMT
- Title: Distilling foundation models for robust and efficient models in digital pathology
- Authors: Alexandre Filiot, Nicolas Dop, Oussama Tchita, Auriane Riou, Rémy Dubois, Thomas Peeters, Daria Valter, Marin Scalbert, Charlie Saillard, Geneviève Robin, Antoine Olivier
- Abstract summary: We distilled a large foundation model into a smaller one, reducing the number of parameters by several orders of magnitude.
Our model, H0-mini, achieves nearly comparable performance to large FMs at a significantly reduced inference cost.
It is evaluated on several public benchmarks, achieving 3rd place on the HEST benchmark and 5th place on the EVA benchmark.
- Score: 32.99044401004595
- License:
- Abstract: In recent years, the advent of foundation models (FM) for digital pathology has relied heavily on scaling the pre-training datasets and the model size, yielding large and powerful models. While it resulted in improving the performance on diverse downstream tasks, it also introduced increased computational cost and inference time. In this work, we explore the distillation of a large foundation model into a smaller one, reducing the number of parameters by several orders of magnitude. Leveraging distillation techniques, our distilled model, H0-mini, achieves nearly comparable performance to large FMs at a significantly reduced inference cost. It is evaluated on several public benchmarks, achieving 3rd place on the HEST benchmark and 5th place on the EVA benchmark. Additionally, a robustness analysis conducted on the PLISM dataset demonstrates that our distilled model reaches excellent robustness to variations in staining and scanning conditions, significantly outperforming other state-of-the art models. This opens new perspectives to design lightweight and robust models for digital pathology, without compromising on performance.
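As a rough illustration of the teacher-student distillation described in the abstract, here is a minimal sketch of feature distillation. The toy encoders, tile size, embedding widths, and cosine loss are illustrative assumptions; the paper's actual teacher, student architecture, and objective may differ.

```python
# Minimal sketch of teacher-student feature distillation: a frozen large
# teacher, a small student, and a cosine alignment loss. All sizes and
# modules below are illustrative stand-ins, not the paper's recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1536))  # stand-in for the large FM
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 768))   # much smaller student
proj = nn.Linear(768, 1536)  # maps student features into the teacher's embedding space

teacher.eval()
for p in teacher.parameters():         # the teacher is frozen
    p.requires_grad_(False)

opt = torch.optim.AdamW(list(student.parameters()) + list(proj.parameters()), lr=1e-4)

for step in range(100):
    tiles = torch.randn(8, 3, 32, 32)  # stand-in for H&E tissue tiles
    with torch.no_grad():
        t_emb = teacher(tiles)
    s_emb = proj(student(tiles))
    loss = 1 - F.cosine_similarity(s_emb, t_emb, dim=-1).mean()  # pull student toward teacher
    opt.zero_grad()
    loss.backward()
    opt.step()
```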
Related papers
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think [53.2706196341054]
We show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed.
We perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models.
arXiv Detail & Related papers (2024-09-17T16:58:52Z)
- Using Generative Models to Produce Realistic Populations of the United Kingdom Windstorms [0.0]
This dissertation explores the application of generative models to produce realistic synthetic wind field data.
Three models (standard GANs, WGAN-GP, and U-net diffusion models) were employed to generate wind maps of the UK; a WGAN-GP gradient-penalty sketch follows this entry.
The results reveal that while all models are effective in capturing the general spatial characteristics, each model exhibits distinct strengths and weaknesses.
arXiv Detail & Related papers (2024-09-16T19:53:33Z)
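To make the WGAN-GP objective mentioned above concrete, here is a minimal sketch of its gradient penalty term (Gulrajani et al., 2017). The toy critic, the two-channel 64x64 wind-map shapes, and the penalty weight are illustrative assumptions, not details from the dissertation.

```python
# Sketch of the WGAN-GP gradient penalty; the critic and shapes are toy stand-ins.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Flatten(), nn.Linear(2 * 64 * 64, 1))  # toy critic for (u, v) wind maps

def gradient_penalty(critic, real, fake, lam=10.0):
    eps = torch.rand(real.size(0), 1, 1, 1)             # random interpolation weights
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)[0]
    # Penalize deviation of the critic's input-gradient norm from 1
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

real = torch.randn(4, 2, 64, 64)  # stand-in for observed wind fields
fake = torch.randn(4, 2, 64, 64)  # stand-in for generator output
print(gradient_penalty(critic, real, fake).item())
```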
- Structured Model Pruning for Efficient Inference in Computational Pathology [2.9687381456164004]
We develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging.
We empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance (see the sketch below).
arXiv Detail & Related papers (2024-04-12T22:05:01Z)
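For illustration, here is a minimal structured (channel-wise) pruning sketch using PyTorch's built-in pruning utilities; the toy convolution and the 70% ratio echo the summary above, but the paper's actual method may differ.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)  # toy U-Net-style block

# Zero out the 70% of output channels with the smallest L2 norm
prune.ln_structured(conv, name="weight", amount=0.7, n=2, dim=0)
prune.remove(conv, "weight")  # bake the pruning mask into the weight tensor

# The masked channels are zeroed, not removed; realizing the speedup requires
# rebuilding the layer with fewer filters, which this sketch omits.
kept = (conv.weight.detach().flatten(1).norm(dim=1) > 0).sum().item()
print(f"non-zero output channels: {kept}/128")
```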
- On the Impact of Sampling on Deep Sequential State Estimation [17.92198582435315]
State inference and parameter learning in sequential models can be successfully performed with approximation techniques.
Tighter Monte Carlo objectives have been proposed in the literature to enhance generative modeling performance (see the sketch below).
arXiv Detail & Related papers (2023-11-28T17:59:49Z)
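The canonical example of a tighter Monte Carlo objective is the importance-weighted bound of Burda et al. (2016), sketched below on a toy Gaussian model; the model and proposal are illustrative assumptions, not the paper's setup.

```python
import math
import torch

def iwae_bound(x, K=16):
    # Toy model: z ~ N(0, 1), x | z ~ N(z, 1); simple Gaussian proposal q(z | x) = N(x / 2, 1).
    q_mean = x / 2
    z = q_mean + torch.randn(K)                                # K samples from q(z | x)
    log_p_z = -0.5 * z**2 - 0.5 * math.log(2 * math.pi)
    log_p_x_given_z = -0.5 * (x - z) ** 2 - 0.5 * math.log(2 * math.pi)
    log_q = -0.5 * (z - q_mean) ** 2 - 0.5 * math.log(2 * math.pi)
    log_w = log_p_z + log_p_x_given_z - log_q                  # importance weights
    return torch.logsumexp(log_w, dim=0) - math.log(K)         # log (1/K) sum_k w_k

x = torch.tensor(1.3)
print(iwae_bound(x, K=1).item(), iwae_bound(x, K=64).item())   # the bound tightens as K grows
```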
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Foundation Models for Generalist Geospatial Artificial Intelligence [3.7002058945990415]
This paper introduces a first-of-its-kind framework for the efficient pre-training and fine-tuning of foundation models on extensive geospatial data.
We have utilized this framework to create Prithvi, a transformer-based foundation model pre-trained on more than 1TB of multispectral satellite imagery.
arXiv Detail & Related papers (2023-10-28T10:19:55Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large diffusion time T to ensure that the forward dynamics bring the process sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)
- A Comparative Study on Energy Consumption Models for Drones [4.660172505713055]
We benchmark the five most popular energy consumption models for drones derived from their physical behaviours.
We propose a novel data-driven energy model using a Long Short-Term Memory (LSTM) based deep learning architecture.
Our experimental results show that the LSTM-based approach easily outperforms the other mathematical models on the dataset under study (see the sketch below).
arXiv Detail & Related papers (2022-05-30T23:05:32Z)
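Below is a minimal sketch of an LSTM regressor mapping flight telemetry to instantaneous power draw; the feature set, sequence length, and layer sizes are illustrative assumptions rather than the paper's actual data or architecture.

```python
import torch
import torch.nn as nn

class EnergyLSTM(nn.Module):
    def __init__(self, n_features=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # predicted power (W) per time step

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)  # (batch, time)

model = EnergyLSTM()
telemetry = torch.randn(4, 50, 6)  # e.g. velocity, acceleration, wind, payload...
power = model(telemetry)
loss = nn.functional.mse_loss(power, torch.randn(4, 50))  # dummy targets for the sketch
```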
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster than their ODE-based counterparts (see the sketch below).
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
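To convey the idea behind closed-form continuous-depth models, here is a hedged sketch of a CfC-style cell in which a learned, time-dependent gate interpolates between two branches, so no numerical ODE solver is needed. The layer sizes and single-linear-layer heads are illustrative assumptions; see the paper for the exact formulation.

```python
import torch
import torch.nn as nn

class CfCStyleCell(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.f = nn.Linear(n_in + n_hidden, n_hidden)  # controls the time gate
        self.g = nn.Linear(n_in + n_hidden, n_hidden)  # one interpolation branch
        self.h = nn.Linear(n_in + n_hidden, n_hidden)  # the other branch

    def forward(self, x, hidden, t):
        z = torch.cat([x, hidden], dim=-1)
        gate = torch.sigmoid(-self.f(z) * t)           # closed form in t: no ODE solve
        return gate * torch.tanh(self.g(z)) + (1 - gate) * torch.tanh(self.h(z))

cell = CfCStyleCell(n_in=8, n_hidden=32)
hidden = torch.zeros(4, 32)
for t, x in enumerate(torch.randn(10, 4, 8)):          # irregularly spaced times also work
    hidden = cell(x, hidden, t=float(t + 1))
```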
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)