A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
- URL: http://arxiv.org/abs/2507.01881v2
- Date: Tue, 15 Jul 2025 18:03:02 GMT
- Title: A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
- Authors: Niccolò McConnell, Pardeep Vasudev, Daisuke Yamada, Daryl Cheng, Mehran Azimbagirad, John McCabe, Shahab Aslani, Ahmed H. Shahin, Yukun Zhou, The SUMMIT Consortium, Andre Altmann, Yipeng Hu, Paul Taylor, Sam M. Janes, Daniel C. Alexander, Joseph Jacob
- Abstract summary: Low-dose computed tomography (LDCT) imaging employed in lung cancer screening programs is increasing in uptake worldwide. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for LDCT analysis.
- Score: 4.1891161098930105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
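The abstract says TANGERINE extends a masked autoencoder framework to 3D imaging but gives no implementation details here. As a rough illustration of the general idea only, the following sketch splits a volume into non-overlapping cubic patches and hides a random subset, so that an encoder would see only the visible patches. The volume shape, patch size, mask ratio, and the function names `patchify_3d` and `random_mask` are assumptions chosen for illustration, not values or code taken from the paper.

```python
import numpy as np


def patchify_3d(volume: np.ndarray, patch: int) -> np.ndarray:
    """Split a (D, H, W) volume into flattened, non-overlapping cubic patches."""
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # Expose the patch grid and the within-patch axes, then group them.
    x = volume.reshape(gd, patch, gh, patch, gw, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(gd * gh * gw, patch ** 3)


def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Return sorted (visible, masked) patch indices for one volume."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    return np.sort(perm[num_masked:]), np.sort(perm[:num_masked])


# Hypothetical sizes: a 32 x 64 x 64 volume, 16-voxel patches, 75% masked.
rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 64, 64)).astype(np.float32)
patches = patchify_3d(volume, patch=16)
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible patches would be passed to the encoder; a decoder would
# be trained to reconstruct the voxels of the masked patches.
encoder_input = patches[visible_idx]
reconstruction_target = patches[masked_idx]
```

Because the encoder processes only the visible patches, a high mask ratio reduces the per-volume compute during pretraining, which is one general reason masked autoencoders can be trained on modest hardware.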
Related papers
- LungCRCT: Causal Representation based Lung CT Processing for Lung Cancer Treatment [3.765413696274397]
LungCRCT is a latent causal representation learning based lung cancer analysis framework. It retrieves causal representations of factors within the physical causal mechanism of lung cancer progression.
arXiv Detail & Related papers (2026-01-26T04:03:50Z) - X-ray Insights Unleashed: Pioneering the Enhancement of Multi-Label Long-Tail Data [86.52299247918637]
Long-tailed pulmonary anomalies in chest radiography present formidable diagnostic challenges. Despite the recent strides in diffusion-based methods for enhancing the representation of tailed lesions, the paucity of rare lesion exemplars curtails the generative capabilities of these approaches. We propose a novel data synthesis pipeline designed to augment tail lesions utilizing a copious supply of conventional normal X-rays.
arXiv Detail & Related papers (2025-12-24T06:14:55Z) - LungEvaty: A Scalable, Open-Source Transformer-based Deep Learning Model for Lung Cancer Risk Prediction in LDCT Screening [37.29507297342265]
LungEvaty is a transformer-based framework for predicting 1-6 year lung cancer risk from a single LDCT scan. It learns directly from large-scale screening data to capture comprehensive anatomical and pathological cues relevant for malignancy risk. LungEvaty was trained on more than 90,000 CT scans, including over 28,000 for fine-tuning and 6,000 for evaluation.
arXiv Detail & Related papers (2025-11-25T09:38:10Z) - Distributed U-net model and Image Segmentation for Lung Cancer Detection [0.0]
This study explores the potential of computer-aided design (CAD) systems, especially utilizing advanced deep learning models such as U-Net. An extensive dataset consisting of lung CT images and corresponding segmentation masks serves as the basis for empirical validation. Empirical results clearly affirm the robust performance of the U-Net model.
arXiv Detail & Related papers (2025-02-20T03:29:23Z) - Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports [68.39938936308023]
We propose a novel text-guided learning method to achieve highly accurate cancer detection results.
Our approach can leverage clinical knowledge by large-scale pre-trained VLM to enhance generalization ability.
arXiv Detail & Related papers (2024-05-23T07:03:38Z) - Double Integral Enhanced Zeroing Neural Network Optimized with ALSOA fostered Lung Cancer Classification using CT Images [1.1510009152620668]
Lung cancer is one of the deadliest diseases and the leading cause of illness and death.
The proposed method attains 18.32%, 27.20%, and 34.32% higher accuracy than the existing methods it is compared against.
arXiv Detail & Related papers (2023-12-05T10:53:35Z) - Revisiting Computer-Aided Tuberculosis Diagnosis [56.80999479735375]
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually.
Computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data.
We establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas.
This dataset enables the training of sophisticated detectors for high-quality CTD.
arXiv Detail & Related papers (2023-07-06T08:27:48Z) - High-Fidelity Image Synthesis from Pulmonary Nodule Lesion Maps using Semantic Diffusion Model [10.412300404240751]
Lung cancer has been one of the leading causes of cancer-related deaths worldwide for years.
Computer-assisted diagnosis (CAD) models based on deep learning algorithms can accelerate the screening process.
However, developing robust and accurate models often requires large-scale and diverse medical datasets with high-quality annotations.
arXiv Detail & Related papers (2023-05-02T01:04:22Z) - Enhancing Cancer Prediction in Challenging Screen-Detected Incident Lung Nodules Using Time-Series Deep Learning [2.744770849264355]
Lung cancer screening (LCS) using annual low-dose computed tomography (CT) scanning has been proven to significantly reduce lung cancer mortality.
Risk stratification of malignancy in lung nodules can be enhanced using machine/deep learning algorithms.
Here we show the performance of our time-series deep learning model (DeepCAD-NLM-L) which integrates multi-model information across three longitudinal data domains.
arXiv Detail & Related papers (2022-03-30T18:40:36Z) - Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC).
arXiv Detail & Related papers (2021-10-27T19:28:36Z) - COVID-Net US: A Tailored, Highly Efficient, Self-Attention Deep Convolutional Neural Network Design for Detection of COVID-19 Patient Cases from Point-of-care Ultrasound Imaging [101.27276001592101]
We introduce COVID-Net US, a highly efficient, self-attention deep convolutional neural network design tailored for COVID-19 screening from lung POCUS images.
Experimental results show that the proposed COVID-Net US can achieve an AUC of over 0.98 while achieving 353X lower architectural complexity, 62X lower computational complexity, and 14.3X faster inference times on a Raspberry Pi.
To advocate affordable healthcare and artificial intelligence for resource-constrained environments, we have made COVID-Net US open source and publicly available as part of the COVID-Net open source initiative.
arXiv Detail & Related papers (2021-08-05T16:47:33Z) - In-Line Image Transformations for Imbalanced, Multiclass Computer Vision Classification of Lung Chest X-Rays [91.3755431537592]
This study aims to leverage a body of literature in order to apply image transformations that would serve to balance the lack of COVID-19 LCXR data.
Deep learning techniques such as convolutional neural networks (CNNs) are able to select features that distinguish between healthy and disease states.
This study utilizes a simple CNN architecture for high-performance multiclass LCXR classification at 94 percent accuracy.
arXiv Detail & Related papers (2021-04-06T02:01:43Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - 3D Neural Network for Lung Cancer Risk Prediction on CT Volumes [0.6810862244331126]
Lung cancer is the most common cause of cancer death in the United States.
Lung cancer CT screening has been shown to reduce mortality by up to 40% and is now included in US screening guidelines.
Despite the use of standards for radiological diagnosis, persistent inter-grader variability and incomplete characterization of comprehensive imaging findings remain as limitations of current methods.
In this report, we reproduce a state-of-the-art deep learning algorithm for lung cancer risk prediction.
arXiv Detail & Related papers (2020-07-25T10:01:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.