Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
- URL: http://arxiv.org/abs/2405.01469v1
- Date: Thu, 2 May 2024 16:59:10 GMT
- Title: Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning
- Authors: Théo Moutakanni, Piotr Bojanowski, Guillaume Chassagnon, Céline Hudelot, Armand Joulin, Yann LeCun, Matthew Muckley, Maxime Oquab, Marie-Pierre Revel, Maria Vakalopoulou,
- Abstract summary: We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays.
We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation.
Our findings suggest that self-supervision allows patient-centric AI proving useful in clinical and interpreting X-rays holistically.
- Score: 33.9544297423474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI Foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalisability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation, and provide an in depth analysis of population, age and sex biases of our model. Our findings suggest that self-supervision allows patient-centric AI proving useful in clinical workflows and interpreting X-rays holistically. With RayDINO and small task-specific adapters, we reach state-of-the-art results and improve generalization to unseen populations while mitigating bias, illustrating the true promise of foundation models: versatility and robustness.
Related papers
- CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate the publicly available, state of the art, foundational vision-language models for chest X-ray interpretation.
We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation.
We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - The Limits of Fair Medical Imaging AI In The Wild [43.97266228706059]
We investigate the extent to which medical AI utilizes demographic encodings.
We confirm that medical imaging AI leverages demographic shortcuts in disease classification.
We find that models with less encoding of demographic attributes are often most "globally optimal"
arXiv Detail & Related papers (2023-12-11T18:59:50Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Vision-Language Generative Model for View-Specific Chest X-ray Generation [18.347723213970696]
ViewXGen is designed to overcome the limitations of existing methods to generate frontal-view chest X-rays.
Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views.
arXiv Detail & Related papers (2023-02-23T17:13:25Z) - Self-supervised Multi-modal Training from Uncurated Image and Reports
Enables Zero-shot Oversight Artificial Intelligence in Radiology [31.045221580446963]
We present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL)
Our model enables various zero-shot tasks for oversight AI, ranging from the zero-shot classification to zero-shot error correction.
Our method was especially successful in the data-limited setting, suggesting the potential widespread applicability in medical domain.
arXiv Detail & Related papers (2022-08-10T04:35:58Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using a conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Pristine annotations-based multi-modal trained artificial intelligence
solution to triage chest X-ray for COVID-19 [1.1764495014312295]
COVID-19 pandemic continues to spread and impact the well-being of the global population.
Front-line modalities including computed tomography (CT) and X-ray play an important role for triaging COVID patients.
Considering the limited access of resources (both hardware and trained personnel) and decontamination considerations, CT may not be ideal for triaging suspected subjects.
arXiv Detail & Related papers (2020-11-10T15:36:08Z) - Learning Invariant Feature Representation to Improve Generalization
across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model performing well when tested on the same dataset as training data starts to perform poorly when it is tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.