Application Of Vision-Language Models For Assessing Osteoarthritis
Disease Severity
- URL: http://arxiv.org/abs/2401.06331v1
- Date: Fri, 12 Jan 2024 02:43:58 GMT
- Title: Application Of Vision-Language Models For Assessing Osteoarthritis
Disease Severity
- Authors: Banafshe Felfeliyan and Yuyue Zhou and Shrimanti Ghosh and Jessica
Kupper and Shaobo Liu and Abhilash Hareendranathan and Jacob L. Jaremko
- Abstract summary: Osteoarthritis (OA) poses a global health challenge, demanding precise diagnostic methods.
Existing deep learning models for OA assessment are unimodal, single-task systems.
This study investigates employing Vision Language Processing models to predict OA severity using X-ray images and corresponding reports.
- Score: 0.43431539537721414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Osteoarthritis (OA) poses a global health challenge, demanding precise
diagnostic methods. Current radiographic assessments are time-consuming and
prone to variability, prompting the need for automated solutions. The existing
deep learning models for OA assessment are unimodal, single-task systems that
do not incorporate relevant text information such as patient demographics,
disease history, or physician reports. This study investigates employing Vision
Language Processing (VLP) models to predict OA severity using X-ray images and
corresponding reports. Our method leverages X-ray images of the knee and diverse
report templates generated from tabular OA scoring values to train a CLIP
(Contrastive Language Image PreTraining) style VLP model. Furthermore, we
incorporate additional contrasting captions to encourage the model to
discriminate between positive and negative reports. Results demonstrate the
efficacy of these models in learning text-image representations and their
contextual relationships, showcase their potential to advance OA assessment, and
establish a foundation for specialized vision language models in medical
contexts.
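
To make the training recipe above concrete, the following is a minimal, hypothetical PyTorch sketch of a CLIP-style model trained on knee X-rays paired with report templates derived from tabular OA scores (Kellgren-Lawrence grades), with an extra margin term over contrasting captions. It is not the authors' code; the encoder backbones, template wording, margin value, and loss weighting are assumptions.

```python
# Minimal sketch (not the authors' implementation) of CLIP-style training on
# knee X-rays paired with templated reports built from tabular OA scores.
# Encoders, template wording, margin, and loss weights are assumed values.
import torch
import torch.nn as nn
import torch.nn.functional as F


def template_from_kl_grade(kl_grade: int) -> str:
    """Turn a tabular Kellgren-Lawrence grade into a report-style caption."""
    descriptions = {
        0: "no radiographic features of osteoarthritis",
        1: "doubtful joint space narrowing and possible osteophytes",
        2: "definite osteophytes and possible joint space narrowing",
        3: "moderate osteophytes, definite narrowing, and some sclerosis",
        4: "large osteophytes, marked narrowing, and severe sclerosis",
    }
    return f"Knee X-ray showing {descriptions[kl_grade]} (KL grade {kl_grade})."


class ClipStyleOAModel(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder          # e.g. a CNN or ViT backbone
        self.text_encoder = text_encoder            # e.g. a BERT-style encoder
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~log(1/0.07)

    def forward(self, images, pos_tokens, neg_tokens):
        img = F.normalize(self.image_encoder(images), dim=-1)
        pos = F.normalize(self.text_encoder(pos_tokens), dim=-1)  # matching reports
        neg = F.normalize(self.text_encoder(neg_tokens), dim=-1)  # contrasting reports
        # Standard CLIP objective: matched image/report pairs lie on the diagonal.
        logits = self.logit_scale.exp() * img @ pos.t()
        targets = torch.arange(images.size(0), device=images.device)
        clip_loss = 0.5 * (F.cross_entropy(logits, targets)
                           + F.cross_entropy(logits.t(), targets))
        # Extra term: each image should score its own report above a contrasting
        # caption that describes a different (wrong) severity.
        margin = 0.2                                 # assumed hyperparameter
        pos_sim = (img * pos).sum(dim=-1)
        neg_sim = (img * neg).sum(dim=-1)
        contrast_loss = F.relu(margin + neg_sim - pos_sim).mean()
        return clip_loss + contrast_loss
```

Under these assumptions, `pos_tokens` would be tokenized templates matching each image's recorded score and `neg_tokens` templates for a deliberately different severity, so the model must separate correct from contradictory reports rather than merely matching images to any OA-related text.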
Related papers
- CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting [0.0]
We evaluate publicly available, state-of-the-art foundational vision-language models for chest X-ray interpretation.
We find that vision-language models often hallucinate with confident language, which slows down clinical interpretation.
We develop an agent-based vision-language approach for report generation using CheXagent's linear probes and BioViL-T's phrase grounding tools.
arXiv Detail & Related papers (2024-07-11T18:39:19Z)
- Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to use French reports to shape an embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology [31.045221580446963]
We present a model dubbed Medical Cross-attention Vision-Language model (Medical X-VL).
Our model enables various zero-shot tasks for oversight AI, ranging from zero-shot classification to zero-shot error correction.
Our method was especially successful in the data-limited setting, suggesting potential widespread applicability in the medical domain.
arXiv Detail & Related papers (2022-08-10T04:35:58Z)
- Variational Topic Inference for Chest X-Ray Report Generation [102.04931207504173]
Report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice.
Recent work has shown that deep learning models can successfully caption natural images.
We propose variational topic inference for automatic report generation.
arXiv Detail & Related papers (2021-07-15T13:34:38Z)
- Automated Knee X-ray Report Generation [12.732469371097347]
We propose to take advantage of past radiological exams and formulate a framework capable of learning the correspondence between the images and reports.
We demonstrate how aggregating the image features of individual exams and using them as conditional inputs when training a language generation model results in auto-generated exam reports.
arXiv Detail & Related papers (2021-05-22T11:59:42Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)
- A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports [5.074841553282345]
In this study, we adopt four pre-trained V+L models to learn multimodal representations from MIMIC-CXR radiographs and associated reports.
In comparison to the pioneering CNN-RNN model, the joint embeddings learned by pre-trained V+L models demonstrate performance improvements in the thoracic findings classification task.
arXiv Detail & Related papers (2020-09-03T09:00:47Z)
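
Several of the related papers above evaluate such models with text-to-image retrieval or zero-shot classification, and the same idea carries over to the OA setting: a trained CLIP-style model can grade an unseen knee X-ray by scoring it against one candidate caption per severity grade. The sketch below is a hypothetical illustration, not code from any of the papers; the caption wording and the `model`/`tokenize` interfaces are assumptions consistent with the training sketch above.

```python
# Hypothetical zero-shot severity grading with a trained CLIP-style model:
# the image is scored against one caption per Kellgren-Lawrence grade and the
# best-matching caption determines the predicted grade. Caption wording and
# the model/tokenizer interfaces are illustrative assumptions.
import torch
import torch.nn.functional as F

KL_CAPTIONS = [
    "Knee X-ray showing no radiographic features of osteoarthritis (KL grade 0).",
    "Knee X-ray showing doubtful narrowing and possible osteophytes (KL grade 1).",
    "Knee X-ray showing definite osteophytes and possible narrowing (KL grade 2).",
    "Knee X-ray showing moderate osteophytes and definite narrowing (KL grade 3).",
    "Knee X-ray showing large osteophytes and marked narrowing (KL grade 4).",
]


@torch.no_grad()
def predict_kl_grade(model, tokenize, image: torch.Tensor) -> int:
    """Return the KL grade whose caption embeds closest to the X-ray image."""
    text_emb = F.normalize(model.text_encoder(tokenize(KL_CAPTIONS)), dim=-1)
    img_emb = F.normalize(model.image_encoder(image.unsqueeze(0)), dim=-1)
    similarity = (img_emb @ text_emb.t()).squeeze(0)  # cosine similarity per grade
    return int(similarity.argmax().item())
```

Text-to-image retrieval benchmarks, as used in the low-data VLM study above, work the same way in reverse: reports are ranked against a pool of images by the same cosine similarity and recall@k is reported.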