Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations
- URL: http://arxiv.org/abs/2601.11075v1
- Date: Fri, 16 Jan 2026 08:21:26 GMT
- Title: Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations
- Authors: Maiko Nagao, Kaito Urata, Atsushi Teramoto, Kazuyoshi Imaizumi, Masashi Kondo, Hiroshi Fujita
- Abstract summary: Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we investigated an image-finding generation method for chest CT images based on visual question answering (VQA). The proposed method was effective as an interactive diagnostic support system that can present image findings according to physicians' interests.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructed a visual question answering (VQA) dataset from structured data in an open dataset and investigated an image-finding generation method for chest CT images, with the aim of enabling interactive diagnostic support that presents findings based on questions that reflect physicians' interests rather than fixed descriptions. We used chest CT images included in the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset. Regions of interest surrounding the pulmonary nodules were extracted from these images, and image findings and questions were defined based on the morphological characteristics recorded in the database. A dataset comprising pairs of cropped images, corresponding questions, and image findings was constructed, and the VQA model was fine-tuned on it. Language evaluation metrics such as BLEU were used to evaluate the generated image findings. The VQA dataset constructed using the proposed method contained image findings expressed naturally as radiological descriptions. In addition, the generated image findings achieved a high CIDEr score of 3.896, and evaluation based on morphological characteristics showed high agreement with the reference findings. In summary, we constructed a VQA dataset for chest CT images using structured information on morphological characteristics from the LIDC-IDRI dataset and investigated methods for generating image findings in response to questions. Based on the generated results and evaluation metric scores, the proposed method was effective as an interactive diagnostic support system that can present image findings according to physicians' interests.
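The dataset-construction and evaluation pipeline described in the abstract (convert structured morphological annotations into question/finding pairs tied to a cropped nodule image, then score generated findings against references) can be sketched in miniature. The attribute mappings, question templates, file name, and toy unigram-precision metric below are illustrative assumptions, not the paper's actual templates or its BLEU/CIDEr implementation.

```python
from dataclasses import dataclass
from collections import Counter

# Hypothetical mappings from ordinal LIDC-IDRI-style annotation scores
# to natural-language phrases; the paper's real attribute set and
# phrasing are assumptions here.
MARGIN = {1: "a poorly defined margin", 5: "a sharply defined margin"}
SPICULATION = {1: "no spiculation", 5: "marked spiculation"}

@dataclass
class VQASample:
    image_path: str  # cropped region of interest around the nodule
    question: str
    finding: str     # reference image finding in natural language

def build_samples(image_path: str, margin: int, spiculation: int) -> list:
    """Turn one nodule's structured annotation into question/finding pairs."""
    return [
        VQASample(image_path, "What is the margin of the nodule?",
                  f"The nodule has {MARGIN[margin]}."),
        VQASample(image_path, "Is spiculation present?",
                  f"The nodule shows {SPICULATION[spiculation]}."),
    ]

def unigram_precision(generated: str, reference: str) -> float:
    """Toy BLEU-1-style precision of a generated finding vs. the reference."""
    gen = generated.lower().split()
    ref = Counter(reference.lower().split())
    hits = sum(min(count, ref[tok]) for tok, count in Counter(gen).items())
    return hits / max(len(gen), 1)
```

In the paper itself, such triples would be used to fine-tune a VQA model, and the generated findings would be scored with standard metrics (BLEU, CIDEr) rather than this toy overlap measure.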
Related papers
- Generation of Chest CT pulmonary Nodule Images by Latent Diffusion Models using the LIDC-IDRI Dataset [0.0]
In clinical practice, it is difficult to collect large numbers of CT images for specific cases. We proposed a method to automatically generate chest CT nodule images from input text using latent diffusion models (LDM). Evaluation results demonstrated that the proposed method could generate high-quality images that successfully capture specific medical features.
arXiv Detail & Related papers (2026-01-16T08:36:12Z) - RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and model-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z) - MedIAnomaly: A comparative study of anomaly detection in medical images [26.319602363581442]
Anomaly detection (AD) aims to detect abnormal samples that deviate from the expected normal patterns. Despite the emergence of numerous methods for medical AD, the lack of a fair and comprehensive evaluation leads to ambiguous conclusions. This paper builds a benchmark with unified comparison to address this problem.
arXiv Detail & Related papers (2024-04-06T06:18:11Z) - VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z) - Visual Grounding of Whole Radiology Reports for 3D CT Images [12.071135670684013]
We present the first visual grounding framework designed for CT image and report pairs covering various body parts and diverse anomaly types.
Our framework combines two components of 1) anatomical segmentation of images, and 2) report structuring.
We constructed a large-scale dataset with region-description correspondence annotations for 10,410 studies of 7,321 unique patients.
arXiv Detail & Related papers (2023-12-08T02:09:17Z) - Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z) - Region-based Contrastive Pretraining for Medical Image Retrieval with Anatomic Query [56.54255735943497]
We introduce a novel Region-based contrastive pretraining framework for Medical Image Retrieval (RegionMIR).
arXiv Detail & Related papers (2023-05-09T16:46:33Z) - Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine the Show-Attend-Tell and GPT-3 models to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO dataset.
arXiv Detail & Related papers (2022-09-28T10:27:10Z) - Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z) - SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection [76.01333073259677]
We propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID).
We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns; and in the inference, it can identify anomalies (unseen/modified patterns) in the image.
arXiv Detail & Related papers (2021-11-26T13:47:34Z) - RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR [14.586822005217485]
We present RadFusion, a benchmark dataset of 1794 patients with corresponding EHR data and CT scans labeled for pulmonary embolism.
Our results suggest that integrating imaging and EHR data can improve classification performance without introducing large disparities in the true positive rate between population groups.
arXiv Detail & Related papers (2021-11-23T06:10:07Z) - Fine-tuning ERNIE for chest abnormal imaging signs extraction [0.6091702876917281]
We formulate chest abnormal imaging sign extraction as a sequence tagging and matching problem.
We propose a transferred abnormal imaging signs extractor with pretrained ERNIE as the backbone.
We design a simple but effective tag2relation algorithm based on the nature of chest imaging report text.
arXiv Detail & Related papers (2020-10-25T05:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.