Automatic Fine-grained Segmentation-assisted Report Generation
- URL: http://arxiv.org/abs/2507.16623v1
- Date: Tue, 22 Jul 2025 14:16:20 GMT
- Title: Automatic Fine-grained Segmentation-assisted Report Generation
- Authors: Frederic Jonske, Constantin Seibold, Osman Alperen Koras, Fin Bahnsen, Marie Bauer, Amin Dada, Hamza Kalisch, Anton Schily, Jens Kleesiek
- Abstract summary: We present ASaRG, an extension of the popular LLaVA architecture for report generation. Our approach achieves a +0.89% performance gain in CE F1 score compared to the LLaVA baseline. Our code will be made publicly available at a later date.
- Score: 3.6341072547314037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable end-to-end clinical report generation has been a longstanding goal of medical ML research. The end goal for this process is to alleviate radiologists' workloads and provide second opinions to clinicians or patients. Thus, a necessary prerequisite for report generation models is a strong general performance and some type of innate grounding capability, to convince clinicians or patients of the veracity of the generated reports. In this paper, we present ASaRG (Automatic Segmentation-assisted Report Generation), an extension of the popular LLaVA architecture that aims to tackle both of these problems. ASaRG proposes to fuse intermediate features and fine-grained segmentation maps created by specialist radiological models into LLaVA's multi-modal projection layer via simple concatenation. With a small number of added parameters, our approach achieves a +0.89% performance gain ($p=0.012$) in CE F1 score compared to the LLaVA baseline when using only intermediate features, and a +2.77% performance gain ($p<0.001$) when adding a combination of intermediate features and fine-grained segmentation maps. Compared with COMG and ORID, two other report generation methods that utilize segmentations, the performance gain amounts to 6.98% and 6.28% in F1 score, respectively. ASaRG is not mutually exclusive with other changes made to the LLaVA architecture, potentially allowing our method to be combined with other advances in the field. Finally, the use of an arbitrary number of segmentations as part of the input demonstrably allows tracing elements of the report to the corresponding segmentation maps and verifying the groundedness of assessments. Our code will be made publicly available at a later date.
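To make the described fusion concrete, here is a minimal PyTorch sketch of concatenation-based fusion into a LLaVA-style multi-modal projection layer. All module names, dimensions, and the two-layer MLP projector are illustrative assumptions, not the authors' released code.

```python
# Sketch of concatenation-based fusion into a LLaVA-style multi-modal
# projector. Dimensions and module structure are assumptions; the
# paper's actual implementation may differ.
import torch
import torch.nn as nn

class ConcatFusionProjector(nn.Module):
    def __init__(self, vis_dim=1024, seg_dim=256, llm_dim=4096):
        super().__init__()
        # Project the concatenated (vision + segmentation) features into
        # the LLM embedding space, as LLaVA's projector does for vision
        # features alone.
        self.proj = nn.Sequential(
            nn.Linear(vis_dim + seg_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vis_tokens, seg_tokens):
        # vis_tokens: (B, N, vis_dim) from the frozen vision encoder
        # seg_tokens: (B, N, seg_dim) derived from specialist segmentation
        #             models (intermediate features and/or seg maps)
        fused = torch.cat([vis_tokens, seg_tokens], dim=-1)
        return self.proj(fused)  # (B, N, llm_dim) tokens for the LLM

# Toy usage
projector = ConcatFusionProjector()
vis = torch.randn(2, 576, 1024)  # e.g. ViT patch tokens
seg = torch.randn(2, 576, 256)   # segmentation-derived features
print(projector(vis, seg).shape)  # torch.Size([2, 576, 4096])
```

Because the fusion is plain concatenation ahead of the existing projector, it adds only the parameters of the widened first linear layer, consistent with the abstract's "small number of added parameters" claim.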
Related papers
- A Benchmark for End-to-End Zero-Shot Biomedical Relation Extraction with LLMs: Experiments with OpenAI Models [7.923208324118286]
We study patterns in the performance of OpenAI LLMs across a diverse sampling of biomedical relation extraction tasks. We found zero-shot performance to be close to that of fine-tuned methods.
arXiv Detail & Related papers (2025-04-05T07:08:54Z) - GAUDA: Generative Adaptive Uncertainty-guided Diffusion-based Augmentation for Surgical Segmentation [1.0808810256442274]
We learn semantically comprehensive yet compact latent representations of the (image, mask) space. We show that our approach can effectively synthesise unseen high-quality paired segmentation data of remarkable semantic coherence.
arXiv Detail & Related papers (2025-01-18T16:40:53Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
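As a rough illustration of the mechanism described above, the hedged PyTorch snippet below aggregates several low-rank "experts" via Kronecker products and merges the resulting weight delta into a frozen linear layer. The expert count, rank, and initialization are assumptions, not ALoRE's exact parameterization.

```python
# Sketch of Kronecker-product low-rank experts whose aggregated weight
# delta can be folded into a frozen layer (zero extra inference cost).
import torch
import torch.nn as nn

class KronLowRankAdapter(nn.Module):
    def __init__(self, d_out, d_in, n_experts=4, rank=4):
        super().__init__()
        assert d_out % n_experts == 0 and d_in % n_experts == 0
        # Per-expert "slot" matrices and low-rank factors.
        self.s = nn.Parameter(torch.randn(n_experts, n_experts, n_experts) * 0.02)
        self.a = nn.Parameter(torch.randn(n_experts, d_out // n_experts, rank) * 0.02)
        self.b = nn.Parameter(torch.zeros(n_experts, rank, d_in // n_experts))

    def delta_w(self):
        # Aggregate experts: sum_i kron(S_i, A_i @ B_i) -> (d_out, d_in)
        return sum(torch.kron(self.s[i], self.a[i] @ self.b[i])
                   for i in range(self.s.shape[0]))

    def merge_into(self, linear: nn.Linear):
        # Fold the adapter into the frozen weight after training.
        with torch.no_grad():
            linear.weight.add_(self.delta_w())
```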
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation [49.5901368256326]
We propose a novel Domain-Adaptive Prompt framework (termed DAPSAM) for fine-tuning the Segment Anything Model to segment medical images.
Our DAPSAM achieves state-of-the-art performance on two medical image segmentation tasks with different modalities.
arXiv Detail & Related papers (2024-09-19T07:28:33Z) - Interactive 3D Segmentation for Primary Gross Tumor Volume in Oropharyngeal Cancer [1.9997842016096374]
We implement state-of-the-art algorithms and propose a novel two-stage Interactive Click Refinement (2S-ICR) framework.
The 2S-ICR framework achieves a Dice similarity coefficient of 0.713 $\pm$ 0.152 without user interaction and 0.824 $\pm$ 0.099 after five interactions, outperforming existing methods in both cases.
arXiv Detail & Related papers (2024-09-10T15:58:21Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
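As an illustration of the CFA idea described above, the following sketch adds local, high-frequency detail from a small trainable CNN branch onto patch tokens from a frozen ViT encoder. The CNN depth, the 1x1 alignment convolution, and the additive fusion are simplifying assumptions, not ASPS itself.

```python
# Sketch: trainable CNN branch fused with frozen ViT patch tokens.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNViTFusion(nn.Module):
    def __init__(self, vit_dim=768):
        super().__init__()
        self.cnn = nn.Sequential(  # trainable local/high-frequency branch
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.align = nn.Conv2d(128, vit_dim, kernel_size=1)

    def forward(self, image, vit_tokens, grid_hw):
        # image: (B, 3, H, W); vit_tokens: (B, h*w, vit_dim) from frozen ViT
        h, w = grid_hw
        local = self.align(self.cnn(image))                  # (B, D, H/4, W/4)
        local = F.interpolate(local, size=(h, w),
                              mode="bilinear", align_corners=False)
        local = local.flatten(2).transpose(1, 2)             # (B, h*w, D)
        return vit_tokens + local  # global (frozen) + local (trainable)
```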
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - PAM-UNet: Shifting Attention on Region of Interest in Medical Images [5.730272874074418]
UNet and its variants face a critical challenge: balancing accuracy with computational efficiency.
We propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture.
Our approach prioritizes both accuracy and speed, achieving a commendable balance with a mean IoU of 74.65 and a Dice score of 82.87.
arXiv Detail & Related papers (2024-05-02T17:33:26Z) - HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction [16.060286162384536]
HistGen is a learning-empowered framework for histopathology report generation.
It aims to boost report generation by aligning whole slide images (WSIs) and diagnostic reports at local and global granularity.
Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin.
arXiv Detail & Related papers (2024-03-08T15:51:43Z) - Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation [50.407071700154674]
We propose to formulate annotation-efficient nucleus instance segmentation from the perspective of few-shot learning (FSL).
Our work is motivated by the observation that, with the growth of computational pathology, an increasing number of fully annotated datasets are becoming publicly accessible.
Extensive experiments on a couple of publicly accessible datasets demonstrate that SGFSIS can outperform other annotation-efficient learning baselines.
arXiv Detail & Related papers (2024-02-26T03:49:18Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
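As a loose sketch of the dual-scale consistency idea named above: the same unlabeled image is segmented at two scales and the predictions are encouraged to agree. The loss choice (MSE on softmax maps) and the scale factor are assumptions, not DEC-Seg's exact objective.

```python
# Sketch of a dual-scale consistency term for unlabeled images.
import torch
import torch.nn.functional as F

def dual_scale_consistency(model, image, scale=0.5):
    p_full = model(image).softmax(dim=1)                 # (B, K, H, W) logits -> probs
    small = F.interpolate(image, scale_factor=scale,
                          mode="bilinear", align_corners=False)
    p_small = model(small).softmax(dim=1)
    p_small = F.interpolate(p_small, size=p_full.shape[-2:],
                            mode="bilinear", align_corners=False)
    return F.mse_loss(p_full, p_small)  # encourage cross-scale agreement
```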
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling [5.601559340796398]
We propose a framework that integrates LoRA and structured layer pruning.
Our framework reduces GPU memory usage by 50% and speeds up training by 100%.
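A hedged sketch of the combination named above: drop a subset of transformer blocks (structured pruning), then train only low-rank adapters on the remaining blocks. Which layers to keep and where LoRA is placed are assumptions, not the paper's exact recipe.

```python
# Sketch: structured layer pruning followed by LoRA adaptation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weight
        self.a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank update.
        return self.base(x) + (x @ self.a.T @ self.b.T) * self.scale

def prune_and_adapt(blocks: nn.ModuleList, keep_every=2):
    # Structured pruning: keep every k-th block, then wrap its linears in LoRA.
    kept = nn.ModuleList(blocks[i] for i in range(0, len(blocks), keep_every))
    for blk in kept:
        linears = [(n, m) for n, m in blk.named_children()
                   if isinstance(m, nn.Linear)]
        for name, mod in linears:
            setattr(blk, name, LoRALinear(mod))
    return kept
```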
arXiv Detail & Related papers (2023-05-15T00:21:08Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
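A rough sketch of the paired-attention idea behind the EPA block named above: one branch attends over spatial positions, a parallel branch over channels, and the two outputs are fused. The shared QKV projection and concat-plus-linear fusion are simplifying assumptions, not UNETR++'s exact block.

```python
# Sketch: paired spatial and channel attention with a shared projection.
import torch
import torch.nn as nn

class PairedAttention(nn.Module):
    def __init__(self, dim=96):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)  # shared projection
        self.fuse = nn.Linear(dim * 2, dim)

    def forward(self, x):  # x: (B, N, C) flattened voxel tokens
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Spatial branch: N x N attention over positions.
        sp = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, -1) @ v
        # Channel branch: C x C attention over feature channels.
        ch = (torch.softmax(q.transpose(-2, -1) @ k / q.shape[-2] ** 0.5, -1)
              @ v.transpose(-2, -1)).transpose(-2, -1)
        return self.fuse(torch.cat([sp, ch], dim=-1))
```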
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.