Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare
- URL: http://arxiv.org/abs/2505.08818v1
- Date: Mon, 12 May 2025 18:39:54 GMT
- Title: Position: Restructuring of Categories and Implementation of Guidelines Essential for VLM Adoption in Healthcare
- Authors: Amara Tariq, Rimita Lahiri, Charles Kahn, Imon Banerjee
- Abstract summary: Vision language model (VLM) development requires clear and standardized reporting protocols. Traditional machine learning reporting standards and evaluation guidelines must be restructured to accommodate multiphase VLM studies. We propose a categorization framework for VLM studies and outline corresponding reporting standards.
- Score: 5.9372801317341155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The intricate and multifaceted nature of vision language model (VLM) development, adaptation, and application necessitates the establishment of clear and standardized reporting protocols, particularly within the high-stakes context of healthcare. Defining these reporting standards is inherently challenging due to the diverse nature of studies involving VLMs, which vary significantly, from the development of entirely new VLMs or finetuning for domain alignment to off-the-shelf use of VLMs for targeted diagnosis and prediction tasks. In this position paper, we argue that traditional machine learning reporting standards and evaluation guidelines must be restructured to accommodate multiphase VLM studies; they must also be organized for intuitive understanding by developers while maintaining rigorous standards for reproducibility. To facilitate community adoption, we propose a categorization framework for VLM studies and outline corresponding reporting standards that comprehensively address performance evaluation, data reporting protocols, and recommendations for manuscript composition. These guidelines are organized according to the proposed categorization scheme. Lastly, we present a checklist that consolidates reporting standards, offering a standardized tool to ensure consistency and quality in the publication of VLM-related research.
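The abstract describes the category-aware checklist only in prose. As a rough illustration of how such a tool could be structured, here is a minimal Python sketch; the category names and checklist items below are assumptions for illustration, not the paper's actual taxonomy:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a category-aware reporting checklist for VLM studies.
# Category names and items are illustrative assumptions, not the paper's taxonomy.

CATEGORIES = {
    "new_vlm_development",
    "domain_finetuning",
    "off_the_shelf_application",
}

@dataclass
class ChecklistItem:
    description: str
    applies_to: set          # subset of CATEGORIES this item applies to
    satisfied: bool = False  # flipped to True once the manuscript reports it

@dataclass
class ReportingChecklist:
    category: str                          # which study type is being reported
    items: list = field(default_factory=list)

    def pending(self):
        """Items that apply to this study's category and are not yet satisfied."""
        return [i for i in self.items
                if self.category in i.applies_to and not i.satisfied]

items = [
    ChecklistItem("Report pretraining data sources and licenses",
                  {"new_vlm_development"}),
    ChecklistItem("Describe finetuning data and domain-alignment procedure",
                  {"domain_finetuning"}),
    ChecklistItem("Report evaluation metrics with confidence intervals",
                  CATEGORIES),  # applies to all study types
]

checklist = ReportingChecklist("domain_finetuning", items)
print([i.description for i in checklist.pending()])
```

The key design point the paper argues for is captured by `applies_to`: a single consolidated checklist can still be filtered to the requirements relevant to each study category.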
Related papers
- TaMPERing with Large Language Models: A Field Guide for using Generative AI in Public Administration Research [0.0]
The integration of Large Language Models (LLMs) into social science research presents transformative opportunities for advancing scientific inquiry. This manuscript introduces the TaMPER framework, a structured methodology organized around five critical decision points: Task, Model, Prompt, Evaluation, and Reporting.
arXiv Detail & Related papers (2025-03-30T21:38:11Z) - The ELEVATE-AI LLMs Framework: An Evaluation Framework for Use of Large Language Models in HEOR: an ISPOR Working Group Report [12.204470166456561]
This article introduces the ELEVATE AI LLMs framework and checklist. The framework comprises ten evaluation domains, including model characteristics, accuracy, comprehensiveness, and fairness. Validation of the framework and checklist on studies of systematic literature reviews and health economic modeling highlighted their ability to identify strengths and gaps in reporting.
arXiv Detail & Related papers (2024-12-23T14:09:10Z) - CATER: Leveraging LLM to Pioneer a Multidimensional, Reference-Independent Paradigm in Translation Quality Evaluation [0.0]
Comprehensive AI-assisted Translation Edit Ratio (CATER) is a novel framework for evaluating machine translation (MT) quality. It uses large language models (LLMs) via a carefully designed prompt-based protocol.
arXiv Detail & Related papers (2024-12-15T17:45:34Z) - NLP Cluster Analysis of Common Core State Standards and NAEP Item Specifications [0.0]
Camilli (2024) proposed a methodology using natural language processing (NLP) to map the relationship of a set of content standards to item specifications. This study provided evidence that NLP can be used to improve the mapping process.
arXiv Detail & Related papers (2024-11-20T15:44:58Z) - Investigating Privacy Bias in Training Data of Language Models [1.3167450470598043]
A privacy bias refers to the skew in the appropriateness of information flows within a given context. This skew may either align with existing expectations or signal a symptom of systemic issues. We present a novel approach to assess privacy biases using a contextual integrity-based methodology.
arXiv Detail & Related papers (2024-09-05T17:50:31Z) - TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE.
We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
arXiv Detail & Related papers (2024-07-06T07:19:30Z) - LLM4Rerank: LLM-based Auto-Reranking Framework for Recommendations [51.76373105981212]
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms. We introduce a comprehensive reranking framework, designed to seamlessly integrate various reranking criteria. A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs.
arXiv Detail & Related papers (2024-06-18T09:29:18Z) - IMDL-BenCo: A Comprehensive Benchmark and Codebase for Image Manipulation Detection & Localization [58.32394109377374]
IMDL-BenCo is the first comprehensive IMDL benchmark and modular framework.
It decomposes the IMDL framework into standardized, reusable components and revises the model construction pipeline.
It includes 8 state-of-the-art IMDL models (1 of which is reproduced from scratch), 2 sets of standard training and evaluation protocols, 15 GPU-accelerated evaluation metrics, and 3 kinds of robustness evaluation.
arXiv Detail & Related papers (2024-06-15T09:44:54Z) - OLMES: A Standard for Language Model Evaluations [64.85905119836818]
OLMES is a documented, practical, open standard for reproducible language model evaluations. It supports meaningful comparisons between smaller base models that require the unnatural "cloze" formulation of multiple-choice questions. OLMES includes well-considered, documented recommendations guided by results from existing literature as well as new experiments resolving open questions.
arXiv Detail & Related papers (2024-06-12T17:37:09Z) - Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity.
LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z) - A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products [8.986278918477595]
This paper investigates the complexities of integrating Large Language Models into software products, with a focus on the challenges encountered for determining their readiness for release.
Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations.
The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies.
arXiv Detail & Related papers (2024-03-27T19:02:56Z) - Vision-Language Instruction Tuning: A Review and Analysis [52.218690619616474]
Vision-Language Instruction Tuning (VLIT) presents more complex characteristics compared to pure text instruction tuning.
We offer a detailed categorization for existing VLIT datasets and identify the characteristics that high-quality VLIT data should possess.
By incorporating these characteristics as guiding principles into the existing VLIT data construction process, we conduct extensive experiments and verify their positive impact on the performance of tuned multi-modal LLMs.
arXiv Detail & Related papers (2023-11-14T14:02:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.