Enhancing Generalization in Medical Visual Question Answering Tasks via
Gradient-Guided Model Perturbation
- URL: http://arxiv.org/abs/2403.02707v1
- Date: Tue, 5 Mar 2024 06:57:37 GMT
- Title: Enhancing Generalization in Medical Visual Question Answering Tasks via
Gradient-Guided Model Perturbation
- Authors: Gang Liu, Hongyang Li, Zerui He, Shenjun Zhong
- Abstract summary: We introduce a method that incorporates gradient-guided perturbations to the visual encoder of the multimodality model during both pre-training and fine-tuning phases.
The results show that, even with a significantly smaller pre-training image caption dataset, our approach achieves competitive outcomes.
- Score: 16.22199565010318
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging pre-trained visual language models has become a widely adopted
approach for improving performance in downstream visual question answering
(VQA) applications. However, in the specialized field of medical VQA, the
scarcity of available data poses a significant barrier to achieving reliable
model generalization. Numerous methods have been proposed to enhance model
generalization, addressing the issue from data-centric and model-centric
perspectives. Data augmentation techniques are commonly employed to enrich the
dataset, while various regularization approaches aim to prevent model
overfitting, especially when training on limited data samples. In this paper,
we introduce a method that incorporates gradient-guided parameter perturbations
into the visual encoder of the multimodal model during both the pre-training and
fine-tuning phases, in order to improve model generalization for downstream medical VQA
tasks. The small perturbation is adaptively generated by aligning it with the
direction of the moving-average gradient in the optimization landscape, which
is opposite to the direction of the optimizer's historical updates; it is then
injected into the model's visual encoder. The results show that,
even with a significantly smaller pre-training image caption dataset, our
approach achieves competitive outcomes on both VQA-RAD and SLAKE datasets.
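A minimal PyTorch-style sketch of one training step in this spirit is given below; the `visual_encoder` attribute, the dict holding per-parameter gradient averages, and the values of `beta` and `epsilon` are illustrative assumptions rather than the authors' implementation.

```python
import torch

def gradient_guided_step(model, batch, optimizer, grad_ema, beta=0.9, epsilon=1e-3):
    """One illustrative training step: maintain a moving average of the visual
    encoder's gradients and inject a small perturbation along that direction
    (opposite to the optimizer's historical update direction) before stepping.
    `grad_ema` is a plain dict mapping parameter names to running averages."""
    optimizer.zero_grad()
    loss = model(**batch)  # assume the forward pass returns a scalar loss
    loss.backward()

    with torch.no_grad():
        for name, param in model.visual_encoder.named_parameters():
            if param.grad is None:
                continue
            # Update the moving average of this parameter's gradient.
            ema = grad_ema.setdefault(name, torch.zeros_like(param))
            ema.mul_(beta).add_(param.grad, alpha=1.0 - beta)
            # Inject a small perturbation aligned with the averaged gradient.
            param.add_(epsilon * ema / (ema.norm() + 1e-12))

    optimizer.step()
    return loss.item()
```

Under this reading, the same routine would run in both the image-caption pre-training loop and the downstream VQA fine-tuning loop.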
Related papers
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
"pre-train, prompt-tuning" has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs)
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- DepthART: Monocular Depth Estimation as Autoregressive Refinement Task [2.3884184860468136]
We introduce the first autoregressive depth estimation model based on the visual autoregressive transformer.
Our primary contribution is DepthART, a novel training method formulated as Depth Autoregressive Refinement Task.
Our experiments demonstrate that the proposed training approach significantly outperforms visual autoregressive modeling via next-scale prediction in the depth estimation task.
arXiv Detail & Related papers (2024-09-23T13:36:34Z)
- Calibrated Self-Rewarding Vision Language Models [27.686545023186852]
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning.
LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image.
We propose the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning.
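A rough sketch of that loop is shown below; `generate_candidates`, `reward_fn`, and `preference_finetune` are hypothetical stand-ins for CSR's actual generation, reward, and fine-tuning stages.

```python
def self_rewarding_round(model, samples, generate_candidates, reward_fn,
                         preference_finetune, k=4):
    """One illustrative self-improvement round: sample k candidate answers per
    (image, prompt) pair, score them, keep the best/worst as a preference pair,
    then fine-tune on the curated pairs. All helpers here are hypothetical."""
    preference_data = []
    for image, prompt in samples:
        candidates = generate_candidates(model, image, prompt, k=k)
        ranked = sorted(candidates, key=lambda c: reward_fn(model, image, prompt, c))
        preference_data.append(
            {"image": image, "prompt": prompt,
             "chosen": ranked[-1], "rejected": ranked[0]}
        )
    return preference_finetune(model, preference_data)
```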
arXiv Detail & Related papers (2024-05-23T14:30:33Z)
- TED: Accelerate Model Training by Internal Generalization [19.336762953352956]
Large language models have demonstrated strong performance in recent years, but the high cost of training drives the need for efficient methods to compress dataset sizes.
We propose TED pruning, a method that addresses the challenge of overfitting under high pruning ratios by quantifying the model's ability to improve performance on pruned data.
arXiv Detail & Related papers (2024-05-06T07:40:13Z)
- Gradient Guidance for Diffusion Models: An Optimization Perspective [45.6080199096424]
This paper studies a form of gradient guidance for adapting a pre-trained diffusion model towards optimizing user-specified objectives.
We establish a mathematical framework for guided diffusion to systematically study its optimization theory and algorithmic design.
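A generic sketch of gradient-guided sampling, not necessarily the exact scheme analyzed in the paper: each reverse-diffusion step is nudged by the gradient of a differentiable user objective. `denoise_step`, `objective`, and `guidance_scale` are placeholders.

```python
import torch

def guided_denoise_step(denoise_step, objective, x_t, t, guidance_scale=1.0):
    """One illustrative guided step: take the pre-trained model's reverse step,
    then add the gradient of the user objective evaluated at the current sample.
    All names and scales here are placeholders, not the paper's algorithm."""
    x = x_t.detach().requires_grad_(True)
    grad = torch.autograd.grad(objective(x).sum(), x)[0]
    with torch.no_grad():
        x_next = denoise_step(x_t, t)          # unguided reverse-diffusion update
        return x_next + guidance_scale * grad  # steer toward higher objective values
```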
arXiv Detail & Related papers (2024-04-23T04:51:02Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Robust Optimization as Data Augmentation for Large-scale Graphs [117.2376815614148]
We propose FLAG (Free Large-scale Adversarial Augmentation on Graphs), which iteratively augments node features with gradient-based adversarial perturbations during training.
FLAG is a general-purpose approach for graph data, which universally works in node classification, link prediction, and graph classification tasks.
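A condensed sketch of this kind of gradient-based feature augmentation follows; the number of ascent steps, the step size, and the `loss_fn` signature are illustrative rather than FLAG's exact settings.

```python
import torch

def flag_loss(model, x, y, loss_fn, ascent_steps=3, step_size=1e-3):
    """Compute a training loss on adversarially perturbed node features. The
    perturbation is refined with a few sign-gradient ascent steps while the
    model's own gradients accumulate across the inner iterations ("free"
    adversarial training). Hyperparameters are illustrative only."""
    perturb = torch.empty_like(x).uniform_(-step_size, step_size).requires_grad_(True)
    loss = loss_fn(model(x + perturb), y) / ascent_steps
    for _ in range(ascent_steps - 1):
        loss.backward()
        with torch.no_grad():
            perturb += step_size * perturb.grad.sign()  # ascend on the perturbation
            perturb.grad.zero_()
        loss = loss_fn(model(x + perturb), y) / ascent_steps
    loss.backward()  # final backward; the caller then calls optimizer.step()
    return loss
```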
arXiv Detail & Related papers (2020-10-19T21:51:47Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)