From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
- URL: http://arxiv.org/abs/2602.22859v1
- Date: Thu, 26 Feb 2026 10:53:57 GMT
- Title: From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
- Authors: Hongrui Jia, Chaoya Jiang, Shikun Zhang, Wei Ye
- Abstract summary: Diagnostic-driven Progressive Evolution (DPE) is a spiral loop where diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model. DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks.
- Score: 34.940968264459805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop in which diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, indicating that DPE is a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.
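To make the loop concrete, here is a toy sketch of one DPE round (diagnose, remix the data mixture, generate and reinforce, then re-diagnose). All names below (`diagnose`, `remix`, the toy `model` and `probes`) are illustrative placeholders, not the authors' released code; the agentic generation and RL steps are reduced to a comment.

```python
# Toy sketch of one DPE round: diagnose -> remix the data mixture ->
# generate & reinforce -> re-diagnose. All names are illustrative
# placeholders, not the authors' released code.

def diagnose(model, probes):
    """Attribute failures to named capabilities; return per-capability error rates."""
    return {cap: sum(model(q) != a for q, a in qa) / len(qa)
            for cap, qa in probes.items()}

def remix(error_rates, budget):
    """Allocate the data-generation budget in proportion to each weakness."""
    total = sum(error_rates.values()) or 1.0
    return {cap: round(budget * e / total) for cap, e in error_rates.items()}

# Dummy stand-ins so the sketch runs end to end.
model = lambda q: q.upper()                      # a "model" answering probe queries
probes = {"ocr": [("a", "A"), ("b", "b")],       # one failure out of two probes
          "counting": [("c", "C"), ("d", "D")]}  # no failures

for round_id in range(3):
    errs = diagnose(model, probes)     # step 1: find capability blind spots
    quotas = remix(errs, budget=1000)  # step 2: weakness-weighted data mixture
    # step 3 (omitted): agents generate quotas[cap] samples per capability
    # using tools such as web search and image editing, then RL fine-tunes
    # the model; the next loop iteration re-diagnoses the updated model.
    print(round_id, errs, quotas)
```

The point of the sketch is the control flow: the error profile measured in one round decides the data budget of the next.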
Related papers
- Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition [12.731093427395985]
ADAMAB is an efficient embedding calibration framework for few-shot pattern recognition. Experiments demonstrate the superior performance of ADAMAB, with up to 40% accuracy improvement.
arXiv Detail & Related papers (2026-02-22T23:39:21Z)
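The ADAMAB entry above casts augmentation choice as a multi-armed bandit. As a generic illustration of that idea (not ADAMAB's actual algorithm), the sketch below runs a plain UCB1 bandit over hypothetical augmentation arms, with a made-up reward model standing in for downstream validation gain.

```python
import math
import random

random.seed(0)

# Plain UCB1 bandit over hypothetical augmentation "arms". The reward is a
# made-up stand-in for downstream validation gain; this illustrates the
# multi-armed-bandit idea only, not ADAMAB's actual algorithm.
ARMS = ["flip", "crop", "jitter", "noise"]
MEAN_GAIN = {"flip": 0.2, "crop": 0.5, "jitter": 0.3, "noise": 0.1}
counts = {a: 0 for a in ARMS}
totals = {a: 0.0 for a in ARMS}

def ucb(arm, t):
    """Mean reward plus an exploration bonus (infinite for unplayed arms)."""
    if counts[arm] == 0:
        return float("inf")
    return totals[arm] / counts[arm] + math.sqrt(2 * math.log(t) / counts[arm])

for t in range(1, 501):
    arm = max(ARMS, key=lambda a: ucb(a, t))      # optimistic arm selection
    counts[arm] += 1
    totals[arm] += random.gauss(MEAN_GAIN[arm], 0.05)

print(counts)  # "crop" should receive most of the pulls
```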
- LSM-2: Learning from Incomplete Wearable Sensor Data [65.58595667477505]
This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM). AIM learns robust representations directly from incomplete data without requiring explicit imputation. LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression, and generative modeling.
arXiv Detail & Related papers (2025-06-05T17:57:11Z)
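One plausible reading of learning "directly from incomplete data without explicit imputation" is masked training in which the stream's inherent missingness mask is inherited and unioned with an artificial mask, and the loss is taken only on values that were observed but deliberately hidden. The sketch below implements that reading with illustrative shapes and a stand-in encoder; it is not the LSM-2 architecture.

```python
import torch

torch.manual_seed(0)

# The inherent missingness mask is "inherited" and unioned with an
# artificial mask; reconstruction loss is taken only where a value was
# observed but deliberately hidden, so truly missing values never become
# (meaningless) targets. Shapes and the tiny encoder are illustrative.
x = torch.randn(8, 128)                # batch of sensor windows
observed = torch.rand(8, 128) > 0.3    # True where the sensor actually recorded
artificial = torch.rand(8, 128) > 0.5  # True where we hide values on purpose
visible = observed & ~artificial       # what the encoder is allowed to see

encoder = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.GELU(),
                              torch.nn.Linear(128, 128))

x_in = torch.where(visible, x, torch.zeros_like(x))  # no imputation, just zero-fill
x_hat = encoder(x_in)

target = observed & artificial         # observed, but hidden from the encoder
loss = ((x_hat - x)[target] ** 2).mean()
loss.backward()
```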
- DMRL: Data- and Model-aware Reward Learning for Data Extraction [3.511535517476954]
Large language models (LLMs) are inherently vulnerable to unintended privacy breaches. We propose a Data- and Model-aware Reward Learning approach for data extraction.
arXiv Detail & Related papers (2025-05-07T07:21:37Z)
- PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis [7.260212065205214]
Missing modalities pose a major issue in Alzheimer's Disease (AD) diagnosis. Most existing methods train only on complete data, ignoring the large proportion of incomplete samples in real-world datasets like ADNI. We propose a Prototype-Guided Adaptive Distillation (PGAD) framework that directly incorporates incomplete multi-modal data into training.
arXiv Detail & Related papers (2025-03-05T14:39:31Z)
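A hedged sketch of what prototype-guided distillation could look like for the entry above: class prototypes computed from complete-modality (teacher) features supervise the features of incomplete-modality (student) samples. The random features, shapes, and the plain MSE pull toward prototypes are illustrative assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Class prototypes from complete-modality (teacher) features supervise the
# features of incomplete-modality (student) samples. Everything here is a
# simplified stand-in for the paper's actual pipeline.
num_classes, dim = 3, 64
teacher_feats = torch.randn(100, dim)            # from complete samples
teacher_labels = torch.randint(0, num_classes, (100,))

# Prototype = mean teacher feature of each class.
prototypes = torch.stack([teacher_feats[teacher_labels == c].mean(0)
                          for c in range(num_classes)])

student_feats = torch.randn(32, dim, requires_grad=True)  # incomplete-modality
student_labels = torch.randint(0, num_classes, (32,))

# Pull each incomplete-modality feature toward its class prototype.
proto_loss = F.mse_loss(student_feats, prototypes[student_labels])
proto_loss.backward()
```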
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
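As a rough illustration of how such a metric could be computed (not the paper's exact definition): measure sampled accuracy on a training example at checkpoints taken before the model starts reproducing the target verbatim, then average. The "checkpoints" below are stand-in closures; the real metric is computed from model checkpoints.

```python
import random

random.seed(0)

# Sampled accuracy on one training example, measured only at checkpoints
# taken before the model reproduces the target verbatim. Stand-in closures
# play the role of model checkpoints.
TARGET = "42"

def make_checkpoint(p_correct, memorized):
    sample = lambda: TARGET if random.random() < p_correct else "wrong"
    greedy = lambda: TARGET if memorized else "wrong"
    return sample, greedy

# Sampling accuracy rises over training; the last checkpoint has memorized.
checkpoints = [make_checkpoint(0.1, False),
               make_checkpoint(0.6, False),
               make_checkpoint(0.9, True)]

accs = []
for sample, greedy in checkpoints:
    if greedy() == TARGET:            # memorization reached: stop measuring
        break
    accs.append(sum(sample() == TARGET for _ in range(100)) / 100)

pre_mem_acc = sum(accs) / len(accs) if accs else 0.0
print(pre_mem_acc)                    # accuracy attained before memorization
```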
- Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification [2.5091334993691206]
Development of a robust deep-learning model for retinal disease diagnosis requires a substantial dataset for training.
The capacity to generalize effectively on smaller datasets remains a persistent challenge.
We combine a wide range of data sources to improve performance and generalization to new data.
arXiv Detail & Related papers (2024-09-17T17:22:35Z)
- HyperMM: Robust Multimodal Learning with Varying-sized Inputs [4.377889826841039]
HyperMM is an end-to-end framework designed for learning with varying-sized inputs.
We introduce a novel strategy for training a universal feature extractor using a conditional hypernetwork.
We experimentally demonstrate the advantages of our method in two tasks: Alzheimer's disease detection and breast cancer classification.
arXiv Detail & Related papers (2024-07-30T12:13:18Z)
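A minimal sketch of the conditional-hypernetwork idea in the HyperMM entry above: a small network maps a modality embedding to the weights of a per-modality feature extractor, so one universal extractor serves several modalities. The single linear layer, dimensions, and fixed input size are simplifying assumptions (HyperMM itself targets varying-sized inputs).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# A modality embedding is mapped to the weights of a small linear feature
# extractor, so one universal extractor serves several modalities.
num_modalities, in_dim, out_dim = 3, 32, 16

modality_emb = torch.nn.Embedding(num_modalities, 8)
hyper = torch.nn.Linear(8, in_dim * out_dim + out_dim)   # emits W and b

def extract(x, modality_id):
    params = hyper(modality_emb(torch.tensor(modality_id)))
    W = params[:in_dim * out_dim].view(out_dim, in_dim)
    b = params[in_dim * out_dim:]
    return F.linear(x, W, b)

feat_mri = extract(torch.randn(4, in_dim), modality_id=0)  # 4 "MRI" samples
feat_pet = extract(torch.randn(4, in_dim), modality_id=1)  # 4 "PET" samples
print(feat_mri.shape, feat_pet.shape)                      # both (4, 16)
```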
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for light continual training of LLMs on their original pre-training datasets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
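KL-regularized distributionally robust optimization, as named in the entry above, admits a closed-form instance reweighting: weights proportional to exp(loss / tau), i.e. a softmax over per-sample losses that upweights hard samples. The sketch below shows that reweighting on made-up losses; tau and the loss values are illustrative.

```python
import torch

# KL-regularized instance-reweighted DRO has a closed-form reweighting:
# weights proportional to exp(loss / tau) -- a softmax over per-sample
# losses that upweights hard samples. Losses and tau are made up here.
per_sample_loss = torch.tensor([0.2, 0.3, 2.5, 0.1, 1.8])
tau = 1.0

weights = torch.softmax(per_sample_loss / tau, dim=0)  # hard samples gain mass
reweighted_loss = (weights * per_sample_loss).sum()    # feed this to backward()
print(weights)  # the two high-loss samples dominate the objective
```

In a real training loop the weights would typically be computed from detached losses, so gradients flow only through the weighted sum.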
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
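The sketch below illustrates the progressive-expansion idea as read from the summary above: warm up on a small group-balanced subset so the core feature is learned before spurious features can dominate, then grow the training set in fixed-size stages. The group structure, stage sizes, and `train_one_epoch` are hypothetical.

```python
import random

random.seed(0)

# Warm up on a small group-balanced subset (so the core feature, not the
# spurious one, is learned first), then expand the training set in fixed
# stages. Group structure, sizes, and train_one_epoch are hypothetical.
data = [("img%d" % i, i % 4) for i in range(1000)]   # (sample, group id)
groups = {g: [d for d in data if d[1] == g] for g in range(4)}

per_group = min(len(v) for v in groups.values()) // 4
warmup = [d for g in groups.values() for d in random.sample(g, per_group)]

rest = [d for d in data if d not in warmup]
random.shuffle(rest)

train_set = list(warmup)
for stage in range(5):
    train_set += rest[stage * 100:(stage + 1) * 100]  # progressive expansion
    # train_one_epoch(model, train_set)               # hypothetical placeholder
    print("stage", stage, "train size", len(train_set))
```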
- Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN filters the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
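The mapping in the CMMN summary can be sketched concretely: estimate per-subject PSDs, form the barycenter PSD (the squared mean of the square-root PSDs, which is the Wasserstein-2 barycenter for centered stationary Gaussian processes), and filter each signal with frequency gain sqrt(psd_bary / psd_subject). The crude FFT-domain filter below is an illustration; the paper's filter construction is more careful.

```python
import numpy as np
from scipy.signal import welch

# Estimate per-subject PSDs, form the barycenter PSD (squared mean of the
# square-root PSDs), then filter each signal toward that barycenter with
# gain sqrt(psd_bary / psd_subject). Crude FFT-domain filtering for brevity.
rng = np.random.default_rng(0)
fs, n = 100, 4096
signals = [rng.standard_normal(n) * (i + 1) for i in range(3)]  # 3 "subjects"

nperseg = 256
psds = [welch(s, fs=fs, nperseg=nperseg)[1] for s in signals]
psd_bary = np.mean([np.sqrt(p) for p in psds], axis=0) ** 2

def map_to_barycenter(sig, psd_sig):
    gain = np.sqrt(psd_bary / np.maximum(psd_sig, 1e-12))
    freqs_welch = np.linspace(0, fs / 2, len(gain))     # Welch frequency grid
    freqs_fft = np.fft.rfftfreq(len(sig), 1 / fs)
    g = np.interp(freqs_fft, freqs_welch, gain)         # gain on the FFT grid
    return np.fft.irfft(np.fft.rfft(sig) * g, n=len(sig))

adapted = [map_to_barycenter(s, p) for s, p in zip(signals, psds)]
print([round(float(np.std(a)), 2) for a in adapted])    # stds pulled together
```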
- DOCTOR: A Multi-Disease Detection Continual Learning Framework Based on Wearable Medical Sensors [3.088223994180069]
We propose DOCTOR, a multi-disease detection continual learning framework based on wearable medical sensors (WMSs).
It employs a multi-headed deep neural network (DNN) and a replay-style CL algorithm.
It achieves 1.43 times better average test accuracy, 1.25 times better F1-score, and 0.41 higher backward transfer than the naive fine-tuning framework.
arXiv Detail & Related papers (2023-05-09T19:33:17Z)
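A generic sketch of the replay-style continual learning named in the DOCTOR entry above: keep a bounded buffer of past-task samples, mix them into each new task's batches, and route predictions through a per-task head. The buffer policy, sizes, and `train_step` are hypothetical placeholders, not DOCTOR's actual algorithm.

```python
import random

random.seed(0)

# Bounded replay buffer mixed into every new task's batches, with one head
# per disease-detection task. All specifics are hypothetical placeholders.
buffer, BUFFER_CAP = [], 200

def train_task(task_id, task_data, batch_size=32, steps=100):
    for _ in range(steps):
        fresh = random.sample(task_data, batch_size // 2)
        replay = random.sample(buffer, min(len(buffer), batch_size // 2))
        batch = fresh + replay
        # train_step(model, batch, head=task_id)   # hypothetical placeholder
    buffer.extend(random.sample(task_data, BUFFER_CAP // 4))  # retain a slice
    del buffer[:max(0, len(buffer) - BUFFER_CAP)]             # evict oldest

tasks = [[("sample%d" % i, t) for i in range(500)] for t in range(4)]
for task_id, task_data in enumerate(tasks):
    train_task(task_id, task_data)
    print("task", task_id, "buffer size", len(buffer))
```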
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address the scalability of large-scale heterogeneous outlier detection.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)