Related papers: Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm

Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm

URL: http://arxiv.org/abs/2602.11661v1
Date: Thu, 12 Feb 2026 07:26:23 GMT
Title: Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm
Authors: Tianxiang Xu, Jiayi Liu, Yixuan Tong, Jialu Xu, Yunqing Wei, Kaiwen Feng, PanPan Hou, Kangping Yin, Jiyuan Hu, Hao Zhou, Zhenxin Ma, Jian Xu, Guanjun Jiang,
Abstract summary: Reinforcement learning for large language model alignment has progressed rapidly in recent years.<n> transferring these paradigms to high-stakes medical question answering reveals a fundamental paradigm mismatch.<n>We propose a robust medical alignment paradigm to address these challenges.
Score: 7.449373800890174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While reinforcement learning for large language model alignment has progressed rapidly in recent years, transferring these paradigms to high-stakes medical question answering reveals a fundamental paradigm mismatch. Reinforcement Learning from Human Feedback relies on preference annotations that are prohibitively expensive and often fail to reflect the absolute correctness of medical facts. Reinforcement Learning from Verifiable Rewards lacks effective automatic verifiers and struggles to handle complex clinical contexts. Meanwhile, medical alignment requires the simultaneous optimization of correctness, safety, and compliance, yet multi-objective heterogeneous reward signals are prone to scale mismatch and optimization conflicts.To address these challenges, we propose a robust medical alignment paradigm. We first construct a holistic multi-dimensional medical alignment matrix that decomposes alignment objectives into four categories: fundamental capabilities, expert knowledge, online feedback, and format specifications. Within each category, we establish a closed loop of where observable metrics inform attributable diagnosis, which in turn drives optimizable rewards, thereby providing fine-grained, high-resolution supervision signals for subsequent iterative optimization. To resolve gradient domination and optimization instability problem caused by heterogeneous signals, we further propose a unified optimization mechanism. This mechanism employs Reference-Frozen Normalization to align reward scales and implements a Tri-Factor Adaptive Dynamic Weighting strategy to achieve collaborative optimization that is weakness-oriented, risk-prioritized, and redundancy-reducing. Experimental results demonstrate the effectiveness of our proposed paradigm in real-world medical scenario evaluations, establishing a new paradigm for complex alignment in vertical domains.

Related papers

Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks.<n>We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors.<n>Our results establish a principled foundation and offer practical guidance for loss selections in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z)
Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix [13.862664606369014]
We present a novel momentum-based attack AdaMI, in which the perturbation is optimized with an interesting momentum-based adaptive matrix.<n>AdaMI is proved to attain optimal convergence for convex problems, indicating that it addresses the non-convergence issue of MI-FGSM.
arXiv Detail & Related papers (2025-12-16T08:35:18Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.<n>We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
MedAlign: A Synergistic Framework of Multimodal Preference Optimization and Federated Meta-Cognitive Reasoning [52.064286116035134]
We develop MedAlign, a framework to ensure visually accurate LVLM responses for Medical Visual Question Answering (Med-VQA)<n>We first propose a multimodal Direct Preference Optimization (mDPO) objective to align preference learning with visual context.<n>We then design a Retrieval-Aware Mixture-of-Experts (RA-MoE) architecture that utilizes image and text similarity to route queries to a specialized and context-augmented LVLM.
arXiv Detail & Related papers (2025-10-24T02:11:05Z)
RPRO: Ranked Preference Reinforcement Optimization for Enhancing Medical QA and Diagnostic Reasoning [5.493679122639688]
Medical question answering requires advanced reasoning that integrates domain knowledge with logical inference.<n>We propose Ranked Preference Reinforcement Optimization (RPRO), a novel framework that combines reinforcement learning with preference-driven reasoning refinement.
arXiv Detail & Related papers (2025-08-31T19:38:25Z)
Prompt Mechanisms in Medical Imaging: A Comprehensive Survey [18.072753363565322]
Deep learning offers transformative potential in medical imaging.<n>Yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization.<n>Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models.
arXiv Detail & Related papers (2025-06-28T03:06:25Z)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models [23.158036246184174]
We propose Hierarchical Self-Contrastive Rewarding (HSCR), a novel approach that addresses two critical challenges in Med-VLM alignment.<n>HSCR generates high-quality preference data and captures nuanced and context-aware preferences for improved alignment.
arXiv Detail & Related papers (2025-06-01T03:11:00Z)
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation [73.83178465971552]
The success of automated medical image analysis depends on large-scale and expert-annotated training sets. Unsupervised domain adaptation (UDA) has been raised as a promising approach to alleviate the burden of labeled data collection. We propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective.
arXiv Detail & Related papers (2023-07-27T08:58:05Z)
Resource Planning for Hospitals Under Special Consideration of the COVID-19 Pandemic: Optimization and Sensitivity Analysis [87.31348761201716]
Crises like the COVID-19 pandemic pose a serious challenge to health-care institutions. BaBSim.Hospital is a tool for capacity planning based on discrete event simulation. We aim to investigate and optimize these parameters to improve BaBSim.Hospital.
arXiv Detail & Related papers (2021-05-16T12:38:35Z)
Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles. We construct three propagative modules to effectively solve the optimization models with flexible combinations. Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
A Coupled Manifold Optimization Framework to Jointly Model the Functional Connectomics and Behavioral Data Spaces [5.382679710017696]
We propose a coupled manifold optimization framework which projects fMRI data onto a low dimensional matrix manifold common to the cohort. The patient specific loadings simultaneously map onto a behavioral measure of interest via a second, non-linear, manifold. We validate our framework on resting state fMRI from fifty-eight patients with Autism Spectrum Disorder.
arXiv Detail & Related papers (2020-07-03T20:12:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.