Model Agnostic Preference Optimization for Medical Image Segmentation
- URL: http://arxiv.org/abs/2512.15009v1
- Date: Wed, 17 Dec 2025 01:50:52 GMT
- Title: Model Agnostic Preference Optimization for Medical Image Segmentation
- Authors: Yunseong Nam, Jiwon Jang, Dongkyu Won, Sang Hyun Park, Soopil Kim
- Abstract summary: Preference optimization offers a scalable supervision paradigm based on relative preference signals. We propose MAPO (Model-Agnostic Preference Optimization), a training framework that utilizes Dropout-driven stochastic segmentation hypotheses. MAPO is fully dimensionality-agnostic, supporting 2D/3D CNN and Transformer-based segmentation pipelines.
- Score: 5.289507655906182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preference optimization offers a scalable supervision paradigm based on relative preference signals, yet prior attempts in medical image segmentation remain model-specific and rely on low-diversity prediction sampling. In this paper, we propose MAPO (Model-Agnostic Preference Optimization), a training framework that utilizes Dropout-driven stochastic segmentation hypotheses to construct preference-consistent gradients without direct ground-truth supervision. MAPO is fully architecture- and dimensionality-agnostic, supporting 2D/3D CNN and Transformer-based segmentation pipelines. Comprehensive evaluations across diverse medical datasets reveal that MAPO consistently enhances boundary adherence, reduces overfitting, and yields more stable optimization dynamics compared to conventional supervised training.
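The abstract's core recipe (sample diverse segmentation hypotheses via Dropout, then rank them to form preference pairs without ground truth) can be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm: the dropout-on-logits mechanism, the self-consistency Dice score used as a preference proxy, and all function names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hypotheses(logits, n_samples=8, drop_p=0.3):
    """Draw stochastic segmentation hypotheses by applying dropout-style
    masking to the logits before thresholding (a stand-in for running
    MC-Dropout inside a real segmentation network)."""
    hyps = []
    for _ in range(n_samples):
        keep = rng.random(logits.shape) > drop_p           # dropout mask
        noisy = np.where(keep, logits, 0.0) / (1.0 - drop_p)
        hyps.append((noisy > 0.0).astype(np.uint8))        # binary mask
    return hyps

def preference_score(hyp, hyps):
    """Proxy preference signal: mean Dice agreement with the other
    hypotheses (a consistency score usable without ground truth)."""
    scores = []
    for other in hyps:
        inter = np.logical_and(hyp, other).sum()
        denom = hyp.sum() + other.sum()
        scores.append(2.0 * inter / denom if denom else 1.0)
    return float(np.mean(scores))

def build_preference_pair(logits):
    """Return a (preferred, dispreferred) pair of hypothesis masks."""
    hyps = sample_hypotheses(logits)
    ranked = sorted(hyps, key=lambda h: preference_score(h, hyps))
    return ranked[-1], ranked[0]

logits = rng.normal(size=(16, 16))   # toy single-channel logit map
chosen, rejected = build_preference_pair(logits)
```

In a real pipeline the pair would then drive a DPO-style preference loss on the segmentation network rather than being used directly.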
Related papers
- MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration [6.430696214380013]
Deformable image registration (DIR) is a fundamental yet challenging problem in medical image analysis. MorphSeek reformulates DIR as a spatially continuous optimization process in the latent feature space. It achieves consistent Dice improvements over competitive baselines while maintaining high label efficiency with minimal parameter cost and low step-level latency.
arXiv Detail & Related papers (2025-11-21T16:52:20Z) - Amortized Active Generation of Pareto Sets [48.56811624922571]
A-GPS is a new framework for online discrete black-box multi-objective optimization. The method employs a class probability estimator (CPE) to predict non-dominance relations. We show that this non-dominance CPE implicitly estimates the probability of hypervolume improvement.
arXiv Detail & Related papers (2025-10-23T23:49:23Z) - From Noisy Traces to Stable Gradients: Bias-Variance Optimized Preference Optimization for Aligning Large Reasoning Models [90.45197506653341]
Large reasoning models (LRMs) generate intermediate reasoning traces before producing final answers. Aligning LRMs with human preferences, a crucial prerequisite for model deployment, remains underexplored. A common workaround optimizes a single sampled trajectory, which introduces substantial gradient variance from trace sampling.
arXiv Detail & Related papers (2025-10-06T17:58:01Z) - Stable Preference Optimization for LLMs: A Bilevel Approach Beyond Direct Preference Optimization [2.384797824772941]
We present a comprehensive analysis of DPO's dynamics from a probability evolution perspective. We propose a theoretically grounded bilevel optimization framework that tightly integrates supervised fine-tuning with an enhanced DPO objective, a.k.a. stable preference optimization.
arXiv Detail & Related papers (2025-07-10T12:57:39Z) - Divergence Minimization Preference Optimization for Diffusion Model Alignment [66.31417479052774]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. DMPO consistently outperforms or matches existing techniques across different base models and test sets.
arXiv Detail & Related papers (2025-07-10T07:57:30Z) - Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization [69.05600758833471]
Direct Preference Optimization (DPO) has emerged as an effective approach for mitigating hallucination in Multimodal Large Language Models (MLLMs). We propose Symmetric Multimodal Preference Optimization (SymMPO), which conducts symmetric preference learning with direct preference supervision (i.e., response pairs). In addition to conventional ordinal preference learning, SymMPO introduces a preference margin consistency loss to quantitatively regulate the preference gap between symmetric preference pairs.
arXiv Detail & Related papers (2025-06-13T12:29:15Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking [50.325021634589596]
We propose a Tailored Preference Optimization (TailorPO) framework for aligning diffusion models with human preference. Our approach directly ranks intermediate noisy samples based on their step-wise reward and effectively resolves gradient direction issues. Experimental results demonstrate that our method significantly improves the model's ability to generate aesthetically pleasing and human-preferred images.
arXiv Detail & Related papers (2025-02-01T16:08:43Z)
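Several of the entries above build on the Direct Preference Optimization (DPO) objective that MAPO-style training adapts to segmentation. For reference, a minimal sketch of the standard pairwise DPO loss, assuming per-pair log-probabilities under the policy and a frozen reference model are already available:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise DPO loss: -log sigmoid(beta * implicit-reward margin),
    where the implicit reward is the policy-vs-reference log-ratio."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # == -log(sigmoid(margin))
```

With equal log-probabilities the loss is -log(0.5) ≈ 0.693; it falls as the chosen response gains probability mass relative to the reference and rises when the rejected one does.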
This list is automatically generated from the titles and abstracts of the papers on this site.