Related papers: MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration

URL: http://arxiv.org/abs/2511.17392v1
Date: Fri, 21 Nov 2025 16:52:20 GMT
Title: MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration
Authors: Runxun Zhang, Yizhou Liu, Li Dongrui, Bo XU, Jingwei Wei
Abstract summary: Deformable image registration (DIR) is a fundamental yet challenging problem in medical image analysis. MorphSeek reformulates DIR as a spatially continuous optimization process in the latent feature space. It achieves consistent Dice improvements over competitive baselines while maintaining high label efficiency with minimal parameter cost and low step-level latency.
Score: 6.430696214380013
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deformable image registration (DIR) remains a fundamental yet challenging problem in medical image analysis, largely due to the prohibitively high-dimensional deformation space of dense displacement fields and the scarcity of voxel-level supervision. Existing reinforcement learning frameworks often project this space into coarse, low-dimensional representations, limiting their ability to capture spatially variant deformations. We propose MorphSeek, a fine-grained representation-level policy optimization paradigm that reformulates DIR as a spatially continuous optimization process in the latent feature space. MorphSeek introduces a stochastic Gaussian policy head atop the encoder to model a distribution over latent features, facilitating efficient exploration and coarse-to-fine refinement. The framework integrates unsupervised warm-up with weakly supervised fine-tuning through Group Relative Policy Optimization, where multi-trajectory sampling stabilizes training and improves label efficiency. Across three 3D registration benchmarks (OASIS brain MRI, LiTS liver CT, and Abdomen MR-CT), MorphSeek achieves consistent Dice improvements over competitive baselines while maintaining high label efficiency with minimal parameter cost and low step-level latency overhead. Beyond optimizer specifics, MorphSeek advances a representation-level policy learning paradigm that achieves spatially coherent and data-efficient deformation optimization, offering a principled, backbone-agnostic, and optimizer-agnostic solution for scalable visual alignment in high-dimensional settings.
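The abstract combines three mechanisms: a stochastic Gaussian policy head over latent features, multi-trajectory sampling scored within a group (Group Relative Policy Optimization, GRPO), and Dice as the quality measure. The sketch below is a minimal, framework-free illustration of how those pieces could fit together; it is not the authors' implementation. It assumes a diagonal Gaussian over a flattened latent vector, assumes a Dice score on a label mask as the reward for weakly supervised fine-tuning, and replaces the registration decoder and label warping with a hypothetical `toy_decode` that simply thresholds the latent. GRPO's clipped importance ratio and KL penalty are omitted, and since there is no autograd here, the surrogate objective is only evaluated, not differentiated.

```python
import math
import random


def gaussian_log_prob(x, mu, log_sigma):
    """Log density of a univariate Gaussian N(mu, exp(log_sigma)^2) at x."""
    sigma = math.exp(log_sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)


def sample_latent(mu, log_sigma, rng):
    """Sample one latent vector from a diagonal Gaussian policy.

    Returns the sample and its total log-probability (sum over dimensions).
    """
    z = [rng.gauss(m, math.exp(ls)) for m, ls in zip(mu, log_sigma)]
    log_prob = sum(gaussian_log_prob(x, m, ls) for x, m, ls in zip(z, mu, log_sigma))
    return z, log_prob


def dice(pred, target):
    """Dice overlap 2 * |A & B| / (|A| + |B|) between two binary masks (0/1 lists)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward against its own group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def toy_decode(z):
    """HYPOTHETICAL stand-in for the decoder and warped labels: thresholds the latent into a mask."""
    return [1 if x > 0 else 0 for x in z]


def grpo_step(mu, log_sigma, target, group_size, rng):
    """Sample a group of latents, score them, and return the surrogate objective.

    The surrogate mean(A_k * log pi(z_k)) is what a policy-gradient method would
    differentiate with respect to mu and log_sigma; no gradients are computed here.
    """
    group = [sample_latent(mu, log_sigma, rng) for _ in range(group_size)]
    rewards = [dice(toy_decode(z), target) for z, _ in group]
    advantages = group_relative_advantages(rewards)
    surrogate = sum(a * lp for a, (_, lp) in zip(advantages, group)) / group_size
    return rewards, advantages, surrogate


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.2, -0.1, 0.5, -0.4]      # hypothetical policy mean over 4 latent units
    log_sigma = [math.log(0.3)] * 4  # hypothetical fixed exploration scale
    target = [1, 0, 1, 0]            # hypothetical label mask
    rewards, advantages, surrogate = grpo_step(mu, log_sigma, target, 8, rng)
    print(rewards)
    print(advantages)
    print(surrogate)
```

Standardizing rewards within each sampled group is what lets GRPO-style training skip a learned value function: a sampled latent is reinforced only if it scores better than its siblings drawn for the same input. In a real setup, `mu` and `log_sigma` would come from the policy head on top of the encoder, and the surrogate would be backpropagated through them.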

Related papers

Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models [84.78794648147608]
A persistent geometric anomaly, the Modality Gap, remains. Prior approaches to bridging this gap are largely limited by oversimplified isotropic assumptions. We propose the Fixed-frame Modality Gap Theory, which decomposes the modality gap into stable biases and anisotropic residuals. We then introduce ReAlign, a training-free modality alignment strategy.
arXiv (2026-02-02T13:59:39Z)
Model Agnostic Preference Optimization for Medical Image Segmentation [5.289507655906182]
Preference optimization offers a scalable supervision paradigm based on relative preference signals. We propose MAPO (Model-Agnostic Preference Optimization), a training framework that utilizes dropout-driven segmentation hypotheses. MAPO is fully dimensionality-agnostic, supporting 2D/3D CNN and Transformer-based segmentation pipelines.
arXiv (2025-12-17T01:50:52Z)
CogniEdit: Dense Gradient Flow Optimization for Fine-Grained Image Editing [88.9067184995168]
We propose a unified framework, CogniEdit, that combines multimodal reasoning with dense reward optimization. Our method achieves state-of-the-art performance in balancing fine-grained instruction following with visual quality and editability preservation.
arXiv (2025-12-15T12:36:50Z)
Graph Laplacian Transformer with Progressive Sampling for Prostate Cancer Grading [2.9485900021889146]
We propose a Graph Laplacian Attention-Based Transformer (GLAT) integrated with an Iterative Refinement Module (IRM) to enhance both feature learning and spatial consistency. The IRM iteratively refines patch selection by leveraging a pretrained ResNet50 for local feature extraction and a foundation model in no-gradient mode for importance scoring. The GLAT models tissue-level connectivity by constructing a graph where patches serve as nodes, ensuring spatial consistency through graph Laplacian constraints.
arXiv (2025-12-11T16:55:57Z)
From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting [26.57713792657793]
We propose a motion-adaptive framework that aligns control density with motion complexity. We show significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.
arXiv (2025-10-03T05:33:58Z)
Towards Efficient General Feature Prediction in Masked Skeleton Modeling [59.46799426434277]
We propose a novel General Feature Prediction framework (GFP) for efficient masked skeleton modeling. Our key innovation is replacing conventional low-level reconstruction with high-level feature prediction that spans from local motion patterns to global semantic representations.
arXiv (2025-09-03T18:05:02Z)
Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting. We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv (2025-05-27T05:17:49Z)
MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields [100.90743697473232]
Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images. We propose a view synthesis framework based on 3D Gaussian Splatting, enabling scene reconstruction from sparse views.
arXiv (2024-10-15T08:39:05Z)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation. We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations. We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv (2024-10-04T06:19:29Z)
HGSLoc: 3DGS-based Heuristic Camera Pose Refinement [13.393035855468428]
HGSLoc is a novel lightweight plug-and-play pose optimization framework. It integrates 3D reconstruction with a refinement strategy to achieve higher pose estimation accuracy. Our method demonstrates higher localization accuracy compared to NeRF-based neural localization approaches.
arXiv (2024-09-17T06:48:48Z)
Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help the transform. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv (2023-08-17T01:34:51Z)
Break a Lag: Triple Exponential Moving Average for Enhanced Optimization [2.0199251985015434]
We introduce Fast Adaptive Moment Estimation (FAME), a novel optimization technique that leverages the Triple Exponential Moving Average. FAME enhances responsiveness to data dynamics, mitigates trend identification lag, and optimizes learning efficiency. Our comprehensive evaluation encompasses different computer vision tasks, including image classification, object detection, and semantic segmentation, integrating FAME into 30 distinct architectures.
arXiv (2023-06-02T10:29:33Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep-learning-based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets, including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv (2020-04-30T03:23:45Z)
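The FAME entry in the list above builds on the triple exponential moving average (TEMA). As a hedged illustration of the standard TEMA formula only, TEMA = 3*EMA - 3*EMA(EMA) + EMA(EMA(EMA)), the sketch below compares it with a plain EMA on a linear trend. How FAME plugs TEMA into Adam-style moment estimates is not described in the summary, so nothing here reproduces the optimizer itself; the smoothing factor `alpha` and seeding each EMA with its first input are assumptions.

```python
def ema(values, alpha):
    """Exponential moving average e_t = alpha * x_t + (1 - alpha) * e_{t-1}.

    Seeded with the first value (an assumption; other seeds are common).
    """
    out, prev = [], None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def tema(values, alpha):
    """Triple EMA: 3*EMA - 3*EMA(EMA) + EMA(EMA(EMA))."""
    e1 = ema(values, alpha)
    e2 = ema(e1, alpha)
    e3 = ema(e2, alpha)
    return [3 * a - 3 * b + c for a, b, c in zip(e1, e2, e3)]


if __name__ == "__main__":
    ramp = [float(t) for t in range(200)]  # linear trend 0, 1, ..., 199
    print(ema(ramp, 0.2)[-1])   # lags the trend by (1 - alpha) / alpha = 4 steps
    print(tema(ramp, 0.2)[-1])  # steady-state lag on a linear trend cancels out
```

On a linear trend, a single EMA settles at a constant lag of (1 - alpha) / alpha, while the 3, -3, 1 combination of repeated EMAs cancels that lag; this is the "trend identification lag" the FAME summary refers to.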

This list is automatically generated from the titles and abstracts of the papers on this site.