FluenceFormer: Transformer-Driven Multi-Beam Fluence Map Regression for Radiotherapy Planning
- URL: http://arxiv.org/abs/2512.22425v1
- Date: Sat, 27 Dec 2025 01:12:15 GMT
- Title: FluenceFormer: Transformer-Driven Multi-Beam Fluence Map Regression for Radiotherapy Planning
- Authors: Ujunwa Mgboh, Rafi Ibn Sultan, Joshua Kim, Kundan Thind, Dongxiao Zhu,
- Abstract summary: We introduce textbfFluenceFormer, a backbone-agnostic transformer framework for direct, geometry-aware fluence regression.<n>FluenceFormer with Swin UNETR achieves the strongest performance among the evaluated models.
- Score: 4.066732323672965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fluence map prediction is central to automated radiotherapy planning but remains an ill-posed inverse problem due to the complex relationship between volumetric anatomy and beam-intensity modulation. Convolutional methods in prior work often struggle to capture long-range dependencies, which can lead to structurally inconsistent or physically unrealizable plans. We introduce \textbf{FluenceFormer}, a backbone-agnostic transformer framework for direct, geometry-aware fluence regression. The model uses a unified two-stage design: Stage~1 predicts a global dose prior from anatomical inputs, and Stage~2 conditions this prior on explicit beam geometry to regress physically calibrated fluence maps. Central to the approach is the \textbf{Fluence-Aware Regression (FAR)} loss, a physics-informed objective that integrates voxel-level fidelity, gradient smoothness, structural consistency, and beam-wise energy conservation. We evaluate the generality of the framework across multiple transformer backbones, including Swin UNETR, UNETR, nnFormer, and MedFormer, using a prostate IMRT dataset. FluenceFormer with Swin UNETR achieves the strongest performance among the evaluated models and improves over existing benchmark CNN and single-stage methods, reducing Energy Error to $\mathbf{4.5\%}$ and yielding statistically significant gains in structural fidelity ($p < 0.05$).
Related papers
- Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction [45.25461515976432]
Plug-and-Play diffusion prior (DP) frameworks have emerged as a powerful paradigm for imaging reconstruction.<n>We present a novel approach to resolving bias-hallucination trade-off, achieving state-of-the-art gradients with significantly accelerated convergence.
arXiv Detail & Related papers (2026-02-26T16:58:43Z) - SpanNorm: Reconciling Training Stability and Performance in Deep Transformers [55.100133502295996]
We propose SpanNorm, a novel technique designed to resolve the dilemma by integrating the strengths of both paradigms.<n>We provide a theoretical analysis demonstrating that SpanNorm, combined with a principled scaling strategy, maintains bounded signal variance throughout the network.<n> Empirically, SpanNorm consistently outperforms standard normalization schemes in both dense and Mixture-of-Experts (MoE) scenarios.
arXiv Detail & Related papers (2026-01-30T05:21:57Z) - Simba: Towards High-Fidelity and Geometrically-Consistent Point Cloud Completion via Transformation Diffusion [31.34032485865941]
We introduce Simba, a novel framework that reformulates point-wise transformation regression as a distribution learning problem.<n>Our approach integrates symmetry priors with the powerful generative capabilities of diffusion models, avoiding instance-specific memorization.
arXiv Detail & Related papers (2025-11-20T09:02:42Z) - Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction for Sparse-View CT [19.18392754322368]
Sparse-View CT (SVCT) reconstruction enhances temporal resolution and reduces radiation dose, yet its clinical use is hindered by artifacts.<n>We propose a Cross-Distribution Diffusion Priors-Driven Iterative Reconstruction framework to tackle the OOD problem in SVCT.
arXiv Detail & Related papers (2025-09-16T22:35:13Z) - ResPF: Residual Poisson Flow for Efficient and Physically Consistent Sparse-View CT Reconstruction [7.644299873269135]
Sparse-view computed tomography (CT) is a practical solution to reduce radiation dose, but the resulting inverse problem poses significant challenges for accurate image reconstruction.<n>Recent advances in generative modeling, particularly Poisson Flow Generative Models (PFGM), enable high-fidelity image synthesis.<n>We propose Residual Poisson Flow (ResPF) Generative Models for efficient and accurate sparse-view CT reconstruction.
arXiv Detail & Related papers (2025-06-06T01:43:35Z) - Global-to-Local Modeling for Video-based 3D Human Pose and Shape
Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT)
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z) - PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular
Images [60.33197938330409]
PyMAF-X is a regression-based approach to recovering parametric full-body models from monocular images.
PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results.
arXiv Detail & Related papers (2022-07-13T17:58:33Z) - Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints.
Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z) - Homography Decomposition Networks for Planar Object Tracking [11.558401177707312]
Planar object tracking plays an important role in AI applications, such as robotics, visual servoing, and visual SLAM.
We propose a novel Homography Decomposition Networks(HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups.
arXiv Detail & Related papers (2021-12-15T06:13:32Z) - TFPose: Direct Human Pose Estimation with Transformers [83.03424247905869]
We formulate the pose estimation task into a sequence prediction problem that can effectively be solved by transformers.
Our framework is simple and direct, bypassing the drawbacks of the heatmap-based pose estimation.
Experiments on the MS-COCO and MPII datasets demonstrate that our method can significantly improve the state-of-the-art of regression-based pose estimation.
arXiv Detail & Related papers (2021-03-29T04:18:54Z) - FPCR-Net: Feature Pyramidal Correlation and Residual Reconstruction for
Optical Flow Estimation [72.41370576242116]
We propose a semi-supervised Feature Pyramidal Correlation and Residual Reconstruction Network (FPCR-Net) for optical flow estimation from frame pairs.
It consists of two main modules: pyramid correlation mapping and residual reconstruction.
Experiment results show that the proposed scheme achieves the state-of-the-art performance, with improvement by 0.80, 1.15 and 0.10 in terms of average end-point error (AEE) against competing baseline methods.
arXiv Detail & Related papers (2020-01-17T07:13:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.