Related papers: FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision Computing

FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision Computing

URL: http://arxiv.org/abs/2508.00887v1
Date: Sat, 26 Jul 2025 07:35:09 GMT
Title: FRAM: Frobenius-Regularized Assignment Matching with Mixed-Precision Computing
Authors: Binrui Shen, Yuan Liang, Shengxin Zhu,
Abstract summary: A Quadratic Assignment Problem (QAP) seeks to establish node correspondences between two graphs.<n>We develop a theoretically grounded mixed-precision architecture to achieve substantial acceleration.<n>FRAM achieves up to 370X speedup over its CPU-FP64 counterpart, with negligible loss in solution accuracy.
Score: 6.987672546471471
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Graph matching, typically formulated as a Quadratic Assignment Problem (QAP), seeks to establish node correspondences between two graphs. To address the NP-hardness of QAP, some existing methods adopt projection-based relaxations that embed the problem into the convex hull of the discrete domain. However, these relaxations inevitably enlarge the feasible set, introducing two sources of error: numerical scale sensitivity and geometric misalignment between the relaxed and original domains. To alleviate these errors, we propose a novel relaxation framework by reformulating the projection step as a Frobenius-regularized Linear Assignment (FRA) problem, where a tunable regularization term mitigates feasible region inflation. This formulation enables normalization-based operations to preserve numerical scale invariance without compromising accuracy. To efficiently solve FRA, we propose the Scaling Doubly Stochastic Normalization (SDSN) algorithm. Building on its favorable computational properties, we develop a theoretically grounded mixed-precision architecture to achieve substantial acceleration. Comprehensive CPU-based benchmarks demonstrate that FRAM consistently outperforms all baseline methods under identical precision settings. When combined with a GPU-based mixed-precision architecture, FRAM achieves up to 370X speedup over its CPU-FP64 counterpart, with negligible loss in solution accuracy.

Related papers

Data-Driven Adaptive Gradient Recovery for Unstructured Finite Volume Computations [0.0]
We present a novel data-driven approach for enhancing gradient reconstruction in unstructured finite volume methods for hyperbolic conservation laws.<n>Our approach extends previous structured-grid methodologies to unstructured meshes through a modified DeepONet architecture.<n>The proposed algorithm is faster and more accurate than the traditional second-order finite volume solver.
arXiv Detail & Related papers (2025-07-22T13:23:57Z)
Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized server (DFL) eliminates reliance on client-client architecture.<n>Non-smooth regularization is often incorporated into machine learning tasks.<n>We propose a novel novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
Federated Smoothing ADMM for Localization [9.25126455172971]
Federated systems are characterized by distributed data, non-smoothity, and non-smoothness.<n>We propose a robust algorithm to tackle the scalability and outlier issues inherent in such environments.<n>To validate the reliability of the proposed algorithm, we show that it converges to a stationary point.<n> numerical simulations highlight its superior performance in convergence resilience compared to existing state-of-the-art localization methods.
arXiv Detail & Related papers (2025-03-12T16:01:34Z)
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
Federated Smoothing Proximal Gradient for Quantile Regression with Non-Convex Penalties [3.269165283595478]
Distributed sensors in the internet-of-things (IoT) generate vast amounts of sparse data. We propose a federated smoothing proximal gradient (G) algorithm that integrates a smoothing mechanism with the view, thereby both precision and computational speed.
arXiv Detail & Related papers (2024-08-10T21:50:19Z)
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems. We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
Fast and Robust Non-Rigid Registration Using Accelerated Majorization-Minimization [35.66014845211251]
Non-rigid registration, which deforms a source shape in a non-rigid way to align with a target shape, is a classical problem in computer vision. Existing methods typically adopt the $ell_p$ type robust norm to measure the alignment error and regularize the smoothness of deformation. We propose a formulation for robust non-rigid registration based on a globally smooth robust norm for alignment and regularization.
arXiv Detail & Related papers (2022-06-07T16:00:33Z)
Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging. We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z)
Efficient semidefinite bounds for multi-label discrete graphical models [6.226454551201676]
One of the main queries on such models is to identify the SDPWCSP Function on Cost of a Posteri (MAP) Networks. We consider a traditional dualized constraint approach and a dedicated dedicated SDP/Monteiro style method based on row-by-row updates.
arXiv Detail & Related papers (2021-11-24T13:38:34Z)
Square Root Bundle Adjustment for Large-Scale Reconstruction [56.44094187152862]
We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition. Our approach, which we call square root bundle adjustment, is algebraically equivalent to the commonly used Schur complement trick. We show in real-world experiments with the BAL datasets that even in single precision the proposed solver achieves on average equally accurate solutions.
arXiv Detail & Related papers (2021-03-02T16:26:20Z)
Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF. We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.