Related papers: FastJAM: a Fast Joint Alignment Model for Images

URL: http://arxiv.org/abs/2510.22842v2
Date: Wed, 29 Oct 2025 10:18:17 GMT
Title: FastJAM: a Fast Joint Alignment Model for Images
Authors: Omri Hirsch, Ron Shapira Weber, Shira Ifergane, Oren Freifeld
Abstract summary: Joint Alignment of images aims to align a collection of images into a unified coordinate frame, such that semantically-similar features appear at corresponding spatial locations. We introduce FastJAM, a rapid, graph-based method that drastically reduces the computational complexity of joint alignment tasks.
Score: 10.522943649619426
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Joint Alignment (JA) of images aims to align a collection of images into a unified coordinate frame, such that semantically-similar features appear at corresponding spatial locations. Most existing approaches often require long training times, large-capacity models, and extensive hyperparameter tuning. We introduce FastJAM, a rapid, graph-based method that drastically reduces the computational complexity of joint alignment tasks. FastJAM leverages pairwise matches computed by an off-the-shelf image matcher, together with a rapid nonparametric clustering, to construct a graph representing intra- and inter-image keypoint relations. A graph neural network propagates and aggregates these correspondences, efficiently predicting per-image homography parameters via image-level pooling. Utilizing an inverse-compositional loss, that eliminates the need for a regularization term over the predicted transformations (and thus also obviates the hyperparameter tuning associated with such terms), FastJAM performs image JA quickly and effectively. Experimental results on several benchmarks demonstrate that FastJAM achieves results better than existing modern JA methods in terms of alignment quality, while reducing computation time from hours or minutes to mere seconds. Our code is available at our project webpage, https://bgu-cs-vil.github.io/FastJAM/
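Illustrative sketch: the abstract mentions per-image homographies and an inverse-compositional loss. The Python below is a minimal sketch of one plausible keypoint-level form of such a loss, not the authors' implementation. It assumes matched keypoints between two images i and j, and measures how well points from image i land on their matches in image j after being mapped through H_j^{-1} H_i. The function names and the toy data are hypothetical.

```python
import numpy as np


def apply_homography(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]  # back to Cartesian coordinates


def inverse_compositional_error(H_i, H_j, pts_i, pts_j):
    """Mean distance between pts_j and pts_i mapped through H_j^{-1} H_i.

    Assumed form for illustration only: the composed map sends image i
    directly into the frame of image j, so the error is measured in an
    input image frame rather than in the shared (predicted) frame.
    """
    H_ij = np.linalg.inv(H_j) @ H_i
    mapped = apply_homography(H_ij, pts_i)
    return float(np.mean(np.linalg.norm(mapped - pts_j, axis=1)))


# Toy data (hypothetical): image j is image i shifted by (2, -1).
H_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 1.0, -1.0],
                   [0.0, 0.0, 1.0]])
pts_i = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
pts_j = apply_homography(H_true, pts_i)

identity = np.eye(3)
err_unaligned = inverse_compositional_error(identity, identity, pts_i, pts_j)
err_aligned = inverse_compositional_error(H_true, identity, pts_i, pts_j)
print(err_unaligned, err_aligned)
```

In this form, assigning the same transformation to both images leaves H_j^{-1} H_i equal to the identity, so a collapsed solution does not reduce the error. That is one intuition for why a loss of this kind may not need a separate regularization term.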

Related papers

Hierarchical Scheduling for Multi-Vector Image Retrieval [17.023146933530484]
HiMIR is an efficient scheduling framework for image retrieval. We introduce a novel hierarchical paradigm, employing multiple intermediate granularities for varying image objects to enhance alignment. Our empirical study shows that HiMIR not only achieves substantial accuracy improvements but also reduces computation by up to 3.5 times over the existing MVR system.
arXiv Detail & Related papers (2025-10-10T03:36:18Z)
Speedy MASt3R [68.47052557089631]
MASt3R redefines image matching as a 3D task by leveraging DUSt3R and introducing a fast reciprocal matching scheme. Fast MASt3R achieves a 54% reduction in inference time (198 ms to 91 ms per image pair) without sacrificing accuracy. This advancement enables real-time 3D understanding, benefiting applications like mixed reality navigation and large-scale 3D scene reconstruction.
arXiv Detail & Related papers (2025-03-13T03:56:22Z)
Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification [13.238769012534922]
We propose a novel Cross-Modal Mapping (CMM) method for few-shot image classification. CMM aligns image features with the text feature space through linear transformation. It improves the average Top-1 accuracy by 1.06% on 11 benchmark datasets.
arXiv Detail & Related papers (2024-12-28T10:40:21Z)
Fast constrained sampling in pre-trained diffusion models [80.99262780028015]
We propose an algorithm that enables fast, high-quality generation under arbitrary constraints. Our approach produces results that rival or surpass the state-of-the-art training-free inference methods.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images [9.099291890744201]
Unsupervised Joint Alignment is beset by challenges such as high complexity, geometric distortions, and convergence to poor local or even global optima. We introduce the Spatial Joint Alignment Model (SpaceJAM), a novel approach that addresses the JA task with efficiency and simplicity.
arXiv Detail & Related papers (2024-07-16T15:32:39Z)
Image-GS: Content-Adaptive Image Representation via 2D Gaussians [52.598772767324036]
We introduce Image-GS, a content-adaptive image representation based on 2D Gaussians radiance. It supports hardware-friendly rapid access for real-time usage, requiring only 0.3K MACs to decode a pixel. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.
arXiv Detail & Related papers (2024-07-02T00:45:21Z)
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation [46.659592045271125]
RTMO is a one-stage pose estimation framework that seamlessly integrates coordinate classification. It achieves accuracy comparable to top-down methods while maintaining high speed. Our largest model, RTMO-l, attains 74.8% AP on COCO val 2017 and 141 FPS on a single V100 GPU.
arXiv Detail & Related papers (2023-12-12T18:55:29Z)
Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder the development, including pixel misalignment and inefficient inference. This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion. The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
Implicit Temporal Modeling with Learnable Alignment for Video Recognition [95.82093301212964]
We propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving incredibly high performance. ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with much fewer FLOPs compared with Swin-L and ViViT-H.
arXiv Detail & Related papers (2023-04-20T17:11:01Z)
μSplit: efficient image decomposition for microscopy data [50.794670705085835]
muSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images. We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context. We apply muSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z)
Inverted Semantic-Index for Image Retrieval [3.751222656656264]
Inverted indices aim to build finer partitions that produce a concise and accurate candidate list. In this paper, we replace the clustering method with image classification during the construction of the codebook. We combine our semantic-index with product quantization (PQ) so as to alleviate the accuracy loss caused by PQ compression.
arXiv Detail & Related papers (2022-06-25T11:21:56Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
DynaMiTe: A Dynamic Local Motion Model with Temporal Constraints for Robust Real-Time Feature Matching [47.72468932196169]
We present the lightweight pipeline DynaMiTe, which is agnostic to the descriptor input and leverages spatial-temporal cues with efficient statistical measures. DynaMiTe achieves superior results both in terms of matching accuracy and camera pose estimation with high frame rates, outperforming state-of-the-art matching methods.
arXiv Detail & Related papers (2020-07-31T12:18:18Z)
