Related papers: Coarse-to-Fine Non-rigid Multi-modal Image Registration for Historical Panel Paintings based on Crack Structures

URL: http://arxiv.org/abs/2601.16348v1
Date: Thu, 22 Jan 2026 22:19:02 GMT
Title: Coarse-to-Fine Non-rigid Multi-modal Image Registration for Historical Panel Paintings based on Crack Structures
Authors: Aline Sindel, Andreas Maier, Vincent Christlein
Abstract summary: We propose a coarse-to-fine non-rigid multi-modal registration method for historical panel paintings. We employ a convolutional neural network for joint keypoint detection and description based on the craquelure. For coarse-to-fine registration, we introduce a novel multi-level keypoint refinement approach to register mixed-resolution images.
Score: 6.9973522698523745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Art technological investigations of historical panel paintings rely on acquiring multi-modal image data, including visual light photography, infrared reflectography, ultraviolet fluorescence photography, x-radiography, and macro photography. For a comprehensive analysis, the multi-modal images require pixel-wise alignment, which is still often performed manually. Multi-modal image registration can reduce this laborious manual work, is substantially faster, and enables higher precision. Due to varying image resolutions, huge image sizes, non-rigid distortions, and modality-dependent image content, registration is challenging. Therefore, we propose a coarse-to-fine non-rigid multi-modal registration method efficiently relying on sparse keypoints and thin-plate-splines. Historical paintings exhibit a fine crack pattern, called craquelure, on the paint layer, which is captured by all image systems and is well-suited as a feature for registration. In our one-stage non-rigid registration approach, we employ a convolutional neural network for joint keypoint detection and description based on the craquelure and a graph neural network for descriptor matching in a patch-based manner, and filter matches based on homography reprojection errors in local areas. For coarse-to-fine registration, we introduce a novel multi-level keypoint refinement approach to register mixed-resolution images up to the highest resolution. We created a multi-modal dataset of panel paintings with a high number of keypoint annotations, and a large test set comprising five multi-modal domains and varying image resolutions. The ablation study demonstrates the effectiveness of all modules of our refinement method. Our proposed approaches achieve the best registration results compared to competing keypoint and dense matching methods and refinement methods.
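The abstract mentions filtering matches based on homography reprojection errors in local areas. As a rough illustration of that idea only, the sketch below checks matched keypoints in a single local area against a homography that is assumed to be already estimated (for example by a robust fit). The function names, the example matrix, and the 3-pixel threshold are hypothetical and are not taken from the paper.

```python
import math


def project(H, point):
    """Apply a 3x3 homography H (nested lists) to a 2D point (x, y)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def reprojection_error(H, src, dst):
    """Euclidean distance between the projected source point and its match."""
    px, py = project(H, src)
    return math.hypot(px - dst[0], py - dst[1])


def filter_matches(H, matches, max_error=3.0):
    """Keep matches whose reprojection error is at most max_error pixels.

    max_error is an assumed threshold chosen for illustration.
    """
    return [(s, d) for s, d in matches if reprojection_error(H, s, d) <= max_error]


# Hypothetical local homography: a pure translation by (+5, -2) pixels.
H = [[1.0, 0.0, 5.0],
     [0.0, 1.0, -2.0],
     [0.0, 0.0, 1.0]]

# Hypothetical matches (source point, target point); the last one is an outlier.
matches = [
    ((10.0, 10.0), (15.0, 8.0)),
    ((20.0, 30.0), (25.5, 28.0)),
    ((40.0, 40.0), (60.0, 70.0)),
]

kept = filter_matches(H, matches)
```

In this toy case the first two matches agree with the local homography to within half a pixel and are kept, while the third lies far from its projected position and is discarded. Applying such a test per local area, rather than with one global homography, would tolerate non-rigid distortions across the painting.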

Related papers

MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large amounts of correspondences. We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images. Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z)
3D Multimodal Image Registration for Plant Phenotyping [0.6697966247860049]
The use of multiple camera technologies in a combined multimodal monitoring system for plant phenotyping offers promising benefits. The effective utilization of cross-modal patterns is dependent on precise image registration to achieve pixel-accurate alignment. We propose a novel multimodal 3D image registration method that addresses these challenges by integrating depth information from a time-of-flight camera into the registration process.
arXiv Detail & Related papers (2024-07-03T09:29:46Z)
Breaking Modality Disparity: Harmonized Representation for Infrared and Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration method. We employ homography to simulate the deformation between different planes. We propose the first misaligned infrared and visible image dataset with available ground truth.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
Multi-modal Retinal Image Registration Using a Keypoint-Based Vessel Structure Aligning Network [9.988115865060589]
We propose an end-to-end trainable deep learning method for multi-modal retinal image registration. Our method extracts convolutional features from the vessel structure for keypoint detection and description. The keypoint detection and description network and graph neural network are jointly trained in a self-supervised manner.
arXiv Detail & Related papers (2022-07-21T14:36:51Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
A Keypoint Detection and Description Network Based on the Vessel Structure for Multi-Modal Retinal Image Registration [0.0]
Multiple images with different modalities or acquisition times are often analyzed for the diagnosis of retinal diseases. Our method uses a convolutional neural network to extract features of the vessel structure in multi-modal retinal images.
arXiv Detail & Related papers (2022-01-06T20:43:35Z)
DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras. Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
CoMIR: Contrastive Multimodal Image Representation for Registration [4.543268895439618]
We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures.
arXiv Detail & Related papers (2020-06-11T10:51:33Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
Multimodal Deep Unfolding for Guided Image Super-Resolution [23.48305854574444]
Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a high-resolution output. We propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture. Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information.
arXiv Detail & Related papers (2020-01-21T14:41:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.