DIAR: Deep Image Alignment and Reconstruction using Swin Transformers
- URL: http://arxiv.org/abs/2310.11605v1
- Date: Tue, 17 Oct 2023 21:59:45 GMT
- Title: DIAR: Deep Image Alignment and Reconstruction using Swin Transformers
- Authors: Monika Kwiatkowski, Simon Matern, Olaf Hellwich
- Abstract summary: We create a dataset of images with distortions such as lighting, specularities, shadows, and occlusion.
We create perspective distortions with corresponding ground-truth homographies as labels.
We use our dataset to train Swin transformer models to analyze sequential image data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When taking images of some occluded content, one is often faced with the
problem that every individual image frame contains unwanted artifacts, but a
collection of images contains all relevant information if properly aligned and
aggregated. In this paper, we attempt to build a deep learning pipeline that
simultaneously aligns a sequence of distorted images and reconstructs them. We
create a dataset that contains images with distortions such as lighting,
specularities, shadows, and occlusion. We create perspective distortions with
corresponding ground-truth homographies as labels. We use our dataset to train
Swin transformer models to analyze sequential image data. The attention maps
enable the model to detect relevant image content and differentiate it from
outliers and artifacts. We further explore using neural feature maps as
alternatives to classical keypoint detectors. The feature maps of trained
convolutional layers provide dense image descriptors that can be used to find
point correspondences between images. We utilize this to compute coarse image
alignments and explore its limitations.
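As a concrete illustration of the coarse alignment described above, here is a minimal sketch that uses a pretrained CNN's feature maps as dense descriptors, keeps mutual nearest-neighbour matches, and fits a homography with RANSAC. The VGG16 backbone, truncation point, and RANSAC threshold are illustrative assumptions, not the authors' exact pipeline; it assumes torch, torchvision, opencv-python, and numpy.

```python
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

# Truncate an ImageNet-pretrained VGG16 at relu3_3 (stride 4, 256 channels).
feat = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
prep = T.Compose([T.ToTensor(),
                  T.Normalize(mean=[0.485, 0.456, 0.406],
                              std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def dense_descriptors(img_bgr):
    """Return L2-normalized per-location descriptors plus grid width and stride."""
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    f = feat(prep(rgb).unsqueeze(0))[0]                    # (C, h, w)
    c, h, w = f.shape
    d = torch.nn.functional.normalize(f.permute(1, 2, 0).reshape(-1, c), dim=1)
    return d, w, (img_bgr.shape[1] / w, img_bgr.shape[0] / h)

@torch.no_grad()
def coarse_homography(img_a, img_b):
    """Estimate a coarse homography from dense feature correspondences.

    The similarity matrix is quadratic in the number of feature locations,
    so downsample large images first.
    """
    da, wa, (sxa, sya) = dense_descriptors(img_a)
    db, wb, (sxb, syb) = dense_descriptors(img_b)
    sim = da @ db.T                                        # cosine similarities
    nn = sim.argmax(dim=1)
    back = sim.argmax(dim=0)
    mutual = back[nn] == torch.arange(len(nn))             # mutual-NN filter
    ia = torch.nonzero(mutual).squeeze(1)
    ib = nn[ia]
    pts_a = torch.stack([(ia % wa) * sxa, (ia // wa) * sya], dim=1)
    pts_b = torch.stack([(ib % wb) * sxb, (ib // wb) * syb], dim=1)
    # RANSAC rejects the remaining outlier matches.
    H, _ = cv2.findHomography(pts_a.numpy().astype(np.float32),
                              pts_b.numpy().astype(np.float32), cv2.RANSAC, 5.0)
    return H
```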
Related papers
- ConDL: Detector-Free Dense Image Matching (arXiv, 2024-08-05)
We introduce a deep-learning framework designed for estimating dense image correspondences.
Our fully convolutional model generates dense feature maps for images, where each pixel is associated with a descriptor that can be matched across multiple images.
- Perceptual Artifacts Localization for Image Synthesis Tasks (arXiv, 2023-10-09)
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
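A minimal sketch of such a zoom-in inpainting loop, assuming a hypothetical `seg_model` that returns a per-pixel artifact mask and a hypothetical `inpaint_model` that fills a masked crop (neither name comes from the paper):

```python
import cv2
import numpy as np

def fix_artifacts(image, seg_model, inpaint_model, pad=32):
    """Localize artifacts, then inpaint each region on a zoomed-in crop."""
    mask = seg_model(image)                       # uint8 mask, non-zero = artifact
    n, labels = cv2.connectedComponents((mask > 0).astype(np.uint8))
    out = image.copy()
    for k in range(1, n):                         # label 0 is background
        ys, xs = np.nonzero(labels == k)
        y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, image.shape[0])
        x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, image.shape[1])
        crop = out[y0:y1, x0:x1]
        crop_mask = (labels[y0:y1, x0:x1] == k).astype(np.uint8) * 255
        out[y0:y1, x0:x1] = inpaint_model(crop, crop_mask)  # zoomed-in inpaint
    return out
```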
- SIDAR: Synthetic Image Dataset for Alignment & Restoration (arXiv, 2023-05-19)
There is a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models.
Our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering.
The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal.
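A minimal sketch of this style of synthetic data generation (here with a plain 2D warp rather than 3D rendering): perturb the image corners, warp by the resulting homography, and keep the matrix as the ground-truth label. The jitter magnitude is an illustrative assumption.

```python
import cv2
import numpy as np

def random_homography_sample(img, max_jitter=0.15, rng=None):
    """Warp img by a random homography; return (warped, H) with H as the label."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_jitter, max_jitter, size=(4, 2)) * [w, h]
    H = cv2.getPerspectiveTransform(corners, (corners + jitter).astype(np.float32))
    warped = cv2.warpPerspective(img, H, (w, h))
    return warped, H
```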
- iEdit: Localised Text-guided Image Editing with Weak Supervision (arXiv, 2023-05-10)
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
- Compressive Sensing with Tensorized Autoencoder (arXiv, 2023-03-10)
In many cases, different images in a collection are articulated versions of one another.
In this paper, our goal is to recover images without access to the ground-truth (clean) images, using the articulations as a structural prior on the data.
We propose to learn an autoencoder with tensor ring factorization on the embedding space to impose structural constraints on the data.
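For illustration, a tensor ring factorization represents a high-order tensor as a cyclic chain of 3-way cores G_k of shape (r_k, d_k, r_{k+1}), with the last rank wrapping around to the first. The sketch below contracts such cores back into the full tensor; shapes and ranks are illustrative assumptions, not the paper's configuration.

```python
import torch

def tensor_ring_reconstruct(cores):
    """Contract tensor-ring cores G_k of shape (r_k, d_k, r_{k+1}) into a tensor.

    Entry T[i1, ..., in] equals trace(G_1[:, i1, :] @ ... @ G_n[:, in, :]).
    """
    out = cores[0]                                         # (r1, d1, r2)
    for core in cores[1:]:
        out = torch.einsum("adb,bec->adec", out, core)     # chain the ring bond
        out = out.reshape(out.shape[0], -1, out.shape[-1])
    full = torch.einsum("ada->d", out)                     # close the ring (trace)
    return full.reshape([c.shape[1] for c in cores])

# Example: three cores with ring ranks (2, 3, 3) encode a 5x4x6 tensor,
# with the ranks controlling the parameter count.
cores = [torch.randn(2, 5, 3), torch.randn(3, 4, 3), torch.randn(3, 6, 2)]
print(tensor_ring_reconstruct(cores).shape)                # torch.Size([5, 4, 6])
```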
- Compositional Sketch Search (arXiv, 2021-06-15)
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
- Ensembling with Deep Generative Views (arXiv, 2021-04-29)
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
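A minimal sketch of this ensembling scheme, where `generate_views` is a hypothetical stand-in for the GAN-based augmenter (e.g. StyleGAN2 inversion plus latent perturbation):

```python
import torch

@torch.no_grad()
def ensemble_predict(classifier, image, generate_views, n_views=8):
    """Average softmax outputs over the original image and its generated views."""
    views = [image] + [generate_views(image) for _ in range(n_views)]
    probs = classifier(torch.stack(views)).softmax(dim=1)
    return probs.mean(dim=0)          # ensembled class distribution
```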
- TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations (arXiv, 2021-03-29)
We propose TransFill, a multi-homography transformed fusion method to fill the hole by referring to another source image that shares scene contents with the target image.
We learn to adjust the color and apply a pixel-level warping to each homography-warped source image to make it more consistent with the target.
Our method achieves state-of-the-art performance on pairs of images across a variety of wide baselines and color differences, and generalizes to user-provided image pairs.
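A minimal sketch of the core idea, reduced to a single homography and a channel-wise colour gain; the actual method learns the colour adjustment and a per-pixel warp. `H` is assumed to map source to target coordinates.

```python
import cv2
import numpy as np

def fill_from_reference(target, source, hole_mask, H):
    """Warp source into the target frame and composite it into the hole.

    hole_mask: uint8, non-zero inside the hole; H: 3x3 source-to-target homography.
    """
    h, w = target.shape[:2]
    warped = cv2.warpPerspective(source, H, (w, h))
    valid = hole_mask == 0
    # Match the warped source to the target colour statistics outside the hole.
    gain = target[valid].mean(axis=0) / (warped[valid].mean(axis=0) + 1e-6)
    adjusted = np.clip(warped * gain, 0, 255).astype(target.dtype)
    out = target.copy()
    out[hole_mask > 0] = adjusted[hole_mask > 0]
    return out
```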
- Data Augmentation for Object Detection via Differentiable Neural Rendering (arXiv, 2021-03-04)
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
- Structural-analogy from a Single Image Pair (arXiv, 2020-04-05)
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high quality imagery in other conditional generation tasks utilizing images A and B only.