DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
- URL: http://arxiv.org/abs/2409.09605v2
- Date: Wed, 18 Sep 2024 06:34:47 GMT
- Title: DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
- Authors: Liao Shen, Tianqi Liu, Huiqiang Sun, Xinyi Ye, Baopu Li, Jianming Zhang, Zhiguo Cao
- Abstract summary: We study the problem of generating intermediate images from image pairs with large motion.
Due to the large motion, the intermediate semantic information may be absent in input images.
We propose DreamMover, a novel image interpolation framework with three main components.
- Score: 35.60459492849359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of generating intermediate images from image pairs with large motion while maintaining semantic consistency. Due to the large motion, the intermediate semantic information may be absent from the input images. Existing methods are either limited to small motion or focus on topologically similar objects, leading to artifacts and inconsistency in the interpolation results. To overcome this challenge, we delve into pre-trained image diffusion models for their capabilities in semantic cognition and representation, ensuring that the absent intermediate semantic representations are expressed consistently with the inputs. To this end, we propose DreamMover, a novel image interpolation framework with three main components: 1) a natural flow estimator based on the diffusion model that can implicitly reason about the semantic correspondence between two images; 2) to avoid the loss of detailed information during fusion, a scheme whose key insight is to fuse information in two parts, high-level space and low-level space; 3) to enhance the consistency between the generated images and the inputs, a self-attention concatenation and replacement approach. Lastly, we present InterpBench, a challenging benchmark dataset for evaluating the semantic consistency of generated results. Extensive experiments demonstrate the effectiveness of our method. Our project is available at https://dreamm0ver.github.io .
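To make component 3 concrete, here is a minimal, hypothetical PyTorch sketch of the self-attention concatenation idea (my own illustration of the stated mechanism, not the authors' code): while denoising the intermediate frame, its queries attend over keys and values cached from both input images.

```python
import torch

def concat_self_attention(q_mid, kv_img1, kv_img2):
    """Queries from the interpolated latent attend over keys/values cached
    from BOTH input images (concatenated along the token axis), so the
    generated frame stays consistent with the inputs' appearance.

    q_mid:   (B, N, C) queries for the intermediate frame
    kv_img1: (B, N, C) features cached while denoising input image 1
    kv_img2: (B, N, C) features cached while denoising input image 2
    """
    k = torch.cat([kv_img1, kv_img2], dim=1)            # (B, 2N, C)
    v = torch.cat([kv_img1, kv_img2], dim=1)            # (B, 2N, C)
    scale = q_mid.shape[-1] ** -0.5
    attn = torch.softmax(q_mid @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v                                     # (B, N, C)
```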
Related papers
- Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models [65.82564074712836]
We introduce DiffusionHOI, a new HOI detector that leverages text-to-image diffusion models.
We first devise an inversion-based strategy to learn the expression of relation patterns between humans and objects in embedding space.
These learned relation embeddings then serve as textual prompts to steer diffusion models to generate images that depict specific interactions.
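As a rough picture of the inversion-based strategy, the sketch below (hypothetical names, in the spirit of textual inversion) optimizes a single relation embedding that is spliced into the prompt embeddings and later reused as a textual prompt:

```python
import torch

emb_dim = 768                 # assumed CLIP text-embedding width
relation_emb = torch.randn(1, emb_dim, requires_grad=True)
optimizer = torch.optim.AdamW([relation_emb], lr=1e-3)

def splice_relation(prompt_embeds, position):
    """Swap one placeholder token embedding for the learned relation embedding."""
    out = prompt_embeds.clone()
    out[:, position] = relation_emb
    return out

# Schematic training loop: only relation_emb is optimized, via the usual
# diffusion denoising loss on images showing the target interaction.
# for imgs, prompt_embeds in loader:
#     loss = denoising_loss(unet, imgs, splice_relation(prompt_embeds, pos))
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```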
arXiv Detail & Related papers (2024-10-26T12:00:33Z) - DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing [28.593023489682654]
We present DiffMorpher, the first approach enabling smooth and natural image morphing using diffusion models.
Our key idea is to capture the semantics of the two images by fitting two LoRAs to them respectively, and interpolate between both the LoRA parameters and the latent noises to ensure a smooth semantic transition.
In addition, we propose an attention interpolation and injection technique and a new sampling schedule to further enhance the smoothness between consecutive images.
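The core interpolation step is easy to sketch. Below is a hedged, schematic version (helper names like sample_with_lora are hypothetical): LoRA parameters are interpolated linearly, while Gaussian latent noises are interpolated spherically so the interpolant stays plausibly distributed.

```python
import torch

def lerp(a, b, t):
    """Linear interpolation, applied per LoRA weight tensor."""
    return (1 - t) * a + t * b

def slerp(z0, z1, t, eps=1e-8):
    """Spherical interpolation for Gaussian latents: keeps the interpolant
    near the same norm shell instead of shrinking it toward the mean."""
    z0f, z1f = z0.flatten(), z1.flatten()
    cos = (z0f @ z1f) / (z0f.norm() * z1f.norm() + eps)
    omega = torch.acos(cos.clamp(-1 + eps, 1 - eps))
    return (torch.sin((1 - t) * omega) * z0
            + torch.sin(t * omega) * z1) / torch.sin(omega)

# Schematic frame generation at ratio t in [0, 1]:
# lora_t  = {k: lerp(lora_A[k], lora_B[k], t) for k in lora_A}  # LoRA params
# noise_t = slerp(noise_A, noise_B, t)                          # latent noise
# frame_t = sample_with_lora(unet, lora_t, noise_t)             # hypothetical
```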
arXiv Detail & Related papers (2023-12-12T16:28:08Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
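A compact way to see the mechanism (my sketch of the stated idea, not the paper's code): in selected self-attention layers, the target image keeps its own queries, but keys and values are swapped in from the appearance image. Unlike the concatenation variant sketched earlier, the replacement is total, so appearance flows entirely from the other image.

```python
import torch

def cross_image_attention(q_struct, k_app, v_app):
    """Queries come from the structure image's denoising features; keys and
    values are replaced by the appearance image's, so appearance is
    transferred along implicitly matched semantic regions.

    q_struct:     (B, N, C) queries from the structure (target) image
    k_app, v_app: (B, N, C) keys/values from the appearance image
    """
    scale = q_struct.shape[-1] ** -0.5
    attn = torch.softmax(q_struct @ k_app.transpose(-2, -1) * scale, dim=-1)
    return attn @ v_app                                  # (B, N, C)
```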
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - FreeReg: Image-to-Point Cloud Registration Leveraging Pretrained Diffusion Models and Monocular Depth Estimators [37.39693977657165]
Matching cross-modality features between images and point clouds is a fundamental problem for image-to-point cloud registration.
We propose to first unify the modalities of images and point clouds using pretrained large-scale models.
We show that the intermediate features, called diffusion features, extracted by depth-to-image diffusion models are semantically consistent between images and point clouds.
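Semantically consistent features suggest a simple matching recipe. A hedged sketch (not FreeReg's actual pipeline) that pairs dense diffusion features from the image and from a rendering of the point cloud via mutual nearest neighbours:

```python
import torch
import torch.nn.functional as F

def mutual_nn_matches(feat_img, feat_pcd):
    """feat_img: (N, C) per-pixel features; feat_pcd: (M, C) per-point
    features. Returns index pairs that are each other's nearest neighbour
    under cosine similarity (cycle-consistent correspondences)."""
    a = F.normalize(feat_img, dim=-1)
    b = F.normalize(feat_pcd, dim=-1)
    sim = a @ b.t()                         # (N, M) cosine similarities
    nn12 = sim.argmax(dim=1)                # best point for every pixel
    nn21 = sim.argmax(dim=0)                # best pixel for every point
    ids = torch.arange(a.shape[0])
    keep = nn21[nn12] == ids                # keep only mutual pairs
    return ids[keep], nn12[keep]
```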
arXiv Detail & Related papers (2023-10-05T09:57:23Z) - Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
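The amplitude/phase split behind the two stages is a standard Fourier decomposition. A minimal sketch with torch.fft (my own illustration, not MITNet's code):

```python
import torch

def split_spectrum(img):
    """Decompose an image into amplitude and phase spectra. In the dehazing
    view, haze mainly corrupts amplitude, while structure lives in phase."""
    spec = torch.fft.fft2(img)              # complex spectrum, per channel
    return spec.abs(), spec.angle()

def recombine(amplitude, phase):
    """Rebuild an image from (possibly refined) amplitude and phase."""
    spec = torch.polar(amplitude, phase)    # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec).real
```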
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Edge Guided GANs with Multi-Scale Contrastive Learning for Semantic Image Synthesis [139.2216271759332]
We propose a novel ECGAN for the challenging semantic image synthesis task.
The semantic labels do not provide detailed structural information, making it challenging to synthesize local details and structures.
The widely adopted CNN operations such as convolution, down-sampling, and normalization usually cause spatial resolution loss.
We propose a novel contrastive learning method, which aims to enforce pixel embeddings belonging to the same semantic class to generate more similar image content.
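A schematic version of such a class-aware pixel contrastive objective (a generic SupCon-style sketch, not the paper's exact loss):

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeds, labels, temperature=0.1):
    """Pull together pixel embeddings with the same semantic label and push
    apart different labels, via an InfoNCE-style objective.

    embeds: (N, C) sampled pixel embeddings;  labels: (N,) class ids
    """
    z = F.normalize(embeds, dim=-1)
    logits = z @ z.t() / temperature                    # (N, N) similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = same & ~eye                                   # positives, minus self
    log_prob = logits - torch.logsumexp(
        logits.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    valid = pos.any(dim=1)                              # anchors with positives
    loss = -(log_prob * pos).sum(dim=1)[valid] / pos.sum(dim=1)[valid]
    return loss.mean()
```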
arXiv Detail & Related papers (2023-07-22T14:17:19Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - Cross-modal Image Retrieval with Deep Mutual Information Maximization [14.778158582349137]
We study cross-modal image retrieval, where the input consists of a source image plus text describing certain modifications to that image, and the goal is to retrieve the desired target image.
Our method narrows the gap between the text and image modalities by maximizing mutual information between their representations, which are not exactly semantically identical.
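One common estimator for this kind of objective is InfoNCE, whose minimization maximizes a lower bound on the mutual information between the two representations. A generic cross-modal sketch (illustrative, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(img_emb, txt_emb, temperature=0.07):
    """InfoNCE between paired image and composed text+source embeddings;
    minimizing it tightens a lower bound on their mutual information."""
    a = F.normalize(img_emb, dim=-1)
    b = F.normalize(txt_emb, dim=-1)
    logits = a @ b.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(a.shape[0])          # matching pairs on diagonal
    return F.cross_entropy(logits, targets)
```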
arXiv Detail & Related papers (2021-03-10T13:08:09Z) - Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions to perform pixel-wise saliency refinement.
Our method is simple yet effective; it is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z)