Light Field Saliency Detection with Dual Local Graph Learning
and Reciprocative Guidance
- URL: http://arxiv.org/abs/2110.00698v1
- Date: Sat, 2 Oct 2021 00:54:39 GMT
- Title: Light Field Saliency Detection with Dual Local Graph Learning
and Reciprocative Guidance
- Authors: Nian Liu, Wangbo Zhao, Dingwen Zhang, Junwei Han, Ling Shao
- Abstract summary: We model the information fusion within focal stack via graph networks.
We build a novel dual graph model to guide the focal stack fusion process using all-focus patterns.
- Score: 148.9832328803202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The application of light field data to salient object detection
has become increasingly popular. The difficulty lies in how to effectively
fuse the features within the focal stack and how to combine them with the
features of the all-focus image. Previous methods usually fuse focal stack
features via convolution or ConvLSTM, which are both less effective and
ill-posed. In this paper, we model the information fusion within the focal
stack via graph networks, which introduce powerful context propagation from
neighbouring nodes and also avoid ill-posed implementations. On the one hand,
we construct local graph connections, thus avoiding the prohibitive
computational costs of traditional graph networks. On the other hand, instead
of processing the two kinds of data separately, we build a novel dual graph
model to guide the focal stack fusion process using all-focus patterns. To
handle the second difficulty, previous methods usually implement one-shot
fusion of focal stack and all-focus features, hence lacking a thorough
exploration of their complementarity. We introduce a reciprocative guidance
scheme that enables mutual guidance between these two kinds of information
over multiple steps. As such, both kinds of features can be enhanced
iteratively, finally benefiting the saliency prediction. Extensive
experimental results show that the proposed models are all beneficial and
achieve significantly better results than state-of-the-art methods.
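To make the two ideas above concrete, here is a minimal PyTorch sketch of a
local-graph fusion step guided by all-focus features, wrapped in a
reciprocative loop. All class names, tensor shapes, and the unfold-based
neighbourhood construction are assumptions made for illustration; this is a
sketch of the technique described in the abstract, not the authors' released
implementation.

```python
# Hypothetical sketch: focal-stack features are fused over a k x k local
# graph, with the all-focus feature providing the query that weights each
# neighbouring node.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalGraphFusion(nn.Module):
    """Fuse focal-stack features under all-focus guidance, connecting each
    pixel only to its k*k spatial neighbours (a local, not dense, graph)."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k = k
        self.proj_q = nn.Conv2d(channels, channels, 1)  # all-focus -> queries
        self.proj_k = nn.Conv2d(channels, channels, 1)  # focal slice -> keys
        self.proj_v = nn.Conv2d(channels, channels, 1)  # focal slice -> values

    def forward(self, focal, allfocus):
        # focal: (B, N, C, H, W) focal stack; allfocus: (B, C, H, W)
        B, N, C, H, W = focal.shape
        q = self.proj_q(allfocus).view(B, C, 1, H * W)
        fused = torch.zeros_like(allfocus)
        pad = self.k // 2
        for i in range(N):  # message passing over each focal slice
            keys = F.unfold(self.proj_k(focal[:, i]), self.k, padding=pad)
            vals = F.unfold(self.proj_v(focal[:, i]), self.k, padding=pad)
            keys = keys.view(B, C, self.k * self.k, H * W)
            vals = vals.view(B, C, self.k * self.k, H * W)
            # Affinity between the all-focus query and each neighbour node.
            att = (q * keys).sum(dim=1).softmax(dim=1)      # (B, k*k, H*W)
            fused = fused + (att.unsqueeze(1) * vals).sum(2).view(B, C, H, W)
        return fused / N

class ReciprocativeGuidance(nn.Module):
    """Alternate guidance between branches: all-focus guides focal fusion,
    and the fused result then updates the all-focus feature, for T steps."""
    def __init__(self, channels: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.fuse = LocalGraphFusion(channels)
        self.update = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, focal, allfocus):
        for _ in range(self.steps):
            fused = self.fuse(focal, allfocus)
            allfocus = self.update(torch.cat([allfocus, fused], dim=1))
        return allfocus  # a full model would feed this to a saliency head

if __name__ == "__main__":
    model = ReciprocativeGuidance(channels=32)
    focal = torch.randn(2, 6, 32, 24, 24)   # 6 focal slices
    allfocus = torch.randn(2, 32, 24, 24)
    print(model(focal, allfocus).shape)     # torch.Size([2, 32, 24, 24])
```

Because every pixel attends only to its k x k neighbours, the affinity
computation stays at O(HW * k^2) per slice rather than the O((HW)^2) of a
densely connected graph, which mirrors the cost argument in the abstract.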
Related papers
- DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion [35.60459492849359]
We study the problem of generating intermediate images from image pairs with large motion.
Due to the large motion, the intermediate semantic information may be absent in input images.
We propose DreamMover, a novel image interpolation framework with three main components.
arXiv Detail & Related papers (2024-09-15T04:09:12Z) - Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery [17.455841673719625]
Unsupervised landmark discovery (ULD) for an object category is a challenging computer vision problem.
In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models.
Our approach consistently outperforms state-of-the-art methods on four challenging benchmarks AFLW, MAFL, CatHeads and LS3D by significant margins.
arXiv Detail & Related papers (2024-03-24T15:24:04Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for
Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP than existing methods, reaching state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - Trading-off Mutual Information on Feature Aggregation for Face
Recognition [12.803514943105657]
We propose a technique to aggregate the outputs of two state-of-the-art (SOTA) deep Face Recognition (FR) models.
In our approach, we leverage the transformer attention mechanism to exploit the relationship between different parts of two feature maps.
To evaluate the effectiveness of our proposed method, we conducted experiments on popular benchmarks and compared our results with state-of-the-art algorithms.
arXiv Detail & Related papers (2023-09-22T18:48:38Z) - DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z) - Grounded Text-to-Image Synthesis with Attention Refocusing [16.9170825951175]
We reveal the potential causes in the diffusion model's cross-attention and self-attention layers.
We propose two novel losses to refocus attention maps according to a given spatial layout during sampling.
We show that our proposed attention refocusing effectively improves the controllability of existing approaches.
arXiv Detail & Related papers (2023-06-08T17:59:59Z) - Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and
Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
arXiv Detail & Related papers (2023-05-11T10:55:34Z) - Learning to Agree on Vision Attention for Visual Commonsense Reasoning [50.904275811951614]
A VCR model aims to answer a question about an image and then predict a rationale for the preceding answering process.
Existing methods ignore the pivotal relationship between the two processes, leading to sub-optimal model performance.
This paper presents a novel visual attention alignment method to efficaciously handle these two processes in a unified framework.
arXiv Detail & Related papers (2023-02-04T07:02:29Z) - Single Stage Virtual Try-on via Deformable Attention Flows [51.70606454288168]
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation.
Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-07-19T10:01:31Z) - Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
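For the last entry above (Feature Mutual Mapping), one plausible reading of
the "global map" is a cross-image affinity matrix that scores pixel-wise
connections and converts them into fusion weights. The sketch below is an
assumption-laden illustration of that reading, not the paper's method; the
function name and the weighting rule are invented for this example.

```python
# Hypothetical illustration only: a global cross-image affinity ("mutual
# map") used to weight the fusion of two source-image feature maps.
import torch
import torch.nn.functional as F

def mutual_map_fuse(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (B, C, H, W) features of the two source images."""
    B, C, H, W = feat_a.shape
    a = F.normalize(feat_a.flatten(2), dim=1)      # (B, C, H*W)
    b = F.normalize(feat_b.flatten(2), dim=1)
    # Global map: cosine similarity between every pixel pair across images.
    affinity = torch.bmm(a.transpose(1, 2), b)     # (B, H*W, H*W)
    # A pixel with strong support in the other image gets more fusion weight.
    w_a = affinity.amax(dim=2).view(B, 1, H, W)
    w_b = affinity.amax(dim=1).view(B, 1, H, W)
    w = torch.softmax(torch.stack([w_a, w_b]), dim=0)
    return w[0] * feat_a + w[1] * feat_b

fused = mutual_map_fuse(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32))
```

The dense affinity costs O((HW)^2) memory, so such a map would realistically
be computed on downsampled features.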