Language-guided Image Reflection Separation
- URL: http://arxiv.org/abs/2402.11874v4
- Date: Tue, 4 Jun 2024 06:56:43 GMT
- Title: Language-guided Image Reflection Separation
- Authors: Haofeng Zhong, Yuchen Hong, Shuchen Weng, Jinxiu Liang, Boxin Shi
- Abstract summary: We propose a unified framework for language-guided reflection separation.
A gated network design and a randomized training strategy are employed to tackle the recognizable layer ambiguity.
- Score: 48.06512741731805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of language-guided reflection separation, which aims at addressing the ill-posed reflection separation problem by introducing language descriptions to provide layer content. We propose a unified framework to solve this problem, which leverages the cross-attention mechanism with contrastive learning strategies to construct the correspondence between language descriptions and image layers. A gated network design and a randomized training strategy are employed to tackle the recognizable layer ambiguity. The effectiveness of the proposed method is validated by its significant performance advantage over existing reflection separation methods in both quantitative and qualitative comparisons.
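As a rough illustration of the cross-attention correspondence described above, the sketch below lets flattened image features attend to text token embeddings. The module name, shapes, and residual design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LanguageImageCrossAttention(nn.Module):
    """Toy cross-attention: image features (queries) attend to text tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat, txt_feat):
        # img_feat: (B, H*W, C) flattened image features; txt_feat: (B, T, C)
        attended, _ = self.attn(img_feat, txt_feat, txt_feat)
        return self.norm(img_feat + attended)  # residual + norm

# One description per layer (transmission vs. reflection) could condition
# two such branches; here we just check shapes.
img = torch.randn(2, 64 * 64, 256)
txt = torch.randn(2, 12, 256)
print(LanguageImageCrossAttention()(img, txt).shape)  # torch.Size([2, 4096, 256])
```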
Related papers
- Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations.
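For intuition, here is a minimal sketch of a two-stage retrieve-then-rerank shortlist over demonstration embeddings; the cosine-similarity coarse stage and the pluggable reranker are assumptions for illustration, not VICL's actual components.

```python
import torch
import torch.nn.functional as F

def retrieve_and_rerank(query, candidates, k=16, rerank_fn=None):
    """Coarse retrieval by cosine similarity, then optional fine reranking.

    query: (D,) embedding; candidates: (N, D) embeddings.
    rerank_fn: optional finer-grained scorer applied to the top-k shortlist.
    """
    sims = F.cosine_similarity(query.unsqueeze(0), candidates)  # (N,)
    shortlist = sims.topk(k).indices                            # coarse stage
    if rerank_fn is not None:                                   # fine stage
        scores = torch.tensor([rerank_fn(query, candidates[i]) for i in shortlist])
        shortlist = shortlist[scores.argsort(descending=True)]
    return shortlist

demos = torch.randn(1000, 512)   # hypothetical demonstration embeddings
print(retrieve_and_rerank(torch.randn(512), demos, k=5))
```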
arXiv Detail & Related papers (2024-02-18T12:43:38Z) - Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces solutions to enhance spatial controllability in diffusion models that rely on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
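The summary does not say how per-layer outputs are combined; purely as a generic illustration, soft spatial masks can alpha-composite layer latents back to front (standard compositing, not the paper's Layered Rendering Diffusion itself).

```python
import torch

def composite_layers(layer_latents, layer_masks):
    """Blend per-layer latents back to front with soft masks in [0, 1]."""
    canvas = torch.zeros_like(layer_latents[0])
    for z, m in zip(layer_latents, layer_masks):
        canvas = m * z + (1 - m) * canvas  # later (front) layers overwrite
    return canvas

latents = [torch.randn(4, 64, 64) for _ in range(3)]  # hypothetical layers
masks = [torch.rand(1, 64, 64) for _ in range(3)]
print(composite_layers(latents, masks).shape)  # torch.Size([4, 64, 64])
```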
arXiv Detail & Related papers (2023-11-30T10:36:19Z) - CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation [56.58365347854647]
We introduce a novel cost-based approach to adapt vision-language foundation models, notably CLIP.
Our method effectively adapts CLIP for segmenting seen and unseen classes by fine-tuning its encoders.
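A cost-based adaptation of CLIP can be pictured as a cosine-similarity cost volume between dense image embeddings and class-name text embeddings, roughly as below; shapes and names are illustrative, and CAT-Seg's aggregation stages are omitted.

```python
import torch
import torch.nn.functional as F

def clip_cost_volume(pixel_emb, class_emb):
    """Cosine-similarity cost volume between pixels and class prompts.

    pixel_emb: (B, C, H, W) dense image embeddings.
    class_emb: (K, C) text embeddings, one per class name.
    Returns a (B, K, H, W) volume of matching costs.
    """
    pixel_emb = F.normalize(pixel_emb, dim=1)
    class_emb = F.normalize(class_emb, dim=1)
    return torch.einsum("bchw,kc->bkhw", pixel_emb, class_emb)

cost = clip_cost_volume(torch.randn(2, 512, 24, 24), torch.randn(171, 512))
print(cost.shape)  # torch.Size([2, 171, 24, 24])
```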
arXiv Detail & Related papers (2023-03-21T12:28:21Z) - Marginal Contrastive Correspondence for Guided Image Generation [58.0605433671196]
Exemplar-based image translation establishes dense correspondences between a conditional input and an exemplar from two different domains.
Existing work builds the cross-domain correspondences implicitly by minimizing feature-wise distances across the two domains.
We design a Marginal Contrastive Learning Network (MCL-Net) that explores contrastive learning to learn domain-invariant features for realistic exemplar-based image translation.
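MCL-Net's marginal contrastive loss is not spelled out in this summary; as a generic stand-in, a standard InfoNCE objective pulls paired cross-domain features together and pushes unpaired ones apart.

```python
import torch
import torch.nn.functional as F

def info_nce(feat_a, feat_b, tau=0.07):
    """InfoNCE over paired features from two domains.

    feat_a, feat_b: (N, D); row i of each is a positive pair,
    all other rows act as negatives.
    """
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / tau           # (N, N) similarity matrix
    targets = torch.arange(a.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

print(info_nce(torch.randn(32, 128), torch.randn(32, 128)).item())
```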
arXiv Detail & Related papers (2022-04-01T13:55:44Z) - Two-Level Supervised Contrastive Learning for Response Selection in Multi-Turn Dialogue [18.668723854662584]
This paper applies contrastive learning to response selection by using the supervised contrastive loss.
We develop a new method for supervised contrastive learning, referred to as two-level supervised contrastive learning.
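The two-level variant is not detailed here; for reference, a single-level supervised contrastive loss (Khosla et al., 2020), where samples sharing a label act as positives, can be written as follows.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(features, labels, tau=0.1):
    """Supervised contrastive loss: same-label samples are positives.

    features: (N, D) embeddings; labels: (N,) class ids.
    """
    f = F.normalize(features, dim=1)
    sim = f @ f.t() / tau
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    # mean log-probability of the positives for each anchor
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_pos.mean()

print(sup_con_loss(torch.randn(16, 64), torch.randint(0, 4, (16,))).item())
```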
arXiv Detail & Related papers (2022-03-01T23:43:36Z) - Contrastive Video-Language Segmentation [41.1635597261304]
We focus on the problem of segmenting an object referred to by a natural language sentence in video content.
We propose to intertwine the visual and linguistic modalities in an explicit way via a contrastive learning objective.
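One explicit way to tie the two modalities together, sketched under assumed shapes, is to pool visual features under the predicted mask and contrast the pooled region embedding against its sentence embedding; this is an illustrative objective, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def region_text_contrast(frame_feats, masks, sent_emb, tau=0.07):
    """Contrast mask-pooled visual features with sentence embeddings.

    frame_feats: (B, C, H, W) visual features; masks: (B, 1, H, W) soft masks;
    sent_emb:    (B, C) one sentence embedding per sample.
    """
    pooled = (frame_feats * masks).sum((2, 3)) / masks.sum((2, 3)).clamp(min=1e-6)
    pooled = F.normalize(pooled, dim=1)
    sent = F.normalize(sent_emb, dim=1)
    logits = pooled @ sent.t() / tau   # matched pairs on the diagonal
    return F.cross_entropy(logits, torch.arange(sent.size(0)))

loss = region_text_contrast(torch.randn(4, 256, 32, 32),
                            torch.rand(4, 1, 32, 32),
                            torch.randn(4, 256))
print(loss.item())
```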
arXiv Detail & Related papers (2021-09-29T01:40:58Z) - Unsupervised Word Translation Pairing using Refinement based Point Set Registration [8.568050813210823]
Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages.
Current unsupervised approaches rely on similarities in the geometric structure of word embedding spaces across languages.
This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space.
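BioSpere's refinement-based registration is not described in this summary; for intuition, the classical orthogonal-Procrustes baseline aligns two embedding spaces from seed translation pairs, as sketched below.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Orthogonal map W minimizing ||src @ W - tgt||_F over seed pairs.

    src, tgt: (N, D) embeddings of seed translation pairs.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt  # (D, D) orthogonal map from src space onto tgt space

# Synthetic check: recover a known rotation between two embedding spaces.
rng = np.random.default_rng(0)
en = rng.normal(size=(100, 50))
true_w, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal map
fr = en @ true_w
w = procrustes_align(en, fr)
print(np.allclose(en @ w, fr))  # True: the map is recovered
```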
arXiv Detail & Related papers (2020-11-26T09:51:29Z) - Contextual Modulation for Relation-Level Metaphor Identification [3.2619536457181075]
We introduce a novel architecture for identifying relation-level metaphoric expressions of certain grammatical relations.
In a methodology inspired by work in visual reasoning, our approach conditions the neural network computation on deep contextualized features.
We demonstrate that the proposed architecture achieves state-of-the-art results on benchmark datasets.
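Conditioning network computation on contextual features can be illustrated with feature-wise linear modulation (FiLM), which comes from the visual-reasoning literature the paper names as inspiration; the paper's actual architecture differs.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise modulation of token features by a context vector."""
    def __init__(self, ctx_dim, feat_dim):
        super().__init__()
        self.to_gamma = nn.Linear(ctx_dim, feat_dim)
        self.to_beta = nn.Linear(ctx_dim, feat_dim)

    def forward(self, feats, ctx):
        # feats: (B, T, F) token features; ctx: (B, C) context embedding
        gamma = self.to_gamma(ctx).unsqueeze(1)  # (B, 1, F) scale
        beta = self.to_beta(ctx).unsqueeze(1)    # (B, 1, F) shift
        return gamma * feats + beta

out = FiLM(128, 256)(torch.randn(2, 10, 256), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 10, 256])
```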
arXiv Detail & Related papers (2020-10-12T12:07:02Z) - Learning to See Through Obstructions with Layered Decomposition [117.77024641706451]
We present a learning-based approach for removing unwanted obstructions from short image sequences.
Our method leverages motion differences between the background and obstructing elements to recover both layers.
We show that the proposed approach, learned from synthetically generated data, performs well on real images.
arXiv Detail & Related papers (2020-08-11T17:59:31Z) - Learning to See Through Obstructions [117.77024641706451]
We present a learning-based approach for removing unwanted obstructions from a short sequence of images captured by a moving camera.
Our method leverages the motion differences between the background and the obstructing elements to recover both layers.
We show that training on synthetically generated data transfers well to real images.
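To see why motion differences help, consider a toy non-learned baseline: once frames are warped to compensate the background's motion, the background is temporally consistent while the obstruction keeps moving, so a per-pixel temporal median suppresses it. The learned layered decomposition in these papers is far more capable; this only conveys the cue.

```python
import numpy as np

def median_background(frames_aligned):
    """Per-pixel temporal median over background-aligned frames."""
    return np.median(frames_aligned, axis=0)

# Synthetic check: static background plus an occluder sweeping across frames.
rng = np.random.default_rng(1)
bg = rng.uniform(size=(32, 32))
frames = np.stack([bg.copy() for _ in range(5)])
for t in range(5):
    frames[t, :, 6 * t : 6 * t + 5] = 1.0  # occluder strip at time t
rec = median_background(frames)
print(np.abs(rec - bg).max() < 1e-12)  # True: occluder removed
```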
arXiv Detail & Related papers (2020-04-02T17:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.