Related papers: LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

URL: http://arxiv.org/abs/2407.13752v1
Date: Thu, 18 Jul 2024 17:54:49 GMT
Title: LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia
Abstract summary: We introduce the task of logo insertion into text-to-image models. Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts. We present a novel two-phase pipeline LogoSticker to tackle this task.
Score: 73.59571559978278
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in text-to-image model customization have underscored the importance of integrating new concepts from a few examples. Yet this progress is largely confined to widely recognized subjects, which can be learned with relative ease through the models' adequate shared prior knowledge. In contrast, logos, characterized by unique patterns and textual elements, are subjects for which diffusion models have little shared knowledge to draw on, and they therefore present a unique challenge. To bridge this gap, we introduce the task of logo insertion. Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts. We present a novel two-phase pipeline, LogoSticker, to tackle this task. First, we propose the actor-critic relation pre-training algorithm, which addresses the nontrivial gaps in the models' understanding of the potential spatial positioning of logos and their interactions with other objects. Second, we propose a decoupled identity learning algorithm, which enables precise localization and identity extraction of logos. LogoSticker can generate logos accurately and harmoniously in diverse contexts. We comprehensively validate the effectiveness of LogoSticker against customization methods and large models such as DALL·E 3. Project page: https://mingkangz.github.io/logosticker
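The abstract does not spell out how decoupled identity learning localizes the logo. As a generic illustration only, and not the LogoSticker algorithm, the sketch below shows one common ingredient of few-shot customization: a denoising loss restricted to a region mask, so that only the (assumed) logo area contributes to the identity being learned. The function name `masked_denoising_loss`, the (C, H, W) latent layout, and the binary logo mask are all assumptions made for illustration.

```python
import numpy as np


def masked_denoising_loss(eps_pred, eps_true, mask):
    """Mean squared error between predicted and true noise, restricted to a mask.

    Hypothetical helper, not taken from the LogoSticker paper.
    eps_pred, eps_true: arrays of shape (C, H, W) (latent channels, height, width).
    mask: array of shape (H, W), 1 inside the assumed logo region and 0 elsewhere.
    """
    mask = mask[None, :, :]  # add a channel axis so the mask broadcasts over C
    sq_err = (eps_pred - eps_true) ** 2 * mask  # zero out errors outside the logo
    # Normalize by the number of masked elements so the loss scale
    # does not depend on how large the logo region is.
    denom = mask.sum() * eps_pred.shape[0]
    if denom == 0:
        return 0.0  # empty mask: nothing to learn from
    return float(sq_err.sum() / denom)


# Usage: the prediction is wrong only in the top half of an 8x8 latent.
eps_true = np.zeros((4, 8, 8))
eps_pred = np.zeros((4, 8, 8))
eps_pred[:, :4, :] = 1.0
top = np.zeros((8, 8))
top[:4, :] = 1.0
print(masked_denoising_loss(eps_pred, eps_true, top))        # → 1.0
print(masked_denoising_loss(eps_pred, eps_true, 1.0 - top))  # → 0.0
```

Errors outside the mask are ignored entirely, which is the point of the design: background pixels in the few reference images do not get absorbed into the learned identity.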

Related papers

Nested Attention: Semantic-aware Attention Values for Concept Personalization [78.90196530697897]
We introduce Nested Attention, a novel mechanism that injects a rich and expressive image representation into the model's existing cross-attention layers. Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.
(2025-01-02T18:52:11Z)
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to mine rich context via spatial-temporal consistency. Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling. Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
(2024-06-15T04:50:19Z)
SLANT: Spurious Logo ANalysis Toolkit [61.59021920232986]
We develop SLANT: A Spurious Logo ANalysis Toolkit. It contains a semi-automatic mechanism for mining such "spurious" logos. We uncover various seemingly harmless logos that VL models correlate with negative human adjectives. An attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless.
(2024-06-03T15:41:31Z)
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [71.15152184631951]
We propose a fully automated solution for consistent character generation with the sole input being a text prompt. Our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods.
(2023-11-16T18:59:51Z)
FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings [26.395196542803543]
We propose an approach to prompt MLLMs to generate appropriate text for product images, which can help visual models achieve better logo embeddings. Our experiments on real-world datasets prove that FashionLOGO is capable of generating generic and robust logo embeddings.
(2023-08-17T14:30:26Z)
A Cross-direction Task Decoupling Network for Small Logo Detection [28.505952002735334]
We propose a Cross-direction Task Decoupling Network (CTDNet) for small logo detection. Comprehensive experiments on four logo datasets demonstrate the effectiveness and efficiency of the proposed method.
(2023-05-04T02:23:34Z)
Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification [2.243832625209014]
We study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting. We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos. We evaluate our proposed framework for cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scene tasks.
(2022-11-23T12:59:41Z)
Deep Learning for Logo Detection: A Survey [59.278443852492465]
This paper reviews the advances in applying deep learning techniques to logo detection. We perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. We summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance.
(2022-10-10T02:07:41Z)
Makeup216: Logo Recognition with Adversarial Attention Representations [16.78131635640705]
Makeup216 is the largest and most complex logo dataset in the field of makeup, captured from the real world. It comprises 216 logos and 157 brands, including 10,019 images and 37,018 annotated logo objects. Our proposed framework achieved competitive results on Makeup216 and another large-scale open logo dataset.
(2021-12-13T10:08:56Z)
Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection [52.36825190893928]
We propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA). Our approach mainly consists of a Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA).
(2021-08-31T11:59:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.