Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred
Thousand-Scale One-Shot Logo Identification
- URL: http://arxiv.org/abs/2211.12926v1
- Date: Wed, 23 Nov 2022 12:59:41 GMT
- Authors: Nakul Sharma, Abhirama S. Penamakuri, Anand Mishra
- Abstract summary: We study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting.
We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos.
We evaluate our proposed framework for cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scene tasks.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we study the problem of identifying logos of business brands
in natural scenes in an open-set one-shot setting. This problem setup is
significantly more challenging than traditionally-studied 'closed-set' and
'large-scale training samples per category' logo recognition settings. We
propose a novel multi-view textual-visual encoding framework that encodes text
appearing in the logos as well as the graphical design of the logos to learn
robust contrastive representations. These representations are jointly learned
for multiple views of logos over a batch and thereby they generalize well to
unseen logos. We evaluate our proposed framework for cropped logo verification,
cropped logo identification, and end-to-end logo identification in natural
scene tasks; and compare it against state-of-the-art methods. Further, the
literature lacks a 'very-large-scale' collection of reference logo images that
can facilitate the study of one-hundred thousand-scale logo identification. To
fill this gap in the literature, we introduce Wikidata Reference Logo Dataset
(WiRLD), containing logos for 100K business brands harvested from Wikidata. Our
proposed framework achieves an area under the ROC curve of 91.3% on the
QMUL-OpenLogo dataset for the verification task and outperforms state-of-the-art
methods by 9.1% and 2.6% on the one-shot logo identification task on the
Toplogos-10 and the FlickrLogos32 datasets, respectively. Further, we show that
our method is more stable than other baselines even when the number of
candidate logos is on a 100K scale.
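The one-shot, open-set identification setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each brand is represented by a single reference embedding, that a query crop is matched by cosine similarity, and that a query scoring below a threshold is rejected as an unknown logo. The function names, the toy 3-dimensional embeddings, and the threshold value of 0.5 are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(query, references, threshold=0.5):
    """Match a query embedding against one reference embedding per brand.

    Returns (brand, score) for the most similar reference, or (None, score)
    when the best score falls below the threshold (open-set rejection).
    The threshold is a hypothetical value chosen for illustration.
    """
    best_brand, best_score = None, -1.0
    for brand, ref in references.items():
        score = cosine(query, ref)
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score < threshold:
        return None, best_score
    return best_brand, best_score


# Toy reference gallery: one embedding per brand (the "one shot").
references = {
    "brand_a": [1.0, 0.0, 0.0],
    "brand_b": [0.0, 1.0, 0.0],
}

match = identify([0.9, 0.1, 0.0], references)    # close to brand_a
unknown = identify([0.0, 0.0, 1.0], references)  # unlike every reference
```

In a real system the embeddings would come from a trained encoder, and a linear scan over 100K references would likely be replaced by an approximate nearest-neighbour index.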
Related papers
- LogoSticker: Inserting Logos into Diffusion Models for Customized Generation [73.59571559978278]
We introduce the task of logo insertion into text-to-image models.
Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts.
We present a novel two-phase pipeline LogoSticker to tackle this task.
arXiv Detail & Related papers (2024-07-18T17:54:49Z)
- SLANT: Spurious Logo ANalysis Toolkit [61.59021920232986]
We develop SLANT: A Spurious Logo ANalysis Toolkit.
It contains a semi-automatic mechanism for mining such "spurious" logos.
We uncover various seemingly harmless logos that VL models correlate with negative human adjectives.
An attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless.
arXiv Detail & Related papers (2024-06-03T15:41:31Z)
- Image-Text Pre-Training for Logo Recognition [0.27195102129094995]
We propose two novel contributions to improve the matching model's performance.
A standard paradigm of fine-tuning ImageNet pre-trained models fails to discover the text sensitivity necessary to solve the matching problem effectively.
We show that the same vision backbone pre-trained on image-text data, when fine-tuned on OpenLogoDet3K47, achieves 98.6% recall@1.
arXiv Detail & Related papers (2023-09-18T23:18:02Z)
- Deep Learning for Logo Detection: A Survey [59.278443852492465]
This paper reviews advances in applying deep learning techniques to logo detection.
We perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy.
We summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance.
arXiv Detail & Related papers (2022-10-10T02:07:41Z)
- Makeup216: Logo Recognition with Adversarial Attention Representations [16.78131635640705]
Makeup216 is the largest and most complex logo dataset in the field of makeup, captured from the real world.
It comprises 216 logos and 157 brands, including 10,019 images and 37,018 annotated logo objects.
Our proposed framework achieved competitive results on Makeup216 and another large-scale open logo dataset.
arXiv Detail & Related papers (2021-12-13T10:08:56Z)
- Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection [52.36825190893928]
We propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA).
Our approach mainly consists of Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA).
arXiv Detail & Related papers (2021-08-31T11:59:00Z)
- FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via Multi-Scale Feature Decoupling Network [55.49022825759331]
A large-scale food logo dataset is urgently needed for developing advanced food logo detection algorithms.
FoodLogoDet-1500 is a new large-scale publicly available food logo dataset with 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects.
We propose a novel food logo detection method Multi-scale Feature Decoupling Network (MFDNet) to solve the problem of distinguishing multiple food logo categories.
arXiv Detail & Related papers (2021-08-10T12:47:04Z)
- Famous Companies Use More Letters in Logo: A Large-Scale Analysis of Text Area in Logo [4.168157981135698]
We focus on three correlations: between logo images and their text areas, between the text areas and the number of followers on Twitter, and between the logo images and the number of followers.
Findings include a weak positive correlation between the text area ratio and the company's number of followers.
arXiv Detail & Related papers (2021-04-01T08:19:29Z)
- LogoDet-3K: A Large-Scale Image Dataset for Logo Detection [61.296935298332606]
We introduce LogoDet-3K, the largest logo detection dataset with full annotation.
It has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images.
We propose a strong baseline method Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection.
arXiv Detail & Related papers (2020-08-12T14:57:53Z)