Image-Text Pre-Training for Logo Recognition
- URL: http://arxiv.org/abs/2309.10206v1
- Date: Mon, 18 Sep 2023 23:18:02 GMT
- Title: Image-Text Pre-Training for Logo Recognition
- Authors: Mark Hubenthal, Suren Kumar
- Abstract summary: We propose two novel contributions to improve the matching model's performance.
A standard paradigm of fine-tuning ImageNet pre-trained models fails to discover the text sensitivity necessary to solve the matching problem effectively.
We show that the same vision backbone pre-trained on image-text data, when fine-tuned on OpenLogoDet3K47, achieves $98.6\%$ recall@1.
- Score: 0.27195102129094995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-set logo recognition is commonly solved by first detecting possible logo
regions and then matching the detected parts against an ever-evolving dataset
of cropped logo images. Learning the matching model, a metric learning problem, is
especially challenging for logo recognition due to the mixture of text and
symbols in logos. We propose two novel contributions to improve the matching
model's performance: (a) using image-text paired samples for pre-training, and
(b) an improved metric learning loss function. A standard paradigm of
fine-tuning ImageNet pre-trained models fails to discover the text sensitivity
necessary to solve the matching problem effectively. This work demonstrates the
importance of pre-training on image-text pairs, which significantly improves
the performance of a visual embedder trained for the logo retrieval task,
especially for more text-dominant classes. We construct a composite public logo
dataset, OpenLogoDet3K47, that combines LogoDet3K, OpenLogo, and FlickrLogos-47.
We show that the same vision backbone pre-trained on
image-text data, when fine-tuned on OpenLogoDet3K47, achieves $98.6\%$
recall@1, significantly improving performance over pre-training on ImageNet1K
($97.6\%$). We generalize the ProxyNCA++ loss function to propose ProxyNCAHN++
which incorporates class-specific hard negative images. The proposed method
sets a new state of the art on the five public logo datasets considered, with a
$3.5\%$ zero-shot recall@1 improvement on LogoDet3K test, $4\%$ on OpenLogo,
$6.5\%$ on FlickrLogos-47, $6.2\%$ on Logos In The Wild, and $0.6\%$ on
BelgaLogo.
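The recall@1 figures quoted in the abstract can be read as follows: a query crop counts as a hit when its single nearest gallery embedding carries the same class label. The sketch below is only an illustration of that metric under stated assumptions, not the authors' evaluation code. The function names `cosine_similarity` and `recall_at_1`, the use of cosine similarity, and the toy 2-D embeddings are all hypothetical choices made for this example.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_1(
    queries: Sequence[Vector],
    query_labels: Sequence[str],
    gallery: Sequence[Vector],
    gallery_labels: Sequence[str],
) -> float:
    """Fraction of queries whose nearest gallery item has the same label."""
    if not queries or not gallery:
        raise ValueError("queries and gallery must be non-empty")
    hits = 0
    for query, label in zip(queries, query_labels):
        # Index of the most similar gallery embedding (assumed similarity: cosine).
        best = max(
            range(len(gallery)),
            key=lambda i: cosine_similarity(query, gallery[i]),
        )
        if gallery_labels[best] == label:
            hits += 1
    return hits / len(queries)


# Toy, made-up embeddings: two gallery logos and three query crops.
toy_gallery = [[1.0, 0.0], [0.0, 1.0]]
toy_gallery_labels = ["brand_a", "brand_b"]
toy_queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
toy_query_labels = ["brand_a", "brand_b", "brand_b"]

toy_recall = recall_at_1(
    toy_queries, toy_query_labels, toy_gallery, toy_gallery_labels
)
```

In this toy setup the third query sits closer to the `brand_a` gallery item than to its true class, so two of three queries are retrieved correctly. In an open-set setting the gallery can be extended with new logo classes without retraining the embedder, which is why a retrieval metric of this kind fits the matching stage.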
Related papers
- LogoSticker: Inserting Logos into Diffusion Models for Customized Generation [73.59571559978278]
We introduce the task of logo insertion into text-to-image models.
Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts.
We present a novel two-phase pipeline LogoSticker to tackle this task.
arXiv Detail & Related papers (2024-07-18T17:54:49Z)
- RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition [0.0]
This paper proposes a novel logo image recognition approach incorporating a localization technique based on reinforcement learning.
Because there is no annotation for the position coordinates, it is impossible to train and infer the location of the logo in the image.
We demonstrate that the proposed method is a promising approach to logo recognition in real-world applications.
arXiv Detail & Related papers (2023-12-28T02:44:28Z)
- Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training [33.51524424536508]
Iterative Prompt Relabeling (IPR) is a novel algorithm that aligns images to text through iterative image sampling and prompt relabeling with feedback.
We conduct thorough experiments on SDv2 and SDXL, testing their capability to follow instructions on spatial relations.
arXiv Detail & Related papers (2023-12-23T11:10:43Z)
- Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification [2.243832625209014]
We study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting.
We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos.
We evaluate our proposed framework for cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scene tasks.
arXiv Detail & Related papers (2022-11-23T12:59:41Z)
- Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition [83.93422034664184]
Unpaired image captioning (UIC) aims to describe images without using image-caption pairs in the training phase.
Most existing studies use off-the-shelf algorithms to obtain the visual concepts.
We propose a novel approach to achieve cost-effective UIC using image-level labels.
arXiv Detail & Related papers (2022-03-07T08:02:23Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- An Effective and Robust Detector for Logo Detection [58.448716977297565]
Some attackers fool the well-trained logo detection model for infringement.
A novel logo detector based on the mechanism of looking and thinking twice is proposed in this paper.
We extend the DetectoRS algorithm to a cascade schema with an equalization loss function, multi-scale transformations, and adversarial data augmentation.
arXiv Detail & Related papers (2021-08-01T10:17:53Z)
- Deep learning based registration using spatial gradients and noisy segmentation labels [52.78503776563559]
Deep learning based approaches have become quite popular, providing fast and well-performing registration strategies.
Our work relies on (i) a symmetric formulation, predicting the transformations from source to target and from target to source simultaneously, enforcing the trained representations to be similar.
Our method reports a mean dice of $0.64$ for task 3 and $0.85$ for task 4 on the test sets, taking third place on the challenge.
arXiv Detail & Related papers (2020-10-21T11:08:45Z)
- LogoDet-3K: A Large-Scale Image Dataset for Logo Detection [61.296935298332606]
We introduce LogoDet-3K, the largest logo detection dataset with full annotation.
It has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images.
We propose a strong baseline method, Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection.
arXiv Detail & Related papers (2020-08-12T14:57:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.