Logo-VGR: Visual Grounded Reasoning for Open-world Logo Recognition
- URL: http://arxiv.org/abs/2509.25811v1
- Date: Tue, 30 Sep 2025 05:35:10 GMT
- Title: Logo-VGR: Visual Grounded Reasoning for Open-world Logo Recognition
- Authors: Zichen Liang, Jingjing Fei, Jie Wang, Zheming Yang, Changqing Li, Pei Wu, Minghui Qiu, Fei Yang, Xialei Liu
- Abstract summary: We introduce an open-world logo recognition benchmark, a core challenge in product moderation. Unlike traditional logo recognition methods that rely on memorizing representations of tens of thousands of brands, we propose Logo-VGR. We show that Logo-VGR outperforms strong baselines by nearly 10 points in OOD settings.
- Score: 25.658499211854153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in multimodal large language models (MLLMs) have been primarily evaluated on general-purpose benchmarks, while their applications in domain-specific scenarios, such as intelligent product moderation, remain underexplored. To address this gap, we introduce an open-world logo recognition benchmark, a core challenge in product moderation. Traditional logo recognition methods rely on memorizing representations of tens of thousands of brands, an impractical approach in real-world settings. In contrast, our proposed method, Logo-VGR, enables generalization to large-scale brand recognition with supervision from only a small subset of brands. Specifically, we reformulate logo recognition as a comparison-based task, requiring the model to match product images with candidate logos rather than directly generating brand labels. We further observe that existing models tend to overfit by memorizing brand distributions instead of learning robust multimodal reasoning, which results in poor performance on unseen brands. To overcome this limitation, Logo-VGR introduces a new paradigm of domain-specific multimodal reasoning: Logo Perception Grounding injects domain knowledge, and Logo-Guided Visual Grounded Reasoning enhances the model's reasoning capability. Experimental results show that Logo-VGR outperforms strong baselines by nearly 10 points in out-of-distribution (OOD) settings, demonstrating superior generalization.
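The comparison-based formulation in the abstract (match a product image against candidate logos instead of generating a brand label) can be illustrated with a small sketch. This is not Logo-VGR itself, which uses an MLLM with grounded reasoning; it swaps in plain cosine similarity over precomputed embedding vectors to show why a matching formulation can handle unseen brands. The function names `cosine` and `match_logo`, the `threshold` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_logo(product_vec, candidates, threshold=0.5):
    """Compare one product embedding against candidate logo embeddings.

    candidates maps a brand name to its logo embedding (hypothetical data).
    Returns (brand, score) for the best match, or (None, score) when the
    best score falls below the assumed threshold.
    """
    best_brand, best_score = None, -1.0
    for brand, logo_vec in candidates.items():
        score = cosine(product_vec, logo_vec)
        if score > best_score:
            best_brand, best_score = brand, score
    if best_score < threshold:
        return None, best_score
    return best_brand, best_score


# Toy embeddings; a real system would obtain these from an image encoder.
candidates = {
    "brand_a": [1.0, 0.0, 0.0],
    "brand_b": [0.0, 1.0, 0.0],
}
print(match_logo([0.9, 0.1, 0.0], candidates))

# A brand never seen before is supported by adding its logo to the
# candidate set; nothing is retrained in this sketch.
candidates["brand_c"] = [0.0, 0.0, 1.0]
print(match_logo([0.0, 0.1, 0.9], candidates))
```

Because the brand set lives in the candidate dictionary rather than in a classifier's output layer, extending coverage to a new brand is a data change, which is the property the comparison-based setup is meant to exploit.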
Related papers
- From Unlearning to UNBRANDING: A Benchmark for Trademark-Safe Text-to-Image Generation [0.7798283447125206]
Brand recognition is multi-dimensional, extending beyond explicit logos to encompass distinctive structural features.
We introduce unbranding, a novel task for the fine-grained removal of both trademarks and subtle structural brand features.
Our results, validated by our Vision Language Models metric, confirm unbranding is a distinct, practically relevant problem.
arXiv Detail & Related papers (2025-12-15T23:15:36Z)
- LogoSticker: Inserting Logos into Diffusion Models for Customized Generation [73.59571559978278]
We introduce the task of logo insertion into text-to-image models.
Our goal is to insert logo identities into diffusion models and enable their seamless synthesis in varied contexts.
We present a novel two-phase pipeline LogoSticker to tackle this task.
arXiv Detail & Related papers (2024-07-18T17:54:49Z)
- SLANT: Spurious Logo ANalysis Toolkit [61.59021920232986]
We develop SLANT: A Spurious Logo ANalysis Toolkit.
It contains a semi-automatic mechanism for mining such "spurious" logos.
We uncover various seemingly harmless logos that VL models correlate with negative human adjectives.
An attacker could place a spurious logo on harmful content, causing the model to misclassify it as harmless.
arXiv Detail & Related papers (2024-06-03T15:41:31Z)
- FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings [26.395196542803543]
We propose an approach to prompt MLLMs to generate appropriate text for product images, which can help visual models achieve better logo embeddings.
Our experiments on real-world datasets prove that FashionLOGO is capable of generating generic and robust logo embeddings.
arXiv Detail & Related papers (2023-08-17T14:30:26Z)
- Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification [2.243832625209014]
We study the problem of identifying logos of business brands in natural scenes in an open-set one-shot setting.
We propose a novel multi-view textual-visual encoding framework that encodes text appearing in the logos.
We evaluate our proposed framework for cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scene tasks.
arXiv Detail & Related papers (2022-11-23T12:59:41Z)
- Deep Learning for Logo Detection: A Survey [59.278443852492465]
This paper reviews the advances in applying deep learning techniques to logo detection.
We perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy.
We summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance.
arXiv Detail & Related papers (2022-10-10T02:07:41Z)
- Multi-Label Logo Recognition and Retrieval based on Weighted Fusion of Neural Features [6.6144185930393435]
We propose a system for the multi-label classification and similarity search of logo images.
The method retrieves the most similar logos on the basis of their shape, color, business sector, semantics, and general characteristics.
The proposed approach is evaluated using the European Union Trademark (EUTM) dataset.
arXiv Detail & Related papers (2022-05-11T11:40:40Z)
- Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection [52.36825190893928]
We propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA).
Our approach mainly consists of a Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA).
arXiv Detail & Related papers (2021-08-31T11:59:00Z)
- An Effective and Robust Detector for Logo Detection [58.448716977297565]
Some attackers fool the well-trained logo detection model for infringement.
A novel logo detector based on the mechanism of looking and thinking twice is proposed in this paper.
We extend the DetectoRS algorithm to a cascade schema with an equalization loss function, multi-scale transformations, and adversarial data augmentation.
arXiv Detail & Related papers (2021-08-01T10:17:53Z)
- LogoDet-3K: A Large-Scale Image Dataset for Logo Detection [61.296935298332606]
We introduce LogoDet-3K, the largest logo detection dataset with full annotation.
It has 3,000 logo categories, about 200,000 manually annotated logo objects and 158,652 images.
We propose a strong baseline method, Logo-Yolo, which incorporates Focal loss and CIoU loss into the state-of-the-art YOLOv3 framework for large-scale logo detection.
arXiv Detail & Related papers (2020-08-12T14:57:53Z)