Patent Figure Classification using Large Vision-language Models
- URL: http://arxiv.org/abs/2501.12751v1
- Date: Wed, 22 Jan 2025 09:39:05 GMT
- Title: Patent Figure Classification using Large Vision-language Models
- Authors: Sushil Awale, Eric Müller-Budack, Ralph Ewerth
- Abstract summary: Large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks.
Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios.
For computationally efficient handling of a large number of classes with LVLMs, we propose a novel tournament-style classification strategy.
- Score: 7.505532091249881
- Abstract: Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks; however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For this purpose, we introduce new datasets, PatFigVQA and PatFigCLS, for fine-tuning and evaluation regarding multiple aspects of patent figures (i.e., type, projection, patent class, and objects). For computationally efficient handling of a large number of classes with LVLMs, we propose a novel tournament-style classification strategy that leverages a series of multiple-choice questions. Experimental results and comparisons of multiple classification approaches based on LVLMs and Convolutional Neural Networks (CNNs) in few-shot settings show the feasibility of the proposed approaches.
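The abstract describes the tournament-style strategy only at a high level: the class set is reduced through rounds of multiple-choice questions until one class remains. The following is a minimal sketch of that idea, not the authors' implementation; `ask_mcq` is a hypothetical callable standing in for an LVLM prompted with a multiple-choice question, and the group size and tie-handling are assumptions.

```python
def tournament_classify(image, classes, ask_mcq, group_size=4):
    """Classify `image` into one of `classes` via rounds of
    multiple-choice questions (a sketch of the tournament idea).

    `ask_mcq(image, options)` is a hypothetical LVLM call that
    returns one element of `options`. Each round splits the
    remaining candidates into groups of `group_size`; the winner
    of each group advances until a single class is left.
    """
    candidates = list(classes)
    while len(candidates) > 1:
        winners = []
        for i in range(0, len(candidates), group_size):
            group = candidates[i:i + group_size]
            # A singleton group advances unchallenged.
            winners.append(group[0] if len(group) == 1
                           else ask_mcq(image, group))
        candidates = winners
    return candidates[0]
```

With n classes and groups of k, this needs roughly n/k + n/k² + … ≈ n/(k-1) LVLM calls, rather than one prompt listing all n classes at once.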
Related papers
- LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.
LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.
Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - Towards Generative Class Prompt Learning for Fine-grained Visual Recognition [5.633314115420456]
Two methods are presented: Generative Class Prompt Learning (GCPL) and Contrastive Multi-class Prompt Learning (CoMPLe).
Generative Class Prompt Learning improves visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts.
CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation.
arXiv Detail & Related papers (2024-09-03T12:34:21Z) - RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories.
This paper introduces a Retrieving And Ranking augmented method for MLLMs.
Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z) - Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses the knowledge of a pre-trained Large Language Model (LLM) to learn prompts that adapt a pre-trained Vision-Language Model, such as CLIP, to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z) - Classification of Visualization Types and Perspectives in Patents [9.123089032348311]
We adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images.
We derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives.
arXiv Detail & Related papers (2023-07-19T21:45:07Z) - SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z) - A Study on Representation Transfer for Few-Shot Learning [5.717951523323085]
Few-shot classification aims to learn to classify new object categories well using only a few labeled examples.
In this work we perform a systematic study of various feature representations for few-shot classification.
We find that learning from more complex tasks tends to yield better representations for few-shot classification.
arXiv Detail & Related papers (2022-09-05T17:56:02Z) - Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections.
We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training.
We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z) - Few-shot Classification via Adaptive Attention [93.06105498633492]
We propose a novel few-shot learning method that optimizes and rapidly adapts the query sample representation based on very few reference samples.
As demonstrated experimentally, the proposed model achieves state-of-the-art classification results on various benchmark few-shot classification and fine-grained recognition datasets.
arXiv Detail & Related papers (2020-08-06T05:52:59Z) - Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches [67.51747235117]
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks.
Recent works mainly tackle this problem by focusing on how to locate the most discriminative parts.
We propose a novel framework for fine-grained visual classification to tackle these problems.
arXiv Detail & Related papers (2020-03-08T19:27:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.