Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval
- URL: http://arxiv.org/abs/2506.13496v3
- Date: Thu, 19 Jun 2025 10:08:49 GMT
- Title: Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval
- Authors: Kshitij Kavimandan, Angelos Nalmpantis, Emma Beauxis-Aussalet, Robert-Jan Sips,
- Abstract summary: Patent images are technical drawings that convey information about a patent's innovation. Current methods neglect patents' hierarchical relationships, such as those defined by the Locarno International Classification system. We introduce a hierarchical multi-positive contrastive loss that leverages the LIC's taxonomy to induce such relations in the retrieval process.
- Score: 0.2970959580204573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patent images are technical drawings that convey information about a patent's innovation. Patent image retrieval systems aim to search in vast collections and retrieve the most relevant images. Despite recent advances in information retrieval, patent images still pose significant challenges due to their technical intricacies and complex semantic information, requiring efficient fine-tuning for domain adaptation. Current methods neglect patents' hierarchical relationships, such as those defined by the Locarno International Classification (LIC) system, which groups broad categories (e.g., "furnishing") into subclasses (e.g., "seats" and "beds") and further into specific patent designs. In this work, we introduce a hierarchical multi-positive contrastive loss that leverages the LIC's taxonomy to induce such relations in the retrieval process. Our approach assigns multiple positive pairs to each patent image within a batch, with varying similarity scores based on the hierarchical taxonomy. Our experimental analysis with various vision and multimodal models on the DeepPatent2 dataset shows that the proposed method enhances the retrieval results. Notably, our method is effective with low-parameter models, which require fewer computational resources and can be deployed in environments with limited hardware.
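The objective described in the abstract can be read as a batch-wise contrastive loss in which every in-batch image sharing the anchor's Locarno subclass is a strong positive, while images sharing only the broad class are weaker positives. The following is a minimal PyTorch-style sketch under that reading, not the authors' released implementation: the function name, the two-level (class, subclass) labels, and the weights `w_subclass`/`w_class` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def hierarchical_multi_positive_loss(embeddings, class_ids, subclass_ids,
                                     temperature=0.07,
                                     w_subclass=1.0, w_class=0.5):
    """Contrastive loss with several hierarchy-weighted positives per anchor.

    embeddings   -- (B, D) image embeddings from the vision encoder
    class_ids    -- (B,)  broad Locarno class id per image (e.g., "furnishing")
    subclass_ids -- (B,)  Locarno subclass id per image (e.g., "seats")
    w_subclass > w_class encodes that same-subclass pairs should score
    higher than pairs sharing only the broad class.
    """
    z = F.normalize(embeddings, dim=1)                 # work in cosine space
    logits = z @ z.t() / temperature                   # (B, B) pairwise similarities

    same_class = class_ids.unsqueeze(0) == class_ids.unsqueeze(1)
    same_subclass = subclass_ids.unsqueeze(0) == subclass_ids.unsqueeze(1)

    # Hierarchy-weighted targets: subclass matches count more than
    # class-only matches; unrelated images are negatives (weight 0).
    targets = (same_subclass.float() * w_subclass
               + (same_class & ~same_subclass).float() * w_class)
    targets.fill_diagonal_(0.0)                        # drop self-pairs
    targets = targets / targets.sum(dim=1, keepdim=True).clamp(min=1e-8)

    # Soft cross-entropy between the in-batch softmax and the targets.
    logits.fill_diagonal_(-1e9)                        # finite mask avoids 0 * -inf NaNs
    log_prob = F.log_softmax(logits, dim=1)
    return -(targets * log_prob).sum(dim=1).mean()


# Illustrative usage with random features and made-up Locarno labels.
if __name__ == "__main__":
    feats = torch.randn(8, 512)
    cls = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])       # broad classes
    sub = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])       # subclasses within them
    print(hierarchical_multi_positive_loss(feats, cls, sub))
```

With `w_class = 0` this reduces to an ordinary multi-positive (supervised) contrastive loss over subclasses; a nonzero intermediate weight lets the LIC hierarchy softly shape the embedding space, which matches the graded-similarity motivation in the abstract.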
Related papers
- CoLLM: A Large Language Model for Composed Image Retrieval [76.29725148964368]
Composed Image Retrieval (CIR) is a complex task that aims to retrieve images based on a multimodal query. We present CoLLM, a one-stop framework that generates triplets on-the-fly from image-caption pairs. We leverage Large Language Models (LLMs) to generate joint embeddings of reference images and modification texts.
arXiv Detail & Related papers (2025-03-25T17:59:50Z)
- Patent Figure Classification using Large Vision-language Models [7.505532091249881]
Large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks. Our work explores the efficacy of LVLMs in patent figure visual question answering (VQA) and classification, focusing on zero-shot and few-shot learning scenarios. For computationally efficient handling of a large number of classes with an LVLM, we propose a novel tournament-style classification strategy (a sketch follows this entry).
arXiv Detail & Related papers (2025-01-22T09:39:05Z)
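One plausible reading of the tournament-style strategy above is single elimination over the label set: classes are compared in small groups so each prompt stays short, and group winners advance until one label remains. The sketch below is only an illustration under that assumption; `pick_best` is a hypothetical stand-in for the LVLM query, and the paper's actual grouping and prompting may differ.

```python
from typing import Callable, List


def tournament_classify(image, classes: List[str],
                        pick_best: Callable[[object, List[str]], str],
                        group_size: int = 4) -> str:
    """Single-elimination classification over a large label set.

    pick_best(image, candidates) stands in for an LVLM call that returns
    whichever candidate label best matches the image.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        winners = []
        # Compare labels in small groups so each prompt stays short.
        for i in range(0, len(remaining), group_size):
            group = remaining[i:i + group_size]
            winners.append(group[0] if len(group) == 1 else pick_best(image, group))
        remaining = winners
    return remaining[0]
```

With N classes and groups of size k, the image is queried roughly N/(k-1) times instead of being matched against all N labels in a single prompt.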
- A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) aims to recover high-quality images from inputs degraded by various factors such as noise, blur, compression, and adverse weather. Traditional IR methods typically focus on specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions. The all-in-one image restoration paradigm has recently emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z)
- Large Language Model Informed Patent Image Retrieval [0.0]
We propose a language-informed, distribution-aware multimodal approach to patent image feature learning.
Our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%.
arXiv Detail & Related papers (2024-04-30T08:45:16Z)
- Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
- Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification [26.85734804493925]
We propose an integrated framework that comprehensively considers the information on patents for patent classification.
We first present an IPC codes correlations learning module to derive their semantic representations.
Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions.
arXiv Detail & Related papers (2023-08-10T07:02:24Z)
- Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction [45.0907126466271]
We propose an event-based graph learning framework for patent application trend prediction.
In particular, our method is founded on the memorable representations of both companies and patent classification codes.
arXiv Detail & Related papers (2023-08-04T05:43:32Z)
- Classification of Visualization Types and Perspectives in Patents [9.123089032348311]
We adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images.
We derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives.
arXiv Detail & Related papers (2023-07-19T21:45:07Z)
- Searching a Compact Architecture for Robust Multi-Exposure Image Fusion [55.37210629454589]
Two major stumbling blocks hinder development: pixel misalignment and inefficient inference.
This study introduces an architecture search-based paradigm incorporating self-alignment and detail repletion modules for robust multi-exposure image fusion.
The proposed method outperforms various competitive schemes, achieving a noteworthy 3.19% improvement in PSNR for general scenarios and an impressive 23.5% enhancement in misaligned scenarios.
arXiv Detail & Related papers (2023-05-20T17:01:52Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.