DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
- URL: http://arxiv.org/abs/2508.15297v1
- Date: Thu, 21 Aug 2025 06:36:24 GMT
- Title: DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
- Authors: Zhu Wang, Homaira Huda Shomee, Sathya N. Ravi, Sourav Medya
- Abstract summary: We develop a unified framework, DesignCLIP, for design patent applications with a large-scale dataset of U.S. design patents. DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-view image learning. Our experiments show that DesignCLIP consistently outperforms baseline and SOTA models in the patent domain on all tasks.
- Score: 14.090575139188422
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval heavily depend on the image data. However, patent images -- typically consisting of sketches with abstract and structural elements of an invention -- often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities in evaluation during prior art searches. Recent advancements in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP models to develop a unified framework, DesignCLIP, for design patent applications with a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-view image learning. We validate the effectiveness of DesignCLIP across various downstream tasks, including patent classification and patent retrieval. Additionally, we explore multimodal patent retrieval, which provides the potential to enhance creativity and innovation in design by offering more diverse sources of inspiration. Our experiments show that DesignCLIP consistently outperforms baseline and SOTA models in the patent domain on all tasks. Our findings underscore the promise of multimodal approaches in advancing patent analysis. The codebase is available here: https://anonymous.4open.science/r/PATENTCLIP-4661/README.md.
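The contrastive component described above builds on CLIP's standard symmetric InfoNCE objective over matched image-caption pairs. A minimal NumPy sketch of that objective (function names and the temperature value are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale each embedding to unit length."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/caption embeddings.

    Matched pairs share the same row index; all other rows in the batch act
    as negatives, as in standard CLIP-style training.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    labels = np.arange(len(logits))               # diagonal = positive pairs

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs are penalized; DesignCLIP's caption generation supplies richer text sides for these pairs than raw patent metadata would.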
Related papers
- PatentVision: A multimodal method for drafting patent applications [2.2940141855172036]
Large Vision-Language Models (LVLMs) show promise across various tasks, but their application to automated patent writing remains underexplored. We present PatentVision, a framework that integrates textual and visual inputs, such as patent claims and drawings, to generate complete patent specifications. Experiments reveal that it surpasses text-only methods, producing outputs with greater fidelity and closer alignment with human-written standards.
arXiv Detail & Related papers (2025-10-10T18:12:05Z)
- Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval [0.2970959580204573]
Patent images are technical drawings that convey information about a patent's innovation. Current methods neglect patents' hierarchical relationships, such as those defined by the Locarno International Classification (LIC) system. We introduce a hierarchical multi-positive contrastive loss that leverages the LIC taxonomy to induce such relations in the retrieval process.
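The hierarchical multi-positive idea can be sketched as a soft-target contrastive loss in which same-subclass samples are strong positives and samples sharing only the Locarno main class are weaker positives. The weights, temperature, and names below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def hier_multi_positive_loss(emb, subclass, mainclass,
                             temperature=0.1, w_sub=1.0, w_main=0.5):
    """Contrastive loss with taxonomy-weighted soft targets (sketch).

    Same-subclass pairs get weight w_sub; pairs sharing only the main
    class get w_main. Each row's normalized positive weights form a
    target distribution compared against the softmax of similarities.
    Assumes every sample has at least one positive in the batch.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -1e9)                    # exclude self-similarity

    same_sub = subclass[:, None] == subclass[None, :]
    same_main = mainclass[:, None] == mainclass[None, :]
    pos = w_sub * same_sub + w_main * (same_main & ~same_sub)
    np.fill_diagonal(pos, 0.0)                     # self-pairs are not positives
    target = pos / pos.sum(axis=1, keepdims=True)

    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-(target * logp).sum(axis=1).mean())
```

Embeddings clustered by subclass yield a lower loss than embeddings whose nearest neighbors belong to other subclasses, which is the gradient signal that pulls the taxonomy into the embedding space.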
arXiv Detail & Related papers (2025-06-16T13:53:02Z)
- Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors [50.7383184560431]
Continual learning (CL) enables deep networks to acquire new knowledge while avoiding catastrophic forgetting. We propose a concise CL approach for CLIP based on incremental prompt tuning. We show that our bidirectional supervision strategy enables more effective learning of new knowledge while reducing forgetting.
arXiv Detail & Related papers (2025-05-27T03:51:37Z)
- IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property [53.2129505804405]
IPBench is the first comprehensive IP task taxonomy and a large-scale benchmark encompassing 8 IP mechanisms and 20 distinct tasks. We benchmark 17 main LLMs, ranging from general-purpose to domain-specific, including chat-oriented and reasoning-focused models. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement.
arXiv Detail & Related papers (2025-04-22T02:00:41Z)
- IP-Composer: Semantic Composition of Visual Concepts [49.18472621931207]
We present IP-Composer, a training-free approach for compositional image generation. Our method builds on IP-Adapter, which synthesizes novel images conditioned on an input image's CLIP embedding. We extend this approach to multiple visual inputs by crafting composite embeddings, stitched from the projections of multiple input images onto concept-specific CLIP subspaces identified through text.
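The subspace-stitching idea can be illustrated with orthogonal projections: the component of a base CLIP embedding lying in a concept-specific subspace is swapped for the corresponding component of another image's embedding. Function names and basis construction here are illustrative assumptions; in the paper the subspaces are identified from text variations of each concept:

```python
import numpy as np

def composite_embedding(base_emb, concept_embs, concept_bases):
    """Stitch a composite embedding from several source embeddings (sketch).

    concept_bases[i] is an orthonormal (d x k) basis spanning a
    concept-specific subspace of the embedding space. Inside each
    subspace, the base embedding's component is replaced with the
    component of that concept's source embedding.
    """
    out = base_emb.copy()
    for c_emb, basis in zip(concept_embs, concept_bases):
        proj = basis @ basis.T                 # projector onto the subspace
        out = out - proj @ out + proj @ c_emb  # swap in the concept component
    return out
```

The composite vector can then condition generation the same way a single image embedding would, which is what makes the approach training-free.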
arXiv Detail & Related papers (2025-02-19T18:49:31Z)
- PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures [7.16446145782558]
We introduce PatentDesc-355K, a novel large-scale dataset containing 355K patent figures along with their brief and detailed textual descriptions. We also propose PatentLMM, a novel multimodal large language model specifically tailored to generate high-quality descriptions of patent figures. Our proposed PatentLMM comprises two key components: (i) PatentMME, a specialized multimodal vision encoder that captures the unique structural elements of patent figures, and (ii) PatentLLaMA, a domain-adapted version of LLaMA fine-tuned on a large collection of patents.
arXiv Detail & Related papers (2025-01-25T04:45:32Z)
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost of processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- A Survey on Patent Analysis: From NLP to Multimodal AI [14.090575139188422]
This interdisciplinary survey aims to serve as a comprehensive resource for researchers and practitioners who work at the intersection of NLP, Multimodal AI, and patent analysis.
arXiv Detail & Related papers (2024-04-02T20:44:06Z)
- Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on opaque deep neural networks (DNNs). We propose a novel explainable deep patent classification framework by introducing layer-wise relevance propagation (LRP). Considering the relevance scores, we then generate explanations by visualizing the words relevant to the predicted patent class.
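A single LRP step for a linear layer, using the epsilon rule, shows how output relevance is redistributed to inputs in proportion to their contributions. This is a generic sketch of the technique, not the paper's exact propagation rules:

```python
import numpy as np

def lrp_linear(x, W, b, relevance_out, eps=1e-6):
    """One layer-wise relevance propagation step (epsilon rule).

    For a linear layer y = W @ x + b, each input j receives relevance in
    proportion to its contribution z_ij = W_ij * x_j to each output i.
    With b = 0 and small eps, total relevance is conserved across the layer.
    """
    z = W * x[None, :]                        # contributions z_ij
    denom = z.sum(axis=1) + b                 # pre-activations y_i
    denom = denom + eps * np.sign(denom)      # stabilizer against division by ~0
    return (z / denom[:, None] * relevance_out[:, None]).sum(axis=0)
```

Chaining such steps backward through the network yields per-word (or per-pixel) relevance scores, which is what the visualized explanations are built from.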
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
- Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification [26.85734804493925]
We propose an integrated framework that comprehensively considers the information on patents for patent classification.
We first present an IPC code correlation learning module to derive their semantic representations.
Finally, we combine the contextual information of patent texts, which contains the semantics of IPC codes, with assignees' sequential preferences to make predictions.
arXiv Detail & Related papers (2023-08-10T07:02:24Z)
- Multi-Perspective LSTM for Joint Visual Representation Learning [81.21490913108835]
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives.
Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level.
We show that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks.
arXiv Detail & Related papers (2021-05-06T16:44:40Z)
- A Convolutional Neural Network-based Patent Image Retrieval Method for Design Ideation [5.195924252155368]
We propose a convolutional neural network (CNN)-based patent image retrieval method.
The core of this approach is a novel neural network architecture named Dual-VGG.
The accuracy of both training tasks and the quality of the patent image embedding space are evaluated to demonstrate the performance of our model.
arXiv Detail & Related papers (2020-03-10T13:32:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.