PatentVision: A multimodal method for drafting patent applications
- URL: http://arxiv.org/abs/2510.09762v1
- Date: Fri, 10 Oct 2025 18:12:05 GMT
- Title: PatentVision: A multimodal method for drafting patent applications
- Authors: Ruo Yang, Sai Krishna Reddy Mudhiganti, Manali Sharma,
- Abstract summary: Large Vision Language Models (LVLMs) show promise across various tasks, but their application in automating patent writing remains underexplored. We present PatentVision, a framework that integrates textual and visual inputs such as patent claims and drawings to generate complete patent specifications. Experiments reveal it surpasses text-only methods, producing outputs with greater fidelity and alignment with human-written standards.
- Score: 2.2940141855172036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patent drafting is complex due to its need for detailed technical descriptions, legal compliance, and visual elements. Although Large Vision Language Models (LVLMs) show promise across various tasks, their application in automating patent writing remains underexplored. In this paper, we present PatentVision, a multimodal framework that integrates textual and visual inputs such as patent claims and drawings to generate complete patent specifications. Built on advanced LVLMs, PatentVision enhances accuracy by combining fine-tuned vision-language models with domain-specific training tailored to patents. Experiments reveal it surpasses text-only methods, producing outputs with greater fidelity and alignment with human-written standards. Its incorporation of visual data allows it to better represent intricate design features and functional connections, leading to richer and more precise results. This study underscores the value of multimodal techniques in patent automation, providing a scalable tool to reduce manual workloads and improve consistency. PatentVision not only advances patent drafting but also lays the groundwork for broader use of LVLMs in specialized areas, potentially transforming intellectual property management and innovation processes.
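Neither the abstract nor this digest includes code, but the claims-plus-drawings interface it describes maps naturally onto an off-the-shelf LVLM. Below is a minimal sketch assuming a LLaVA-1.5 checkpoint as a stand-in for the paper's fine-tuned model; the model name, prompt wording, and decoding settings are illustrative, not the authors' setup.

```python
# Minimal sketch of the claims + drawings -> specification interface described
# above. The base LVLM, prompt wording, and decoding settings are illustrative
# assumptions; the paper's actual fine-tuned model is not public here.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

claims = "1. A foldable hinge assembly comprising ..."  # patent claims (text input)
drawing = Image.open("figure_1.png")                    # patent drawing (visual input)

# One image placeholder per drawing, followed by the textual claims.
prompt = (
    "USER: <image>\nGiven the claims below and the drawing above, draft the "
    f"detailed description section of the patent specification.\n{claims}\nASSISTANT:"
)
inputs = processor(images=drawing, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```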
Related papers
- AutoSpec: An Agentic Framework for Automatically Drafting Patent Specification [15.052472198494371]
Patents play a critical role in driving technological innovation by granting inventors exclusive rights to their inventions. Despite recent advancements in language models, several challenges hinder the development of robust automated patent drafting systems. We introduce AutoSpec, a secure, agentic framework for automatically drafting patent specifications.
arXiv Detail & Related papers (2025-09-23T23:10:18Z)
- DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding [14.090575139188422]
We develop DesignCLIP, a unified framework for design patent applications, with a large-scale dataset of U.S. design patents. DesignCLIP incorporates class-aware classification and contrastive learning, utilizing generated detailed captions for patent images and multi-view image learning. Our experiments show that DesignCLIP consistently outperforms baseline and SOTA models in the patent domain on all tasks.
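For context, the contrastive learning this summary mentions typically follows the symmetric CLIP objective over paired image and caption embeddings. A minimal sketch; the temperature and batch layout are illustrative choices, not DesignCLIP's reported configuration.

```python
# A minimal sketch of the symmetric CLIP-style contrastive loss that
# DesignCLIP builds on, pairing patent images with generated captions.
# Encoder choice and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(len(logits))             # matching pairs lie on the diagonal
    # Symmetric cross-entropy: image -> text and text -> image.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

# Example: a batch of 8 paired embeddings of width 512.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```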
arXiv Detail & Related papers (2025-08-21T06:36:24Z)
- AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks [116.8706375364465]
We present a novel Left-Prompt-Guided (LPG) paradigm to address a diverse range of reference-based vision tasks. We propose AnyRefill, which effectively adapts Text-to-Image (T2I) models to various vision tasks.
arXiv Detail & Related papers (2025-02-16T15:12:40Z)
- PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures [7.16446145782558]
We introduce PatentDesc-355K, a novel large-scale dataset containing 355K patent figures along with their brief and detailed textual descriptions. We also propose PatentLMM, a novel multimodal large language model specifically tailored to generate high-quality descriptions of patent figures. Our proposed PatentLMM comprises two key components: (i) PatentMME, a specialized multimodal vision encoder that captures the unique structural elements of patent figures, and (ii) PatentLLaMA, a domain-adapted version of LLaMA fine-tuned on a large collection of patents.
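The two-component design (a specialized vision encoder feeding a domain-adapted LLaMA) usually hinges on a small projector that maps figure features into the LLM's embedding space. A sketch of that wiring, with all module names and dimensions assumed for illustration:

```python
# Sketch of how a specialized vision encoder (standing in for PatentMME) is
# typically wired to a domain-adapted LLM (PatentLLaMA): visual features are
# projected into the LLM's token-embedding space and prepended to text tokens.
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, vision_feats, text_embeds):
        # vision_feats: (batch, num_patches, vision_dim) from the figure encoder
        # text_embeds:  (batch, seq_len, llm_dim) from the LLM's embedding table
        visual_tokens = self.proj(vision_feats)
        return torch.cat([visual_tokens, text_embeds], dim=1)  # joint input sequence

figure_feats = torch.randn(2, 256, 1024)   # e.g. 256 patch features per figure
text_embeds = torch.randn(2, 64, 4096)     # embedded description prompt
joint = VisionToLLMProjector()(figure_feats, text_embeds)  # (2, 320, 4096)
```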
arXiv Detail & Related papers (2025-01-25T04:45:32Z)
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z)
- Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders [89.41055673919895]
This study explores the design space for MLLMs using a mixture of vision encoders and resolutions. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks.
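The core finding here, that plain concatenation of visual tokens competes with elaborate fusion schemes, is easy to state in tensor terms. A sketch showing both natural concatenation axes; the encoder choices and widths are assumptions, not Eagle's exact recipe:

```python
# Sketch of the "mixture of encoders by concatenation" idea: run complementary
# vision encoders, then concatenate their tokens before the LLM. Whether to
# concatenate along the sequence or the channel axis is one of the design
# choices such studies compare. Encoders and dimensions are illustrative.
import torch

batch, tokens = 2, 196
clip_tokens = torch.randn(batch, tokens, 1024)      # e.g. a CLIP-style encoder
convnext_tokens = torch.randn(batch, tokens, 1024)  # e.g. a ConvNeXt-style encoder

# Variant 1: stack token sequences (doubles sequence length).
seq_concat = torch.cat([clip_tokens, convnext_tokens], dim=1)   # (2, 392, 1024)

# Variant 2: fuse per-token features (doubles channel width), assuming the
# encoders are aligned to the same token grid.
chan_concat = torch.cat([clip_tokens, convnext_tokens], dim=2)  # (2, 196, 2048)

# Either result is then projected to the LLM width and consumed as usual.
proj = torch.nn.Linear(chan_concat.shape[-1], 4096)
llm_visual_input = proj(chan_concat)  # (2, 196, 4096)
```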
arXiv Detail & Related papers (2024-08-28T17:59:31Z)
- Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach. Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens. We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
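A soft separation token amounts to a learned embedding spliced between modality segments of the input sequence. A minimal sketch of that layout; the shapes and the single-separator arrangement are assumed for illustration, not taken from the paper:

```python
# Sketch of a unified token sequence with a learned "soft separation" embedding
# between textual and non-textual (media/relational) tokens, as described above.
# Shapes and the single-separator layout are illustrative assumptions.
import torch
import torch.nn as nn

dim = 4096
sep_token = nn.Parameter(torch.randn(1, 1, dim))  # learned soft separator

text_tokens = torch.randn(1, 32, dim)      # embedded item title/description
media_tokens = torch.randn(1, 8, dim)      # projected image features
relation_tokens = torch.randn(1, 4, dim)   # projected relational features

# Non-textual modalities are fenced off from text by the soft separator,
# letting the LLM learn where one information type ends and another begins.
sequence = torch.cat(
    [text_tokens,
     sep_token, media_tokens,
     sep_token, relation_tokens],
    dim=1,
)  # (1, 32 + 1 + 8 + 1 + 4, dim)
```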
arXiv Detail & Related papers (2024-07-16T13:30:14Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation. Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts. We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
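The structured-text representation such a system is tuned to emit can be pictured as JSON describing typed elements with normalized boxes. A sketch with an entirely assumed schema (the field names are not from the paper):

```python
# Sketch of the structured-text (JSON) layout representation such a model is
# tuned to emit: each element gets a category and a normalized bounding box.
# The exact schema and field names are illustrative assumptions.
import json

model_output = """
{
  "canvas": {"width": 1.0, "height": 1.0},
  "elements": [
    {"type": "title", "box": [0.10, 0.05, 0.90, 0.18]},
    {"type": "image", "box": [0.15, 0.22, 0.85, 0.62]},
    {"type": "text",  "box": [0.10, 0.68, 0.90, 0.88]},
    {"type": "logo",  "box": [0.80, 0.90, 0.95, 0.98]}
  ]
}
"""

layout = json.loads(model_output)
for el in layout["elements"]:
    x0, y0, x1, y1 = el["box"]  # normalized [left, top, right, bottom]
    print(f'{el["type"]:>6}: w={x1 - x0:.2f}, h={y1 - y0:.2f}')
```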
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- A Survey on Patent Analysis: From NLP to Multimodal AI [14.090575139188422]
This interdisciplinary survey aims to serve as a comprehensive resource for researchers and practitioners who work at the intersection of NLP, Multimodal AI, and patent analysis.
arXiv Detail & Related papers (2024-04-02T20:44:06Z)
- Natural Language Processing in the Patent Domain: A Survey [0.0]
Patents encapsulate crucial technical and legal information in text form and referenced drawings. This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z) - Unveiling Black-boxes: Explainable Deep Learning Models for Patent
Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs). We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP). Considering the relevance scores, we then generate explanations by visualizing relevant words for the predicted patent class.
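For reference, the epsilon rule that LRP applies to each linear layer is R_j = a_j * sum_k(W_jk * R_k / (z_k + eps * sign(z_k))). A minimal sketch on a toy two-layer classifier, with random weights standing in for a trained patent model and a bag-of-word-embeddings input assumed for illustration:

```python
# Minimal sketch of the epsilon rule of layer-wise relevance propagation (LRP)
# on a tiny two-layer classifier, redistributing the predicted class score back
# onto per-word input features. Network sizes, the bag-of-embeddings input, and
# epsilon are illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
num_words, emb_dim, hidden, num_classes = 6, 16, 32, 4

# Toy "document": one embedding per word; weights stand in for a trained model.
x = rng.normal(size=(num_words, emb_dim)).reshape(-1)   # flattened input a^(0)
W1 = rng.normal(size=(x.size, hidden)) * 0.1
W2 = rng.normal(size=(hidden, num_classes)) * 0.1

# Forward pass (ReLU MLP, biases omitted for brevity).
h = np.maximum(0.0, x @ W1)
logits = h @ W2
target = int(np.argmax(logits))

def lrp_linear(a, W, z, R, eps=1e-6):
    """Epsilon rule: R_j = a_j * sum_k W_jk * R_k / (z_k + eps * sign(z_k))."""
    s = R / (z + eps * np.where(z >= 0, 1.0, -1.0))
    return a * (W @ s)

# Start from the target logit only and propagate back layer by layer.
R_out = np.zeros(num_classes)
R_out[target] = logits[target]
R_hidden = lrp_linear(h, W2, logits, R_out)
R_input = lrp_linear(x, W1, x @ W1, R_hidden)   # uses layer 1 pre-activations

# Aggregate relevance per word and rank the most relevant ones.
word_relevance = R_input.reshape(num_words, emb_dim).sum(axis=1)
for i in np.argsort(-word_relevance):
    print(f"word {i}: relevance {word_relevance[i]:+.3f}")
```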
arXiv Detail & Related papers (2023-10-31T14:11:37Z)