Related papers: An automatic patent literature retrieval system based on LLM-RAG

URL: http://arxiv.org/abs/2508.14064v1
Date: Mon, 11 Aug 2025 02:39:16 GMT
Title: An automatic patent literature retrieval system based on LLM-RAG
Authors: Yao Ding, Yuqing Wu, Ziyang Ding,
Abstract summary: This study presents an automated patent retrieval framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. The system comprises three components: 1) a preprocessing module for patent data standardization, 2) a high-efficiency vector retrieval engine leveraging LLM-generated embeddings, and 3) a RAG-enhanced query module that combines external document retrieval with context-aware response generation.
Score: 2.035980938365065
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the acceleration of technological innovation, efficient retrieval and classification of patent literature have become essential for intellectual property management and enterprise R&D. Traditional keyword- and rule-based retrieval methods often fail to address complex query intents or capture semantic associations across technical domains, resulting in incomplete and low-relevance results. This study presents an automated patent retrieval framework integrating Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) technology. The system comprises three components: 1) a preprocessing module for patent data standardization, 2) a high-efficiency vector retrieval engine leveraging LLM-generated embeddings, and 3) a RAG-enhanced query module that combines external document retrieval with context-aware response generation. Evaluations were conducted on the Google Patents dataset (2006-2024), containing millions of global patent records with metadata such as filing date, domain, and status. The proposed gpt-3.5-turbo-0125+RAG configuration achieved 80.5% semantic matching accuracy and 92.1% recall, surpassing baseline LLM methods by 28 percentage points. The framework also demonstrated strong generalization in cross-domain classification and semantic clustering tasks. These results validate the effectiveness of LLM-RAG integration for intelligent patent retrieval, providing a foundation for next-generation AI-driven intellectual property analysis platforms.
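
The three components named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`preprocess`, `embed`, `retrieve`, `build_prompt`), the sample records, and the prompt wording are all hypothetical, and a bag-of-words count vector stands in for the LLM-generated embeddings so that the example runs without any external service.

```python
import math
import re
from collections import Counter


def preprocess(record):
    """Component 1 (assumed form): standardize a raw patent record."""
    return {
        "id": record["id"].strip(),
        # Lowercase and collapse repeated whitespace.
        "text": " ".join(record["text"].lower().split()),
    }


def embed(text):
    """Toy stand-in for an LLM embedding: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, k=2):
    """Component 2: rank indexed patents by similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Component 3: place retrieved passages in the generator's prompt."""
    context = "\n".join(f"[{h['id']}] {h['text']}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."


# Hypothetical records; the paper itself uses the Google Patents dataset.
raw = [
    {"id": " P1 ", "text": "Lithium battery  electrode coating method"},
    {"id": "P2", "text": "Wireless antenna array for mobile devices"},
    {"id": "P3", "text": "Solid-state battery electrolyte composition"},
]

index = []
for rec in raw:
    clean = preprocess(rec)
    index.append({**clean, "vector": embed(clean["text"])})

query = "battery electrode coating"
hits = retrieve(query, index, k=2)
prompt = build_prompt(query, hits)  # would be sent to the LLM in a real system
print(prompt)
```

In a real system the `embed` function would call an embedding model and the index would live in a vector store; the ranking and prompt-assembly steps keep the same shape.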

Related papers

Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation [0.16754194618631593]
This paper introduces an agentic RAG architecture to address domain-specific and dense terminology challenges. We evaluate our approach against a standard RAG baseline using a curated dataset of 85 question-answer-reference triples from an enterprise knowledge base.
arXiv Detail & Related papers (2025-10-29T13:41:36Z)
Executable Knowledge Graphs for Replicating AI Research [65.41207324831583]
Executable Knowledge Graphs (xKG) is a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. Code will be released at https://github.com/zjunlp/xKG.
arXiv Detail & Related papers (2025-10-20T17:53:23Z)
Domain-Specific Data Generation Framework for RAG Adaptation [58.20906914537952]
Retrieval-Augmented Generation (RAG) combines the language understanding and reasoning power of large language models with external retrieval to enable domain-grounded responses. We propose RAGen, a framework for generating domain-grounded question-answer-context (QAC) triples tailored to diverse RAG adaptation approaches.
arXiv Detail & Related papers (2025-10-13T09:59:49Z)
From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models [0.6727984016678534]
Classifying patents by their relevance to the UN Sustainable Development Goals (SDGs) is crucial for tracking how innovation addresses global challenges. This paper frames patent-to-SDG classification as a weak supervision problem, using citations from patents to scientific publications (NPL citations) as a noisy initial signal. We develop a composite labeling function (LF) that uses large language models (LLMs) to extract structured concepts from patents and papers based on a patent.
arXiv Detail & Related papers (2025-09-11T09:44:16Z)
KLIPA: A Knowledge Graph and LLM-Driven QA Framework for IP Analysis [25.25268746853138]
We introduce KLIPA, a novel framework that leverages a knowledge graph and a large language model (LLM) to significantly advance patent analysis. Our approach integrates three key components: a structured knowledge graph to map explicit relationships between patents, a retrieval-augmented generation (RAG) system to uncover contextual connections, and an intelligent agent that dynamically determines the optimal strategy for resolving user queries.
arXiv Detail & Related papers (2025-09-09T15:40:23Z)
A Hybrid Ai Framework For Strategic Patent Portfolio Pruning: Integrating Learning To-Rank And Market Need Analysis For Technology Transfer Optimization [6.142730022466677]
This paper introduces a novel, multi-stage hybrid intelligence framework for pruning patent portfolios to identify high-value assets for technology transfer. Our framework automates and deepens this process by combining a Learning to Rank model, which evaluates patents against over 30 legal and commercial parameters, with a "Need-Seed" agent-based system.
arXiv Detail & Related papers (2025-08-31T18:43:18Z)
HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z)
Retrieval-Augmented Generation Systems for Intellectual Property via Synthetic Multi-Angle Fine-tuning [2.4368308736427697]
Retrieval systems in the Intellectual Property (IP) field often struggle with diverse user queries. We propose the Multi-Angle Question Generation and Retrieval Fine-Tuning Method (MQG-RFM). MQG-RFM combines prompt-engineered query generation with hard negative mining to enhance retrieval robustness without costly infrastructure changes.
arXiv Detail & Related papers (2025-05-31T12:19:35Z)
PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims [32.272839191711114]
We introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics.
arXiv Detail & Related papers (2025-05-25T22:20:11Z)
Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph [22.002949442505926]
We propose MemGraph, a method that augments the patent matching capabilities of Large Language Models. MemGraph prompts LLMs to identify relevant entities within patents, followed by attributing these entities to corresponding entities. Experimental results on the PatentMatch dataset demonstrate the effectiveness of MemGraph, achieving a 17.68% improvement over baseline LLMs.
arXiv Detail & Related papers (2025-04-21T03:56:56Z)
Enhancing Large Language Models (LLMs) for Telecommunications using Knowledge Graphs and Retrieval-Augmented Generation [52.8352968531863]
Large language models (LLMs) have made significant progress in general-purpose natural language processing tasks. This paper presents a novel framework that combines knowledge graph (KG) and retrieval-augmented generation (RAG) techniques to enhance LLM performance in the telecom domain.
arXiv Detail & Related papers (2025-03-31T15:58:08Z)
A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization [0.0]
This study proposes a system for efficiently creating abstractive summaries of patent records. The procedure involves leveraging the LexRank graph-based algorithm to retrieve the important sentences from input parent texts.
arXiv Detail & Related papers (2025-03-13T13:30:54Z)
AutoPatent: A Multi-Agent Framework for Automatic Patent Generation [16.862811929856313]
We introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges Large Language Models to generate full-length patents averaging 17K tokens based on initial drafts. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents.
arXiv Detail & Related papers (2024-12-13T02:27:34Z)
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering [115.72130322143275]
REAR is a RElevance-Aware Retrieval-augmented approach for open-domain question answering (QA). We develop a novel architecture for LLM-based RAG systems by incorporating a specially designed assessment module. Experiments on four open-domain QA tasks show that REAR significantly outperforms a number of previous competitive RAG approaches.
arXiv Detail & Related papers (2024-02-27T13:22:51Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.