LLM-powered Real-time Patent Citation Recommendation for Financial Technologies
- URL: http://arxiv.org/abs/2601.16775v1
- Date: Fri, 23 Jan 2026 14:21:30 GMT
- Title: LLM-powered Real-time Patent Citation Recommendation for Financial Technologies
- Authors: Tianang Deng, Yu Deng, Tianchen Gao, Yonghong Hu, Rui Pan
- Abstract summary: We propose a real-time patent citation recommendation framework designed for large and fast-changing financial patent corpora. We use a dataset of 428,843 financial patents granted by the China National Intellectual Property Administration (CNIPA) between 2000 and 2024. We show that incremental updating improves recall while substantially reducing computational cost compared with rebuild-based indexing.
- Score: 6.544698036896045
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Rapid financial innovation has been accompanied by a sharp increase in patenting activity, making timely and comprehensive prior-art discovery more difficult. This problem is especially evident in financial technologies, where innovations develop quickly, patent collections grow continuously, and citation recommendation systems must be updated as new applications arrive. Existing patent retrieval and citation recommendation methods typically rely on static indexes or periodic retraining, which limits their ability to operate effectively in such dynamic settings. In this study, we propose a real-time patent citation recommendation framework designed for large and fast-changing financial patent corpora. Using a dataset of 428,843 financial patents granted by the China National Intellectual Property Administration (CNIPA) between 2000 and 2024, we build a three-stage recommendation pipeline. The pipeline uses large language model (LLM) embeddings to represent the semantic content of patent abstracts, applies efficient approximate nearest-neighbor search to construct a manageable candidate set, and ranks candidates by semantic similarity to produce top-k citation recommendations. In addition to improving recommendation accuracy, the proposed framework directly addresses the dynamic nature of patent systems. By using an incremental indexing strategy based on hierarchical navigable small-world (HNSW) graphs, newly issued patents can be added without rebuilding the entire index. A rolling day-by-day update experiment shows that incremental updating improves recall while substantially reducing computational cost compared with rebuild-based indexing. The proposed method also consistently outperforms traditional text-based baselines and alternative nearest-neighbor retrieval approaches.
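The three-stage pipeline and the incremental index update described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for the LLM embeddings, `NSWIndex` is a simplified single-layer navigable small-world graph (HNSW without the layer hierarchy or link pruning), and the patent IDs, abstracts, and the parameters `m`, `ef`, `k`, and `n_candidates` are all hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an LLM embedding: sparse, L2-normalised bag of words."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class NSWIndex:
    """Single-layer navigable small-world graph (assumed simplification of HNSW)."""

    def __init__(self, m=4, ef=16):
        self.m, self.ef = m, ef      # links per new node, search beam width
        self.ids, self.vectors, self.links = [], [], []

    def search(self, query, ef):
        """Best-first graph search; returns up to ef (distance, node) pairs."""
        if not self.vectors:
            return []
        d0 = 1.0 - cosine(query, self.vectors[0])
        visited = {0}
        frontier = [(d0, 0)]         # min-heap: closest unexplored node first
        best = [(-d0, 0)]            # negated distances: worst kept result on top
        while frontier:
            d, node = heapq.heappop(frontier)
            if len(best) >= ef and d > -best[0][0]:
                break                # nothing left can improve the result set
            for nb in self.links[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = 1.0 - cosine(query, self.vectors[nb])
                if len(best) < ef or dn < -best[0][0]:
                    heapq.heappush(frontier, (dn, nb))
                    heapq.heappush(best, (-dn, nb))
                    if len(best) > ef:
                        heapq.heappop(best)
        return sorted((-neg, n) for neg, n in best)

    def add(self, patent_id, text):
        """Incremental insert: link the new node to its m nearest neighbours."""
        vec = embed(text)
        neighbours = [n for _, n in self.search(vec, self.ef)[: self.m]]
        node = len(self.vectors)
        self.ids.append(patent_id)
        self.vectors.append(vec)
        self.links.append(set(neighbours))
        for nb in neighbours:
            self.links[nb].add(node)


def recommend(index, abstract, k=2, n_candidates=10):
    query = embed(abstract)                                  # stage 1: embed
    candidates = index.search(query, max(k, n_candidates))   # stage 2: candidate set
    ranked = sorted(candidates,                              # stage 3: rank by similarity
                    key=lambda c: cosine(query, index.vectors[c[1]]),
                    reverse=True)
    return [index.ids[n] for _, n in ranked[:k]]


# Hypothetical toy corpus; real inputs would be patent abstracts.
index = NSWIndex()
index.add("P1", "mobile payment method using encrypted tokens for secure transaction")
index.add("P2", "blockchain ledger system for cross border settlement")
index.add("P3", "credit risk scoring model based on machine learning")

query = "token based encryption for mobile payment security"
before = recommend(index, query)
# A newly issued patent is inserted into the existing graph; nothing is rebuilt.
index.add("P4", "secure mobile payment terminal with token encryption")
after = recommend(index, query)
print(before, after)
```

The point of the sketch is that `add` only searches the existing graph and attaches a few links, so the cost of indexing a new patent does not depend on rebuilding the whole structure. A production system would more likely use an established HNSW library and real embedding vectors.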
Related papers
- Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
- Reinforcement Learning for Durable Algorithmic Recourse [49.54997446851335]
We present a time-aware framework for algorithmic recourse, explicitly modeling how candidate populations adapt in response to recommendations. We also introduce a novel reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment.
arXiv Detail & Related papers (2025-09-26T09:24:12Z)
- BAT: Benchmark for Auto-bidding Task [67.56067222427946]
We present an auction benchmark encompassing the two most prevalent auction formats. We implement a series of robust baselines on a novel dataset. This benchmark provides a user-friendly and intuitive framework for researchers and practitioners to develop and refine innovative autobidding algorithms.
arXiv Detail & Related papers (2025-05-13T12:12:34Z)
- EvoPat: A Multi-LLM-based Patents Summarization and Analysis Agent [0.0]
EvoPat is a multi-LLM-based patent agent designed to assist users in analyzing patents through Retrieval-Augmented Generation (RAG) and advanced search strategies. We demonstrate that EvoPat outperforms GPT-4 in tasks such as patent summarization, comparative analysis, and technical evaluation.
arXiv Detail & Related papers (2024-12-24T02:21:09Z)
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- ClaimBrush: A Novel Framework for Automated Patent Claim Refinement Based on Large Language Models [3.3427063846107825]
ClaimBrush is a novel framework for automated patent claim refinement that includes a dataset and a rewriting model.
We constructed a dataset for training and evaluating patent claim rewriting models by collecting a large number of actual patent claim rewriting cases.
Our proposed rewriting model outperformed baselines and zero-shot learning in state-of-the-art large language models.
arXiv Detail & Related papers (2024-10-08T00:20:54Z)
- DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs). We introduce an innovative approach combining pruning methods with advanced model designs. Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec).
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
- Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification [26.85734804493925]
We propose an integrated framework that comprehensively considers the information on patents for patent classification.
We first present an IPC codes correlations learning module to derive their semantic representations.
Finally, we combine the contextual information of patent texts that contains the semantics of IPC codes, and assignees' sequential preferences to make predictions.
arXiv Detail & Related papers (2023-08-10T07:02:24Z)
- Event-based Dynamic Graph Representation Learning for Patent Application Trend Prediction [45.0907126466271]
We propose an event-based graph learning framework for patent application trend prediction.
In particular, our method is founded on the memorable representations of both companies and patent classification codes.
arXiv Detail & Related papers (2023-08-04T05:43:32Z)
- Predictive Patentomics: Forecasting Innovation Success and Valuation with ChatGPT [0.0]
OpenAI's state-of-the-art textual embedding accesses complex information about the quality and impact of each invention.
The nuanced embedding drives a 24% incremental improvement in R-squared predicting patent value.
arXiv Detail & Related papers (2023-06-22T13:21:20Z)
- The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications [8.110699646062384]
We introduce the Harvard USPTO Patent Dataset (HUPD).
With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora.
By providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks.
arXiv Detail & Related papers (2022-07-08T17:57:15Z)
- Deep learning-based citation recommendation system for patents [5.376388266200792]
We present a novel dataset called PatentNet that includes textual information and metadata for approximately 110,000 patents from the Google Big Query service.
Compared with existing recommendation methods, the proposed benchmark method achieved a mean reciprocal rank of 0.2377 on the test set.
arXiv Detail & Related papers (2020-10-21T12:18:21Z)