Natural Language Processing in Patents: A Survey
- URL: http://arxiv.org/abs/2403.04105v2
- Date: Mon, 12 Aug 2024 18:30:06 GMT
- Title: Natural Language Processing in Patents: A Survey
- Authors: Lekang Jiang, Stephan Goetz
- Abstract summary: Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications.
As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks.
This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications. As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks. However, the application of LLMs in the patent domain remains under-explored and under-developed due to the complexity of patent processing. Understanding the unique characteristics of patent documents and related research in the patent domain becomes essential for researchers to apply these tools effectively. Therefore, this paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently. We introduce the relevant fundamental aspects of patents to provide solid background information, particularly for readers unfamiliar with the patent system. In addition, we systematically break down the structural and linguistic characteristics unique to patents and map out how NLP can be leveraged for patent analysis and generation. Moreover, we demonstrate the spectrum of text-based patent-related tasks, including nine patent analysis and four patent generation tasks.
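To make the task spectrum concrete, the sketch below illustrates one common patent analysis task, coarse classification of an abstract into IPC sections, done zero-shot with an off-the-shelf NLI model. The checkpoint, the example abstract, and the label phrasing are illustrative assumptions, not anything prescribed by the survey.

```python
# Minimal sketch: zero-shot IPC-section classification of a patent abstract.
# The model checkpoint, abstract, and label wording are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

abstract = (
    "A battery management system monitors cell voltages and redistributes "
    "charge between cells to extend the usable life of an electric vehicle pack."
)

# Coarse IPC sections used as candidate labels (hypothetical label phrasing).
ipc_sections = [
    "human necessities", "performing operations; transporting",
    "chemistry; metallurgy", "textiles; paper", "fixed constructions",
    "mechanical engineering", "physics", "electricity",
]

result = classifier(abstract, candidate_labels=ipc_sections)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label:35s} {score:.3f}")
```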
Related papers
- Patent Novelty Assessment Accelerating Innovation and Patent Prosecution [0.873811641236639]
This report introduces a ground-breaking Patent Novelty Assessment and Claim Generation System.
Our system provides college students and researchers with an intuitive platform to navigate and grasp the intricacies of patent claims.
Unlike conventional analysis systems, our initiative harnesses a proprietary Chinese API to ensure unparalleled precision and relevance.
arXiv Detail & Related papers (2025-01-12T22:25:46Z)
- EvoPat: A Multi-LLM-based Patents Summarization and Analysis Agent [0.0]
EvoPat is a multi-LLM-based patent agent designed to assist users in analyzing patents through Retrieval-Augmented Generation (RAG) and advanced search strategies.
We demonstrate that EvoPat outperforms GPT-4 in tasks such as patent summarization, comparative analysis, and technical evaluation.
arXiv Detail & Related papers (2024-12-24T02:21:09Z)
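EvoPat's own pipeline is not detailed in this excerpt, but the RAG pattern it builds on can be sketched as retrieval followed by prompt assembly. In the sketch below the tiny corpus, the query, and the prompt template are placeholders, and plain TF-IDF stands in for whatever retriever the agent actually uses.

```python
# Minimal RAG-style sketch: retrieve the most relevant patent passages with
# TF-IDF, then assemble a grounded prompt for an LLM. The corpus, query, and
# prompt template are illustrative placeholders, not EvoPat's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "US1234567: A lithium-ion cell with a silicon-graphite composite anode.",
    "US2345678: A thermal runaway detection circuit for battery packs.",
    "US3456789: A solid-state electrolyte based on sulfide glass ceramics.",
]
query = "How do recent patents address battery thermal safety?"

vectorizer = TfidfVectorizer().fit(corpus + [query])
scores = cosine_similarity(vectorizer.transform([query]),
                           vectorizer.transform(corpus))[0]
top_passages = [corpus[i] for i in scores.argsort()[::-1][:2]]

prompt = (
    "Answer the question using only the retrieved patent passages.\n\n"
    "Passages:\n" + "\n".join(top_passages) +
    f"\n\nQuestion: {query}\nAnswer:"
)
print(prompt)  # In a full agent this prompt would be sent to an LLM.
```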
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
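The entailment signal PatentEdits relies on can be approximated with any off-the-shelf NLI model. The checkpoint and the premise/hypothesis sentences below are assumptions for illustration; the paper's own setup may differ.

```python
# Sketch of the entailment check: does a cited prior-art sentence (premise)
# entail a draft claim sentence (hypothesis)? Checkpoint and sentences are
# illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "The prior-art reference discloses a sensor that measures cell temperature."
hypothesis = "The claimed system estimates battery state of charge from impedance."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Read label names from the model config instead of hard-coding their order.
for idx, label in model.config.id2label.items():
    print(f"{label:15s} {probs[idx]:.3f}")
```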
- Pap2Pat: Towards Automated Paper-to-Patent Drafting using Chunk-based Outline-guided Generation [13.242188189150987]
We present PAP2PAT, a new challenging benchmark of 1.8k patent-paper pairs with document outlines.
Our experiments with current open-weight LLMs and outline-guided generation show that they can effectively use information from the paper but struggle with repetitions, likely due to the inherent repetitiveness of patent language.
arXiv Detail & Related papers (2024-10-09T15:52:48Z)
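Chunk-based, outline-guided drafting can be sketched as a loop that pairs each outline heading with the paper chunks assigned to it and drafts that section independently. In the sketch below, the outline, the chunk assignment, and the `draft_section` helper are invented stand-ins, not Pap2Pat's released code; in practice `draft_section` would call an LLM.

```python
# Sketch of chunk-based, outline-guided drafting: each patent section is
# generated from its outline heading plus the paper chunks assigned to it.
# `draft_section` is a deliberately dumb stand-in for an LLM call; the
# outline and chunk assignment are illustrative assumptions.
from typing import Dict, List

outline: Dict[str, List[str]] = {
    "Field of the Invention": ["chunk describing the research area"],
    "Background": ["chunk summarising prior approaches and their limits"],
    "Detailed Description": ["chunk on the method", "chunk on experiments"],
}

def draft_section(heading: str, chunks: List[str]) -> str:
    """Stand-in for an LLM call that turns paper chunks into patent prose."""
    prompt = (
        f"Write the '{heading}' section of a patent application, "
        "using only the following material:\n- " + "\n- ".join(chunks)
    )
    return f"[LLM output for prompt: {prompt[:60]}...]"

draft = "\n\n".join(
    f"{heading}\n{draft_section(heading, chunks)}"
    for heading, chunks in outline.items()
)
print(draft)
```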
- A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models [71.25225058845324]
Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation.
Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge.
Retrieval-augmented LLMs (RA-LLMs) have emerged to harness external and authoritative knowledge bases rather than relying solely on the model's internal knowledge.
arXiv Detail & Related papers (2024-05-10T02:48:45Z)
- Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep, opaque neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
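LRP itself needs model-specific propagation rules, so as a lighter stand-in the sketch below uses gradient-times-input relevance on a toy embedding classifier to highlight which tokens drive a predicted class. The vocabulary, model, and example are invented for illustration and do not reproduce the paper's framework.

```python
# Toy relevance sketch: gradient x input on a mean-pooled embedding classifier,
# used as a lightweight stand-in for layer-wise relevance propagation (LRP).
import torch
import torch.nn as nn

vocab = {"a": 0, "battery": 1, "cell": 2, "cooling": 3, "circuit": 4, "fabric": 5}
tokens = ["a", "battery", "cooling", "circuit"]
ids = torch.tensor([[vocab[t] for t in tokens]])

embedding = nn.Embedding(len(vocab), 16)
classifier = nn.Linear(16, 2)  # e.g. class 0 = "electricity", class 1 = "textiles"

embedded = embedding(ids)                  # (1, seq_len, 16)
embedded.retain_grad()
logits = classifier(embedded.mean(dim=1))  # mean-pool, then classify
predicted = int(logits.argmax())

logits[0, predicted].backward()            # gradient of the predicted logit
relevance = (embedded.grad * embedded).sum(dim=-1)[0]  # per-token relevance

for token, score in zip(tokens, relevance.tolist()):
    print(f"{token:10s} {score:+.4f}")
```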
- PaperCard for Reporting Machine Assistance in Academic Writing [48.33722012818687]
ChatGPT, a question-answering system released by OpenAI in November 2022, has demonstrated a range of capabilities that could be utilised in producing academic papers.
This raises critical questions surrounding the concept of authorship in academia.
We propose a framework we name "PaperCard", a documentation for human authors to transparently declare the use of AI in their writing process.
arXiv Detail & Related papers (2023-10-07T14:28:04Z)
- Source Attribution for Large Language Model-Generated Data [57.85840382230037]
It is imperative to be able to perform source attribution by identifying the data provider who contributed to the generation of a synthetic text.
We show that this problem can be tackled by watermarking.
We propose a source attribution framework that satisfies these key properties due to our algorithmic designs.
arXiv Detail & Related papers (2023-10-01T12:02:57Z)
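The paper's specific watermarking scheme is not described in this excerpt; the toy below shows the generic green-list idea often used for LLM watermark detection: re-derive a pseudo-random "green" vocabulary subset from each previous token and count how often the following token lands in it. Whitespace tokenization, the hash-based split, and the threshold are all simplifying assumptions.

```python
# Toy green-list watermark detector (generic illustration, not the paper's
# scheme): for each adjacent token pair, deterministically assign the second
# token to a green/red half seeded by the first, then measure the green rate.
import hashlib
import math

def is_green(prev_token: str, token: str) -> bool:
    """Deterministically assign `token` to the green half, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0   # ~50% of tokens are green for any given prefix

def green_fraction(text: str) -> float:
    tokens = text.split()
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

def z_score(fraction: float, n: int, gamma: float = 0.5) -> float:
    """How far the observed green fraction sits above the unwatermarked expectation."""
    return (fraction - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))

text = "the claimed battery pack comprises a plurality of cells and a controller"
n = len(text.split()) - 1
frac = green_fraction(text)
print(f"green fraction = {frac:.2f}, z = {z_score(frac, n):.2f}")
# A watermarked generator would bias sampling toward green tokens, pushing z up.
```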
- The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications [8.110699646062384]
We introduce the Harvard USPTO Patent dataset (HUPD).
With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora.
By providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks.
arXiv Detail & Related papers (2022-07-08T17:57:15Z)
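HUPD is distributed with a Hugging Face `datasets` loader; the snippet below sketches loading a small slice, assuming the hub identifier `HUPD/hupd` and the `sample` configuration described on the dataset card. Both, and the field names printed at the end, should be checked against the card before relying on them.

```python
# Sketch: load a small slice of HUPD with Hugging Face `datasets`.
# The hub id "HUPD/hupd", the "sample" config, and the field names follow the
# dataset card and are assumptions here; the loader may also expect
# filing-date arguments (see the card for current options).
from datasets import load_dataset

dataset = load_dataset("HUPD/hupd", name="sample", split="train",
                       trust_remote_code=True)

example = dataset[0]
print(sorted(example.keys()))       # inspect the available text and metadata fields
print(example.get("title"))
print(example.get("decision"))      # application outcome label, per the card
```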
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
- Summarization, Simplification, and Generation: The Case of Patents [0.0]
This survey aims at a) describing patents' characteristics and the challenges they pose to current NLP systems, b) critically presenting previous work and its evolution, and c) drawing attention to directions of research in which further work is needed.
To the best of our knowledge, this is the first survey of generative approaches in the patent domain.
arXiv Detail & Related papers (2021-04-30T09:28:29Z)