Related papers: A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance

A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance

URL: http://arxiv.org/abs/2303.16767v2
Date: Fri, 1 Dec 2023 04:29:49 GMT
Title: A Novel Patent Similarity Measurement Methodology: Semantic Distance and Technological Distance
Authors: Yongmin Yoo, Cheonkam Jeong, Sanguk Gim, Junwon Lee, Zachary Schimke, Deaho Seo
Abstract summary: Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement. Recent advances in natural language processing technology offer a promising avenue for automating this process. We propose a hybrid methodology that takes into account similarity, measures the similarity between patents by considering the semantic similarity of patents.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement. Nonetheless, this analysis is predominantly conducted manually by legal experts, often resulting in a time-consuming process. Recent advances in natural language processing technology offer a promising avenue for automating this process. However, methods for measuring similarity between patents still rely on experts manually classifying patents. Due to the recent development of artificial intelligence technology, a lot of research is being conducted focusing on the semantic similarity of patents using natural language processing technology. However, it is difficult to accurately analyze patent data, which are legal documents representing complex technologies, using existing natural language processing technologies. To address these limitations, we propose a hybrid methodology that takes into account bibliographic similarity, measures the similarity between patents by considering the semantic similarity of patents, the technical similarity between patents, and the bibliographic information of patents. Using natural language processing techniques, we measure semantic similarity based on patent text and calculate technical similarity through the degree of coexistence of International patent classification (IPC) codes. The similarity of bibliographic information of a patent is calculated using the special characteristics of the patent: citation information, inventor information, and assignee information. We propose a model that assigns reasonable weights to each similarity method considered. With the help of experts, we performed manual similarity evaluations on 420 pairs and evaluated the performance of our model based on this data. We have empirically shown that our method outperforms recent natural language processing techniques.

Related papers

PatentMind: A Multi-Aspect Reasoning Graph for Patent Similarity Evaluation [32.272839191711114]
We introduce PatentMind, a novel framework for patent similarity assessment based on a Multi-Aspect Reasoning Graph (MARG)<n>PatentMind decomposes patents into three core dimensions: technical feature, application domain, and claim scope, to compute dimension-specific similarity scores.<n>To support evaluation, we construct PatentSimBench, a human-annotated benchmark comprising 500 patent pairs.
arXiv Detail & Related papers (2025-05-25T22:28:27Z)
PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models. We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
A comparative analysis of embedding models for patent similarity [0.0]
This paper makes two contributions to the field of text-based patent similarity. It compares the performance of different kinds of patent-specific pretrained embedding models.
arXiv Detail & Related papers (2024-03-25T11:20:23Z)
Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
arXiv Detail & Related papers (2024-03-24T18:59:38Z)
Natural Language Processing in Patents: A Survey [0.0]
Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications. As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks. This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z)
Measuring Technological Convergence in Encryption Technologies with Proximity Indices: A Text Mining and Bibliometric Analysis using OpenAlex [46.3643544723237]
This study identifies technological convergence among emerging technologies in cybersecurity. The proposed method integrates text mining and bibliometric analyses to formulate and predict technological proximity indices. Our case study findings highlight a significant convergence between blockchain and public-key cryptography, evidenced by the increasing proximity indices.
arXiv Detail & Related papers (2024-03-03T20:03:03Z)
PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z)
Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs) We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP) Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
Multi label classification of Artificial Intelligence related patents using Modified D2SBERT and Sentence Attention mechanism [0.0]
We present a method for classifying artificial intelligence-related patents published by the USPTO using natural language processing technique and deep learning methodology. Our experiment result is highest performance compared to other deep learning methods.
arXiv Detail & Related papers (2023-03-03T12:27:24Z)
A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embeddings models based on PatentSBERTa approach. Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)
Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide to end users a set of features that need to be changed in order to achieve a desired outcome. Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations. We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)
A Diagnostic Study of Explainability Techniques for Text Classification [52.879658637466605]
We develop a list of diagnostic properties for evaluating existing explainability techniques. We compare the saliency scores assigned by the explainability techniques with human annotations of salient input regions to find relations between a model's performance and the agreement of its rationales with human ones.
arXiv Detail & Related papers (2020-09-25T12:01:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.