Related papers: ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs

URL: http://arxiv.org/abs/2407.12193v1
Date: Tue, 16 Jul 2024 21:38:45 GMT
Title: ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs
Authors: Arav Parikh, Shiri Dori-Hacohen
Abstract summary: We introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models. To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets.
Score: 2.60235825984014
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A fundamental step in the patent application process is the determination of whether there exist prior patents that are novelty destroying. This step is routinely performed by both applicants and examiners, in order to assess the novelty of proposed inventions among the millions of applications filed annually. However, conducting this search is time- and labor-intensive, as searchers must navigate complex legal and technical jargon while covering a large number of legal claims. Automated approaches using information retrieval and machine learning to detect novelty destroying patents present a promising avenue to streamline this process, yet research focusing on this space remains limited. In this paper, we introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models to address this challenge of novelty destruction assessment. To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets. To illustrate the practical relevance of this pipeline, we utilize it to construct a sample dataset comprising over 27K patents in the electrochemical domain: 1,045 base patents from USPTO, each associated with 25 related patents labeled according to their novelty destruction towards the base patent. Subsequently, we conduct preliminary experiments showcasing the efficacy of this dataset in fine-tuning transformer models to identify novelty destroying patents, demonstrating 29.2% and 32.7% absolute improvement in MRR and P@1, respectively.
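Note on the figures in the abstract: the sketch below shows one way the dataset size and the two reported metrics, MRR and P@1, could be computed. It is a minimal illustration, not the authors' evaluation code. The function names, the binary label convention (1 = novelty destroying, 0 = not), and the toy rankings are all assumptions made for this example, and the size check assumes the related-patent sets do not overlap.

```python
from typing import Sequence

# Dataset size implied by the abstract: each base patent plus its 25 related
# patents. Assumes no patent is shared between groups (an assumption).
BASE_PATENTS = 1045
RELATED_PER_BASE = 25
total_patents = BASE_PATENTS * (1 + RELATED_PER_BASE)  # consistent with "over 27K"


def reciprocal_rank(labels: Sequence[int]) -> float:
    """Return 1/rank of the first relevant item in a ranked label list.

    `labels` holds hypothetical binary labels in ranked order
    (1 = novelty destroying, 0 = not). Returns 0.0 if none is relevant.
    """
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]]) -> float:
    """Average the reciprocal rank over all queries (base patents)."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked candidate is relevant."""
    hits = sum(1 for r in rankings if len(r) > 0 and r[0] == 1)
    return hits / len(rankings)


# Toy rankings for three hypothetical base patents (illustrative data only).
toy_rankings = [
    [1, 0, 0],     # relevant patent ranked first
    [0, 1, 0],     # relevant patent ranked second
    [0, 0, 0, 1],  # relevant patent ranked fourth
]

if __name__ == "__main__":
    print(total_patents)
    print(mean_reciprocal_rank(toy_rankings))
    print(precision_at_1(toy_rankings))
```

An "absolute improvement" in these metrics refers to the difference between two such scores in percentage points, for example between a fine-tuned model and a baseline.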

Related papers

PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims [13.242188189150987]
PEDANTIC is a dataset of 14k US patent claims annotated with reasons for indefiniteness. A human validation study confirms the pipeline's accuracy in generating high-quality annotations. PEDANTIC provides a valuable resource for patent AI researchers, enabling the development of advanced examination models.
arXiv Detail & Related papers (2025-05-27T15:34:39Z)
Towards Better Evaluation for Generated Patent Claims [0.0]
We introduce Patent-CE, the first comprehensive benchmark for evaluating patent claims. We also propose PatClaimEval, a novel multi-dimensional evaluation method specifically designed for patent claims. This research provides the groundwork for more accurate evaluations of automated patent claim generation systems.
arXiv Detail & Related papers (2025-05-16T10:27:16Z)
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report [170.81876816944754]
The NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR) aims to advance the development of models that optimize key computational metrics. This paper meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques.
arXiv Detail & Related papers (2025-04-14T20:18:21Z)
Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs [67.0310240737424]
We introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs. Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset. During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs.
arXiv Detail & Related papers (2025-02-15T04:56:45Z)
Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art [5.655276956391884]
This paper introduces a novel challenge by evaluating the ability of large language models (LLMs) to assess patent novelty. We present the first dataset specifically designed for novelty evaluation, derived from real patent examination cases. Our study reveals that while classification models struggle to effectively assess novelty, generative models make predictions with a reasonable level of accuracy.
arXiv Detail & Related papers (2025-02-10T10:09:29Z)
Intelligent System for Automated Molecular Patent Infringement Assessment [38.48937966447085]
PatentFinder is a novel multi-agent and tool-enhanced intelligence system that can accurately and comprehensively evaluate small molecules for patent infringement. PatentFinder features five specialized agents that collaboratively analyze patent claims and molecular structures. PatentFinder autonomously generates detailed and interpretable patent infringement reports, showcasing enhanced accuracy and improved interpretability.
arXiv Detail & Related papers (2024-12-10T12:14:38Z)
CopyrightShield: Spatial Similarity Guided Backdoor Defense against Copyright Infringement in Diffusion Models [61.06621533874629]
The diffusion model is a prime target for copyright infringement attacks. This paper provides an in-depth analysis of the spatial similarity of replication in diffusion models. We propose a novel defense method specifically targeting copyright infringement attacks.
arXiv Detail & Related papers (2024-12-02T14:19:44Z)
PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions. We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models. We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured experimentally. Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
Structural Representation Learning and Disentanglement for Evidential Chinese Patent Approval Prediction [19.287231890434718]
This paper presents the pioneering effort on this task using a retrieval-based classification approach. We propose a novel framework called DiSPat, which focuses on structural representation learning and disentanglement. Our framework surpasses state-of-the-art baselines on patent approval prediction, while also exhibiting enhanced evidentiality.
arXiv Detail & Related papers (2024-08-23T05:44:16Z)
Randomization Techniques to Mitigate the Risk of Copyright Infringement [48.75580082851766]
We investigate potential randomization approaches that can complement current practices for copyright protection. This is motivated by the inherent ambiguity of the rules that determine substantial similarity in copyright precedents. Similar randomized approaches, such as differential privacy, have been successful in mitigating privacy risks.
arXiv Detail & Related papers (2024-08-21T20:55:00Z)
Automated Neural Patent Landscaping in the Small Data Regime [6.284464997330885]
The rapid expansion of patenting activity in recent decades has driven an increasing need for efficient and effective automated patent landscaping approaches. We present an automated neural patent landscaping system that demonstrates significantly improved performance on difficult examples.
arXiv Detail & Related papers (2024-07-10T19:13:37Z)
A Comprehensive Survey on AI-based Methods for Patents [14.090575139188422]
AI-based tools present opportunities to streamline and enhance important tasks in the patent cycle. This interdisciplinary survey aims to serve as a resource for researchers and practitioners working at the intersection of AI and patent analysis.
arXiv Detail & Related papers (2024-04-02T20:44:06Z)
Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain. We propose an adversarial algorithm to make the retriever component robust against distribution shift. We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents. We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents. PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
arXiv Detail & Related papers (2024-02-29T18:09:03Z)
Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on deep opaque neural networks (DNNs). We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP). Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
Towards a Complete Metamorphic Testing Pipeline [56.75969180129005]
Metamorphic Testing (MT) addresses the test oracle problem by examining the relationships between input-output pairs in consecutive executions of the System Under Test (SUT). These relations, known as Metamorphic Relations (MRs), specify the expected output changes resulting from specific input changes. Our research aims to develop methods and tools that assist testers in generating MRs, defining constraints, and providing explainability for MR outcomes.
arXiv Detail & Related papers (2023-09-30T10:49:22Z)
A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach. Results show PatentSBERTa, Bert-for-patents, and TF-IDF Weighted Word Embeddings have the best accuracy for computing sentence embeddings at the subclass level.
arXiv Detail & Related papers (2022-04-28T12:04:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.