PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
- URL: http://arxiv.org/abs/2505.19345v1
- Date: Sun, 25 May 2025 22:20:11 GMT
- Title: PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
- Authors: Yongmin Yoo, Qiongkai Xu, Longbing Cao
- Abstract summary: We introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language generation (NLG) metrics play a central role in evaluating generated texts, but are not well suited for the structural and legal characteristics of patent documents. Large language models (LLMs) offer strong potential in automating patent generation, yet research on evaluating LLM-generated patents remains limited, especially in evaluating the generation quality of patent claims, which are central to defining the scope of protection. Effective claim evaluation requires addressing legal validity, technical accuracy, and structural compliance. To address this gap, we introduce PatentScore, a multi-dimensional evaluation framework for assessing LLM-generated patent claims. PatentScore incorporates: (1) hierarchical decomposition for claim analysis; (2) domain-specific validation patterns based on legal and technical standards; and (3) scoring across structural, semantic, and legal dimensions. Unlike general-purpose NLG metrics, PatentScore reflects patent-specific constraints and document structures, enabling evaluation beyond surface similarity. We evaluate 400 Claim 1s generated by GPT-4o-mini and report a Pearson correlation of $r = 0.819$ with expert annotations, outperforming existing NLG metrics. Furthermore, we conduct additional evaluations using other models such as Claude-3.5-Haiku and Gemini-1.5-Flash, all of which show strong correlations with expert judgments, confirming the robustness and generalizability of our framework.
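For intuition, the aggregation step of such a framework can be pictured as a weighted combination of per-dimension claim scores validated against expert annotations. The sketch below is hypothetical: the paper's actual validation patterns, rules, and weights are not reproduced here, and the dimension scores are illustrative placeholders.

```python
# Hypothetical sketch of a PatentScore-style aggregation; the real
# framework's validation patterns and weights are not public here.
from statistics import correlation  # Pearson r, Python 3.10+

def patent_score(dims: dict[str, float],
                 weights: dict[str, float] | None = None) -> float:
    """Weighted aggregate over structural, semantic, and legal scores in [0, 1]."""
    weights = weights or {"structural": 1/3, "semantic": 1/3, "legal": 1/3}
    return sum(weights[d] * dims[d] for d in weights)

# Toy validation against expert annotations (both on a 0-1 scale).
claims = [
    {"structural": 0.9, "semantic": 0.8, "legal": 0.7},
    {"structural": 0.4, "semantic": 0.5, "legal": 0.3},
    {"structural": 0.7, "semantic": 0.9, "legal": 0.8},
]
expert = [0.80, 0.35, 0.85]
auto = [patent_score(c) for c in claims]
print(f"Pearson r = {correlation(auto, expert):.3f}")
```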
Related papers
- PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs
This paper aims for a paradigm shift in patent writing by leveraging large language models (LLMs) to overcome the tedious patent-filing process. We present PATENTWRITER, the first unified benchmarking framework for evaluating LLMs in patent abstract generation.
arXiv Detail & Related papers (2025-07-30T05:17:35Z)
- PatentMind: A Multi-Aspect Reasoning Graph for Patent Similarity Evaluation
We introduce PatentMind, a novel framework for patent similarity assessment based on a Multi-Aspect Reasoning Graph (MARG). PatentMind decomposes patents into three core dimensions: technical feature, application domain, and claim scope, to compute dimension-specific similarity scores. To support evaluation, we construct PatentSimBench, a human-annotated benchmark comprising 500 patent pairs.
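A minimal sketch of the dimension-wise aggregation idea (not the actual MARG reasoning, which operates over a graph): each of the three dimensions is reduced to a single similarity score, then combined with assumed weights. The dimension keys and encoder are assumptions.

```python
# Illustrative only: dimension-specific similarities (technical feature,
# application domain, claim scope) combined into one patent-pair score.
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def marg_similarity(pat_a: dict, pat_b: dict,
                    weights: tuple = (0.4, 0.3, 0.3)) -> float:
    # pat_a / pat_b: {"technical": vec, "domain": vec, "scope": vec},
    # with vectors from any sentence encoder (an assumption here).
    dims = ("technical", "domain", "scope")
    return sum(w * cosine(pat_a[d], pat_b[d]) for w, d in zip(weights, dims))
```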
arXiv Detail & Related papers (2025-05-25T22:28:27Z)
- Towards Better Evaluation for Generated Patent Claims
We introduce Patent-CE, the first comprehensive benchmark for evaluating patent claims. We also propose PatClaimEval, a novel multi-dimensional evaluation method specifically designed for patent claims. This research provides the groundwork for more accurate evaluations of automated patent claim generation systems.
arXiv Detail & Related papers (2025-05-16T10:27:16Z)
- Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Large Language Models (LLMs) are increasingly used for automated evaluation in various scenarios. Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models. We propose ARJudge, a novel evaluation framework that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z)
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- Can AI Examine Novelty of Patents?: Novelty Evaluation Based on the Correspondence between Patent Claim and Prior Art
This paper introduces a novel challenge by evaluating the ability of large language models (LLMs) to assess patent novelty. We present the first dataset specifically designed for novelty evaluation, derived from real patent examination cases. Our study reveals that while classification models struggle to effectively assess novelty, generative models make predictions with a reasonable level of accuracy.
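The setup can be illustrated with a hedged sketch of prompt-based novelty assessment; `ask_llm` below is a placeholder for any chat-completion client, and the prompt wording is an assumption, not the paper's.

```python
# Hypothetical prompt-based novelty check: pair a claim with retrieved
# prior-art passages and ask a generative model for a verdict.
def build_novelty_prompt(claim: str, prior_art: list[str]) -> str:
    refs = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(prior_art))
    return (
        "You are a patent examiner. Given the claim and prior-art passages, "
        "decide whether the claim is NOVEL or NOT NOVEL, citing the passage "
        "that anticipates it if not.\n\n"
        f"Claim:\n{claim}\n\nPrior art:\n{refs}\n\nVerdict:"
    )

def assess_novelty(claim: str, prior_art: list[str], ask_llm) -> str:
    # ask_llm: any callable that sends a prompt to an LLM and returns text.
    return ask_llm(build_novelty_prompt(claim, prior_art))
```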
arXiv Detail & Related papers (2025-02-10T10:09:29Z)
- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
We introduce an omnidirectional and automatic RAG benchmark, OmniEval, in the financial domain. Our benchmark is characterized by its multi-dimensional evaluation framework. Our experiments demonstrate the comprehensiveness of OmniEval, which includes extensive test datasets.
arXiv Detail & Related papers (2024-12-17T15:38:42Z)
- Patent-CR: A Dataset for Patent Claim Revision
This paper presents Patent-CR, the first dataset created for the patent claim revision task in English. It includes both initial patent applications rejected by patent examiners and the final granted versions.
arXiv Detail & Related papers (2024-12-03T16:43:42Z)
- PatentEdits: Framing Patent Novelty as Textual Entailment
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
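A rough sketch of this entailment check, using an off-the-shelf MNLI classifier (the model choice and the RoBERTa-style pair separator are assumptions, not the paper's exact setup):

```python
# If a cited prior-art sentence entails a draft claim sentence, the
# claim is likely anticipated rather than novel.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def anticipated(prior_art_sentence: str, draft_sentence: str) -> bool:
    # RoBERTa MNLI models expect premise and hypothesis joined by </s></s>.
    out = nli(f"{prior_art_sentence} </s></s> {draft_sentence}")[0]
    return out["label"] == "ENTAILMENT" and out["score"] > 0.5
```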
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Large Language Models (LLMs) have opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance.
We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods.
By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.
arXiv Detail & Related papers (2024-01-13T15:59:09Z)
- Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification
We propose an integrated framework that comprehensively considers patent information for patent classification.
We first present an IPC-code correlation learning module to derive semantic representations of the codes.
Finally, we combine the contextual information of patent texts, which carries the semantics of IPC codes, with assignees' sequential preferences to make predictions.
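The fusion step can be pictured as concatenating the three signals and feeding them to a classifier; the sketch below is illustrative, with all dimensions and module names assumed rather than taken from the paper.

```python
# Hypothetical fusion of patent-text context, IPC-code semantics, and an
# assignee's sequential-preference vector for IPC classification.
import torch
import torch.nn as nn

class PatentClassifier(nn.Module):
    def __init__(self, text_dim=768, ipc_dim=128, assignee_dim=128, n_labels=600):
        super().__init__()
        self.head = nn.Linear(text_dim + ipc_dim + assignee_dim, n_labels)

    def forward(self, text_vec, ipc_vec, assignee_vec):
        # Concatenate the three representations and score each IPC class.
        fused = torch.cat([text_vec, ipc_vec, assignee_vec], dim=-1)
        return self.head(fused)  # logits over IPC classes
```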
arXiv Detail & Related papers (2023-08-10T07:02:24Z)