Related papers: Enhancing patent retrieval using automated patent summarization

URL: http://arxiv.org/abs/2507.16371v1
Date: Tue, 22 Jul 2025 09:14:44 GMT
Title: Enhancing patent retrieval using automated patent summarization
Authors: Eleni Kamateri, Renukswamy Chikkamath, Michail Salampasis, Linda Andersson, Markus Endres
Abstract summary: We present the application of recent extractive and abstractive summarization methods for generating concise, purpose-specific summaries of patent documents. Experimental results show that summarization-based queries significantly improve prior-art retrieval effectiveness.
Score: 1.067215284497015
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Effective query formulation is a key challenge in long-document Information Retrieval (IR). This challenge is particularly acute in domain-specific contexts like patent retrieval, where documents are lengthy, linguistically complex, and encompass multiple interrelated technical topics. In this work, we present the application of recent extractive and abstractive summarization methods for generating concise, purpose-specific summaries of patent documents. We further assess the utility of these automatically generated summaries as surrogate queries across three benchmark patent datasets and compare their retrieval performance against conventional approaches that use entire patent sections. Experimental results show that summarization-based queries significantly improve prior-art retrieval effectiveness, highlighting their potential as an efficient alternative to traditional query formulation techniques.
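
Illustration: the idea in the abstract above (summarize a long patent, then use the summary as a surrogate query) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' method: the frequency-based extractive summarizer and the TF-IDF cosine ranker are simple stand-ins, and every name, text, and corpus entry below is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extractive_summary(text, k=2):
    """Keep the k sentences with the highest average word frequency (toy stand-in)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Restore document order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(top))


def rank(query, corpus):
    """Return corpus ids sorted by TF-IDF cosine similarity to the query."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in corpus.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    idf = {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in doc_freq.items()}

    def vectorize(counts):
        # Terms absent from the corpus vocabulary are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vectorize(Counter(tokenize(query)))
    return sorted(docs, key=lambda d: cosine(q, vectorize(docs[d])), reverse=True)


# Hypothetical toy data: one query patent and three candidate prior-art documents.
patent_text = (
    "The invention relates to a lithium battery. "
    "The electrode is coated with a lithium compound. "
    "The coating improves charge capacity of the battery. "
    "Drawings are attached."
)
corpus = {
    "A": "A battery electrode coated with a lithium compound to improve charge capacity.",
    "B": "A bicycle brake lever with an adjustable cable tensioner.",
    "C": "A method for brewing coffee using a pressurized water chamber.",
}

summary = extractive_summary(patent_text, k=2)  # surrogate query
ranking = rank(summary, corpus)
print(summary)
print(ranking)
```

A real system would likely replace both stand-ins, for example with a trained summarizer and a BM25 or dense retriever; the sketch only shows how the summary takes the place of the full patent text as the query.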

Related papers

PANORAMA: A Dataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination [44.74519851862391]
We build PANORAMA, a dataset of 8,143 U.S. patent examination records. We decompose the trails into sequential benchmarks that emulate patent professionals' patent review processes. We argue that advancing NLP, including LLMs, in the patent domain requires a deeper understanding of real-world patent examination.
arXiv Detail & Related papers (2025-10-25T03:24:13Z)
Executable Knowledge Graphs for Replicating AI Research [65.41207324831583]
Executable Knowledge Graphs (xKG) is a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. Code will be released at https://github.com/zjunlp/xKG.
arXiv Detail & Related papers (2025-10-20T17:53:23Z)
One-Prompt Strikes Back: Sparse Mixture of Experts for Prompt-based Continual Learning [52.966712416640085]
We propose SMoPE, a novel framework that integrates the benefits of both task-specific and shared prompt strategies. SMoPE consistently outperforms task-specific prompt methods and achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2025-09-29T08:54:58Z)
Efficient Patent Searching Using Graph Transformers [1.024113475677323]
Finding relevant prior art is crucial when deciding whether to file a new patent application or invalidate an existing patent. We present a Graph Transformer-based dense retrieval method for patent searching where each invention is represented by a graph. Our model processes these invention graphs and is trained using prior art citations from patent office examiners as relevance signals.
arXiv Detail & Related papers (2025-08-14T09:53:26Z)
Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval [0.2970959580204573]
Patent images are technical drawings that convey information about a patent's innovation. Current methods neglect patents' hierarchical relationships, such as those defined by the Locarno International Classification system. We introduce a hierarchical multi-positive contrastive loss that leverages the LIC's taxonomy to induce such relations in the retrieval process.
arXiv Detail & Related papers (2025-06-16T13:53:02Z)
Towards Better Evaluation for Generated Patent Claims [0.0]
We introduce Patent-CE, the first comprehensive benchmark for evaluating patent claims. We also propose PatClaimEval, a novel multi-dimensional evaluation method specifically designed for patent claims. This research provides the groundwork for more accurate evaluations of automated patent claim generation systems.
arXiv Detail & Related papers (2025-05-16T10:27:16Z)
A Hybrid Architecture with Efficient Fine Tuning for Abstractive Patent Document Summarization [0.0]
This study proposes a system for efficiently creating abstractive summaries of patent records. The procedure involves leveraging the LexRank graph-based algorithm to retrieve the important sentences from input patent texts.
arXiv Detail & Related papers (2025-03-13T13:30:54Z)
RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery [69.41989381702858]
Existing methods, such as direct generation and multi-agent discussion, often struggle with issues like hallucinations, topic incoherence, and significant latency. We propose RAPID, an efficient retrieval-augmented long text generation framework. Our work provides a robust and efficient solution to the challenges of automated long-text generation.
arXiv Detail & Related papers (2025-03-02T06:11:29Z)
LLM-based Extraction of Contradictions from Patents [0.0]
This paper goes one step further, as it presents a method to extract TRIZ contradictions from patent texts based on Prompt Engineering. Our results show that "off-the-shelf" GPT-4 is a serious alternative to existing approaches.
arXiv Detail & Related papers (2024-03-21T09:36:36Z)
Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce the ToTER (Topical taxonomy Enhanced Retrieval) framework. ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Patent Sentiment Analysis to Highlight Patent Paragraphs [0.0]
Given a patent document, identifying distinct semantic annotations is an interesting research aspect. In manual patent analysis, marking paragraphs to recognise semantic information is common practice, as it improves readability. This work assists patent practitioners in highlighting semantic information automatically and aids in creating a sustainable and efficient patent analysis using the aptitude of Machine Learning.
arXiv Detail & Related papers (2021-11-06T13:28:29Z)
iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization together with faceted search. Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval [117.07047313964773]
We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions. Our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
arXiv Detail & Related papers (2020-09-27T06:12:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.