Comparing Complex Concepts with Transformers: Matching Patent Claims Against Natural Language Text
- URL: http://arxiv.org/abs/2407.10351v1
- Date: Sun, 14 Jul 2024 22:31:07 GMT
- Title: Comparing Complex Concepts with Transformers: Matching Patent Claims Against Natural Language Text
- Authors: Matthias Blume, Ghobad Heidari, Christoph Hewel
- Abstract summary: A key capability in managing patent applications or a patent portfolio is comparing claims to other text, e.g. a patent specification.
We test two new LLM-based approaches and find that both provide substantially better performance than previously published values.
The ability to match dense information from one domain against much more distributed information expressed in a different vocabulary may also be useful beyond the intellectual property space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key capability in managing patent applications or a patent portfolio is comparing claims to other text, e.g. a patent specification. Because the language of claims is different from language used elsewhere in the patent application or in non-patent text, this has been challenging for computer based natural language processing. We test two new LLM-based approaches and find that both provide substantially better performance than previously published values. The ability to match dense information from one domain against much more distributed information expressed in a different vocabulary may also be useful beyond the intellectual property space.
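The abstract does not spell out the two LLM-based approaches, so the following is only a minimal baseline sketch of claim-to-specification matching: embed the claim and candidate passages with a general-purpose sentence encoder and rank passages by cosine similarity. The model name and example texts are illustrative assumptions, not the authors' method.
```python
# Minimal sketch, assuming an embedding baseline (not the paper's approach).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose encoder

claim = "A fastener comprising a threaded shaft and a polymer-coated head."
spec_passages = [
    "The shaft 12 carries a helical thread 14 along its length.",
    "The head 16 is coated with a polymer layer 18 to resist corrosion.",
    "Figure 3 shows the packaging used for shipping the assembly.",
]

claim_emb = model.encode(claim, convert_to_tensor=True)
passage_embs = model.encode(spec_passages, convert_to_tensor=True)

# Rank specification passages by cosine similarity to the claim.
scores = util.cos_sim(claim_emb, passage_embs)[0]
for passage, score in sorted(zip(spec_passages, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {passage}")
```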
Related papers
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
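A minimal sketch of the entailment framing described above, using an off-the-shelf MNLI model rather than the paper's own setup; the model choice and example sentences are assumptions.
```python
# Hedged sketch: score entailment between a cited-reference sentence (premise)
# and a draft claim sentence (hypothesis) with a generic NLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

premise = "Reference D1 discloses a lithium battery with a ceramic separator."
hypothesis = "The claimed battery comprises a ceramic separator."

# The pipeline accepts a premise/hypothesis pair via text and text_pair.
result = nli({"text": premise, "text_pair": hypothesis})
print(result)  # e.g. a label such as ENTAILMENT with a confidence score
```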
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of large language models (LLMs).
The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.
The proposed IBUT outperforms several strong comparison methods.
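A heavily hedged sketch of the iterative-understanding idea as summarized above; `llm` is a hypothetical stand-in for any chat-completion call, and the prompt templates are illustrative, not the paper's.
```python
# Hypothetical stand-in for a chat-completion client; not a real API.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat-completion client here")

def ibut_translate(source: str, src_lang: str, tgt_lang: str, rounds: int = 2) -> str:
    # First derive a contextual understanding, then translate with that context.
    understanding = llm(f"Explain, in {tgt_lang}, what this {src_lang} text means:\n{source}")
    translation = llm(f"Using this context:\n{understanding}\n\nTranslate into {tgt_lang}:\n{source}")
    for _ in range(rounds - 1):
        # Iteratively refine: re-derive understanding from the current draft.
        understanding = llm(
            f"Compare the {src_lang} source and the {tgt_lang} draft; "
            f"note any misunderstandings.\nSource:\n{source}\nDraft:\n{translation}"
        )
        translation = llm(f"Revise the draft using these notes:\n{understanding}\n\nDraft:\n{translation}")
    return translation
```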
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
- Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
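A toy sketch of graph augmentation, under the assumption that it amounts to blending a phrase's embedding with the embeddings of its retrieved graph neighbors; the graph, encoder, and blending weight are all illustrative.
```python
# Hedged sketch: enrich a phrase embedding with its graph neighborhood.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

# Toy phrase graph: phrase -> related phrases (stand-in for retrieved graphs).
graph = {
    "heat exchanger": ["thermal transfer unit", "radiator core"],
    "cooling element": ["heat sink", "thermal transfer unit"],
}

def augmented_embedding(phrase: str, alpha: float = 0.5) -> np.ndarray:
    emb = model.encode(phrase)
    neighbors = graph.get(phrase, [])
    if neighbors:
        # Blend in the mean neighbor embedding as global context.
        neighbor_emb = model.encode(neighbors).mean(axis=0)
        emb = alpha * emb + (1 - alpha) * neighbor_emb
    return emb / np.linalg.norm(emb)

a = augmented_embedding("heat exchanger")
b = augmented_embedding("cooling element")
print("graph-augmented cosine similarity:", float(a @ b))
```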
arXiv Detail & Related papers (2024-03-24T18:59:38Z)
- Natural Language Processing in Patents: A Survey [0.0]
Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications.
As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks.
This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z)
- PaECTER: Patent-level Representation Learning using Citation-informed Transformers [0.16785092703248325]
PaECTER is a publicly available, open-source document-level encoder specific for patents.
We fine-tune BERT for Patents with examiner-added citation information to generate numerical representations for patent documents.
PaECTER performs better in similarity tasks than current state-of-the-art models used in the patent domain.
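Since PaECTER is described as a public document-level encoder, a plausible usage sketch is loading it as a sentence-transformers checkpoint; the model identifier below is an assumption.
```python
# Hedged sketch: patent similarity as cosine distance between document embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mpi-inno-comp/paecter")  # assumed HF model id

patents = [
    "A battery separator comprising a ceramic-coated polyolefin membrane...",
    "An electrochemical cell with a heat-resistant coated separator film...",
]
embs = model.encode(patents, convert_to_tensor=True)
print(util.cos_sim(embs[0], embs[1]).item())
```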
arXiv Detail & Related papers (2024-02-29T18:09:03Z) - Unveiling Black-boxes: Explainable Deep Learning Models for Patent
Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on opaque deep neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
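Full LRP requires model-specific backward rules, so this sketch substitutes the simpler gradient-times-input attribution to show the same end product, per-token relevance for the predicted class; the classifier checkpoint is a generic stand-in, not a patent model.
```python
# Hedged sketch: gradient-x-input relevance as a simplified stand-in for LRP.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # generic stand-in
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

text = "A rotor blade with a serrated trailing edge reducing aerodynamic noise."
enc = tok(text, return_tensors="pt")

# Embed inputs explicitly so gradients can flow back to the token embeddings.
embeds = model.get_input_embeddings()(enc["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
pred = logits.argmax(dim=-1).item()
logits[0, pred].backward()

# Per-token relevance = gradient . embedding, summed over the hidden dimension.
relevance = (embeds.grad * embeds).sum(-1).squeeze(0)
for token, r in zip(tok.convert_ids_to_tokens(enc["input_ids"][0]), relevance.tolist()):
    print(f"{token:>12s}  {r:+.4f}")
```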
arXiv Detail & Related papers (2023-10-31T14:11:37Z) - A Novel Patent Similarity Measurement Methodology: Semantic Distance and
Technological Distance [0.0]
Patent similarity analysis plays a crucial role in evaluating the risk of patent infringement.
Recent advances in natural language processing technology offer a promising avenue for automating this process.
We propose a hybrid methodology that measures the similarity between patents by considering both their semantic similarity and their technological distance.
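A minimal sketch of one way to realize such a hybrid score, assuming semantic similarity from text embeddings and technological similarity from IPC-code overlap; the encoder, the Jaccard choice, and the weighting are illustrative, not the paper's exact formulation.
```python
# Hedged sketch: blend semantic and technological similarity into one score.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def technological_similarity(ipc_a: set[str], ipc_b: set[str]) -> float:
    # Jaccard overlap of IPC classification codes.
    return len(ipc_a & ipc_b) / len(ipc_a | ipc_b) if ipc_a | ipc_b else 0.0

def hybrid_similarity(text_a, text_b, ipc_a, ipc_b, weight=0.7):
    semantic = util.cos_sim(model.encode(text_a, convert_to_tensor=True),
                            model.encode(text_b, convert_to_tensor=True)).item()
    return weight * semantic + (1 - weight) * technological_similarity(ipc_a, ipc_b)

print(hybrid_similarity(
    "A lithium-ion cell with a silicon anode...",
    "Battery electrode comprising silicon particles...",
    {"H01M4/38", "H01M10/0525"}, {"H01M4/38"},
))
```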
arXiv Detail & Related papers (2023-03-23T07:55:31Z) - Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE) concerned with identifying semantic relations between entities in text.
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
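A short sketch of the enrichment idea, assuming it amounts to surfacing OpenIE triples alongside the input before relation classification; `extract_triples` is a hypothetical placeholder for any OpenIE system, not the paper's pipeline.
```python
# Hypothetical placeholder for an OpenIE extractor (e.g., Stanford OpenIE).
def extract_triples(sentence: str) -> list[tuple[str, str, str]]:
    raise NotImplementedError("plug in an OpenIE extractor here")

def enrich(sentence: str) -> str:
    # Prepend explicit (subject, relation, object) hints for the RE model.
    hints = " ".join(f"<{s}; {r}; {o}>" for s, r, o in extract_triples(sentence))
    return f"{sentence} [TRIPLES] {hints}"
```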
arXiv Detail & Related papers (2022-12-19T11:26:23Z) - CLIP also Understands Text: Prompting CLIP for Phrase Understanding [65.59857372525664]
Contrastive Language-Image Pretraining (CLIP) efficiently learns visual concepts by pre-training with natural language supervision.
In this paper, we find that the text encoder of CLIP actually demonstrates strong ability for phrase understanding, and can even significantly outperform popular language models such as BERT with a properly designed prompt.
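A minimal sketch of phrase scoring with CLIP's text encoder alone; the prompt template and phrases are illustrative assumptions rather than the paper's prompt design.
```python
# Hedged sketch: pairwise phrase similarity from CLIP's text tower only.
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

name = "openai/clip-vit-base-patch32"
tok = CLIPTokenizer.from_pretrained(name)
model = CLIPTextModelWithProjection.from_pretrained(name).eval()

phrases = ["sports car", "race automobile", "kitchen sink"]
prompts = [f"a photo of a {p}." for p in phrases]  # assumed prompt template

with torch.no_grad():
    inputs = tok(prompts, padding=True, return_tensors="pt")
    embs = model(**inputs).text_embeds
    embs = embs / embs.norm(dim=-1, keepdim=True)

print((embs @ embs.T).round(decimals=3))  # pairwise cosine similarities
```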
arXiv Detail & Related papers (2022-10-11T23:35:18Z)
- Linguistically Informed Masking for Representation Learning in the Patent Domain [7.911344873839031]
We propose the empirically motivated Linguistically Informed Masking (LIM) method to focus domain-adaptive pre-training on the linguistic patterns of patents.
We quantify the relevant differences between patent, scientific and general-purpose language.
We demonstrate the impact of balancing the learning from different information sources during domain adaptation for the patent domain.
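A toy sketch of the masking idea, assuming LIM biases the masking rate toward linguistically salient tokens; a hard-coded noun list stands in for a real part-of-speech tagger, and the rates are illustrative.
```python
# Hedged sketch: mask tokens of one word class at a higher rate than others.
import random
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

NOUNS = {"apparatus", "member", "assembly", "device"}  # toy stand-in for a tagger

def lim_mask(text: str, p_noun: float = 0.3, p_other: float = 0.05) -> str:
    masked = [
        tok.mask_token
        if random.random() < (p_noun if w.lower().strip(".,") in NOUNS else p_other)
        else w
        for w in text.split()
    ]
    return " ".join(masked)

print(lim_mask("The fastening member secures the housing assembly to the device."))
```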
arXiv Detail & Related papers (2021-06-10T14:20:57Z)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
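A minimal sketch of cloze-style probing with a multilingual masked LM; the templates below are illustrative, not the benchmark's own.
```python
# Hedged sketch: query a multilingual masked LM with fact templates.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-multilingual-cased")

probes = [
    "Paris is the capital of [MASK].",
    "Paris est la capitale de la [MASK].",  # same fact, French template
]
for probe in probes:
    top = fill(probe, top_k=3)
    print(probe, "->", [c["token_str"] for c in top])
```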
arXiv Detail & Related papers (2020-10-13T05:29:56Z)