Identification, Tracking and Impact: Understanding the trade secret of catchphrases
- URL: http://arxiv.org/abs/2007.13520v1
- Date: Mon, 20 Jul 2020 06:11:25 GMT
- Title: Identification, Tracking and Impact: Understanding the trade secret of catchphrases
- Authors: Jagriti Jalal, Mayank Singh, Arindam Pal, Lipika Dey, Animesh Mukherjee
- Abstract summary: We propose an unsupervised method for the extraction of catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office.
Our proposed system achieves substantial improvement, in both precision and recall, over state-of-the-art techniques.
- Score: 8.343482692350094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the topical evolution in industrial innovation is a challenging
problem. With the advancement in the digital repositories in the form of patent
documents, it is becoming increasingly more feasible to understand the
innovation secrets -- "catchphrases" of organizations. However, searching and
understanding this enormous textual information is a natural bottleneck. In
this paper, we propose an unsupervised method for the extraction of
catchphrases from the abstracts of patents granted by the U.S. Patent and
Trademark Office over the years. Our proposed system achieves substantial
improvement, in both precision and recall, over state-of-the-art
techniques. As a second objective, we conduct an extensive empirical study to
understand the temporal evolution of the catchphrases across various
organizations. We also show how the overall innovation evolution in the form of
introduction of newer catchphrases in an organization's patents correlates with
the future citations received by the patents filed by that organization. Our
code and data sets will be placed in the public domain soon.
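The abstract does not spell out the extraction algorithm itself. As a rough illustration of what an unsupervised, corpus-statistics-based catchphrase extractor looks like, the sketch below implements a RAKE-style scorer over a single abstract; this is an assumption for illustration, not the authors' system, and the `STOPWORDS` set and `extract_catchphrases` name are invented here.

```python
import re
from collections import defaultdict

# Minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "is",
             "are", "on", "by", "with", "that", "this", "we", "our", "from",
             "it", "as", "at", "be"}

def extract_catchphrases(text, top_k=5):
    """RAKE-style unsupervised keyphrase extraction: split the text into
    candidate phrases at stopwords and punctuation, then score each phrase
    by the summed degree/frequency ratio of its words."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z-]*", chunk):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    # Word-level statistics over all candidate phrases.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # degree: co-occurrence within phrases
    word_score = {w: degree[w] / freq[w] for w in freq}

    # A phrase's score is the sum of its words' scores.
    scored = {p: sum(word_score[w] for w in p) for p in set(phrases)}
    return [" ".join(p) for p in
            sorted(scored, key=scored.get, reverse=True)[:top_k]]
```

Run over a patent abstract, this favors longer multi-word phrases built from frequently co-occurring words, which is the intuition behind many unsupervised keyphrase baselines.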
Related papers
- PatentEdits: Framing Patent Novelty as Textual Entailment [62.8514393375952]
We introduce the PatentEdits dataset, which contains 105K examples of successful revisions.
We design algorithms to label edits sentence by sentence, then establish how well these edits can be predicted with large language models.
We demonstrate that evaluating textual entailment between cited references and draft sentences is especially effective in predicting which inventive claims remained unchanged or are novel in relation to prior art.
arXiv Detail & Related papers (2024-11-20T17:23:40Z)
- Large Language Model for Patent Concept Generation [2.4368308736427697]
Existing large language models (LLMs) often fall short in innovative concept generation due to a lack of specialized knowledge.
We propose a novel knowledge finetuning (KFT) framework to endow LLM-based AI with the ability to autonomously mine, understand, and apply domain-specific knowledge.
Our proposed PatentGPT integrates knowledge injection pre-training (KPT), domain-specific supervised finetuning (SFT), and reinforcement learning from human feedback.
arXiv Detail & Related papers (2024-08-26T12:00:29Z)
- Detecting, Explaining, and Mitigating Memorization in Diffusion Models [49.438362005962375]
We introduce a straightforward yet effective method for detecting memorized prompts by inspecting the magnitude of text-conditional predictions.
Our proposed method seamlessly integrates without disrupting sampling algorithms, and delivers high accuracy even at the first generation step.
Building on our detection strategy, we unveil an explainable approach that shows the contribution of individual words or tokens to memorization.
arXiv Detail & Related papers (2024-07-31T16:13:29Z)
- Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs [18.86788223751979]
We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.
We introduce a graph-augmented approach to amplify the global contextual information of the patent phrases.
arXiv Detail & Related papers (2024-03-24T18:59:38Z)
- LLM-based Extraction of Contradictions from Patents [0.0]
This paper goes one step further, presenting a method to extract TRIZ contradictions from patent texts based on prompt engineering.
Our results show that "off-the-shelf" GPT-4 is a serious alternative to existing approaches.
arXiv Detail & Related papers (2024-03-21T09:36:36Z)
- Natural Language Processing in Patents: A Survey [0.0]
Patents, encapsulating crucial technical and legal information, present a rich domain for natural language processing (NLP) applications.
As NLP technologies evolve, large language models (LLMs) have demonstrated outstanding capabilities in general text processing and generation tasks.
This paper aims to equip NLP researchers with the essential knowledge to navigate this complex domain efficiently.
arXiv Detail & Related papers (2024-03-06T23:17:16Z)
- Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification [48.5140223214582]
State-of-the-art methods for multi-label patent classification rely on opaque deep neural networks (DNNs).
We propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP).
Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.
arXiv Detail & Related papers (2023-10-31T14:11:37Z)
- Creating a silver standard for patent simplification [11.083371480030195]
Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other.
Their style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines.
This paper proposes an approach to automatically simplify patent text through rephrasing.
arXiv Detail & Related papers (2023-10-24T10:00:56Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications [8.110699646062384]
We introduce the Harvard USPTO Patent dataset (HUPD).
With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora.
By providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks.
arXiv Detail & Related papers (2022-07-08T17:57:15Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while still jointly reasoning over both knowledge sources during answer generation.
An additional benefit of explicit knowledge integration is seen in improved interpretability of model predictions in our analysis.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
- SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique that improves the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from a parallel-trained handwritten text recognition system.
This leads to a stronger local discriminator and results in more realistic, higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.