Related papers: Large Language Model Informed Patent Image Retrieval


URL: http://arxiv.org/abs/2404.19360v1
Date: Tue, 30 Apr 2024 08:45:16 GMT
Title: Large Language Model Informed Patent Image Retrieval
Authors: Hao-Cheng Lo, Jung-Mei Chu, Jieh Hsiang, Chun-Chieh Cho
Abstract summary: We propose a language-informed, distribution-aware multimodal approach to patent image feature learning. Our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In patent prosecution, image-based retrieval systems for identifying similarities between current patent images and prior art are pivotal to ensuring the novelty and non-obviousness of patent applications. Despite their growing popularity in recent years, existing attempts, while effective at recognizing images within the same patent, fail to deliver practical value due to their limited generalizability in retrieving relevant prior art. Moreover, this task inherently involves challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information of image descriptions. Therefore, we propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating Large Language Models and improves performance on underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our proposed method achieves state-of-the-art or comparable performance in image-based patent retrieval with mAP +53.3%, Recall@10 +41.8%, and MRR@10 +51.9%. Furthermore, through an in-depth user analysis, we examine how our model aids patent professionals in their image retrieval efforts, highlighting its real-world applicability and effectiveness.
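The abstract reports results as mAP, Recall@10, and MRR@10 but does not define them. The Python sketch below shows one common set of definitions for a single query (a ranked list of retrieved IDs plus the set of relevant IDs) and averages them over queries. It is an illustration under stated assumptions, not the paper's evaluation code: the function names are hypothetical, and some papers define Recall@k as a hit rate (1 if any relevant item is in the top k) rather than the fraction of relevant items retrieved, as done here.

```python
from typing import Dict, List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP: mean of precision@i taken at each rank i where a relevant item appears.
    Relevant items never retrieved contribute 0 (denominator is len(relevant))."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def evaluate(queries: List[Tuple[Sequence[str], Set[str]]]) -> Dict[str, float]:
    """Average each per-query metric over all (ranked, relevant) pairs."""
    n = len(queries)
    return {
        "mAP": sum(average_precision(r, rel) for r, rel in queries) / n,
        "Recall@10": sum(recall_at_k(r, rel) for r, rel in queries) / n,
        "MRR@10": sum(mrr_at_k(r, rel) for r, rel in queries) / n,
    }
```

For two toy queries, `evaluate([(["a", "b", "c"], {"b"}), (["x", "y"], {"x", "z"})])` gives mAP 0.5, Recall@10 0.75, and MRR@10 0.75: the second query's unretrieved item "z" halves its AP and recall but does not affect its reciprocal rank.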

Related papers

Hierarchical Multi-Positive Contrastive Learning for Patent Image Retrieval [0.2970959580204573]
Patent images are technical drawings that convey information about a patent's innovation. Current methods neglect patents' hierarchical relationships, such as those defined by the Locarno International Classification (LIC) system. We introduce a hierarchical multi-positive contrastive loss that leverages the LIC's taxonomy to induce such relations in the retrieval process.
arXiv (2025-06-16T13:53:02Z)
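The summary above names a hierarchical multi-positive contrastive loss without giving its form. As a rough illustration only, the Python sketch below treats every other item in a batch as a positive whose weight is the number of leading hierarchy levels it shares with the anchor (for example, a two-level class/subclass label path), and minimizes the cross-entropy between those normalized weights and a softmax over cosine similarities. The weighting scheme, the temperature, and all names are assumptions, not the cited paper's formulation.

```python
import math
from typing import List, Sequence, Tuple


def shared_depth(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Count the leading hierarchy levels two label paths have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hierarchical_multi_positive_loss(
    embeddings: List[Sequence[float]],
    labels: List[Tuple[str, ...]],
    temperature: float = 0.1,
) -> float:
    """Hypothetical loss: cross-entropy between softmax(similarity / T) over the
    other batch items and targets proportional to shared hierarchy depth."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = [cosine(embeddings[i], embeddings[j]) / temperature for j in others]
        # log-softmax normalizer, subtracting the max logit for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        # assumed weighting: a deeper shared prefix means a stronger positive
        weights = [shared_depth(labels[i], labels[j]) for j in others]
        w_sum = sum(weights)
        if w_sum == 0:
            continue  # this anchor has no positive at any level
        total -= sum(w / w_sum * (z - log_z) for w, z in zip(weights, logits))
        anchors += 1
    return total / anchors if anchors else 0.0
```

The design choice here is that a same-subclass pair pulls harder than a same-class pair, while items sharing no level act only as negatives in the softmax denominator.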
PatentMind: A Multi-Aspect Reasoning Graph for Patent Similarity Evaluation [32.272839191711114]
We introduce PatentMind, a novel framework for patent similarity assessment based on a Multi-Aspect Reasoning Graph (MARG). PatentMind decomposes patents into three core dimensions (technical feature, application domain, and claim scope) to compute dimension-specific similarity scores. To support evaluation, we construct PatentSimBench, a human-annotated benchmark comprising 500 patent pairs.
arXiv (2025-05-25T22:28:27Z)
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models [62.979954692036685]
We introduce PRSS, which refines the classifier-free guidance approach in diffusion models by integrating prompt re-anchoring and semantic prompt search. Our approach consistently improves the privacy-utility trade-off, establishing a new state of the art.
arXiv (2025-04-25T02:51:23Z)
FakeScope: Large Multimodal Expert Model for Transparent AI-Generated Image Forensics [66.14786900470158]
We propose FakeScope, an expert large multimodal model (LMM) tailored for AI-generated image forensics. FakeScope identifies AI-synthesized images with high accuracy and provides rich, interpretable, and query-driven forensic insights. FakeScope achieves state-of-the-art performance in both closed-ended and open-ended forensic scenarios.
arXiv (2025-03-31T16:12:48Z)
CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models [58.58208005178676]
We propose CopyJudge, an automated copyright infringement identification framework. We employ an abstraction-filtration-comparison test framework with a debate among multiple large vision-language models (LVLMs) to assess the likelihood of infringement. Based on the judgments, we introduce a general LVLM-based mitigation strategy.
arXiv (2025-02-21T08:09:07Z)
KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits [11.425951419870128]
Keyphrases are ideal candidates for patent portraits due to their brevity, representativeness, and clarity. KaPPA operates in two phases: patent portrait construction and portrait-based analysis. Experiments conducted on real-world patent applications demonstrate that our keyphrase-based portraits effectively capture domain-specific knowledge.
arXiv (2025-02-18T17:24:00Z)
Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors [62.63467652611788]
We introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images. Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness. Our findings suggest that state-of-the-art detectors exhibit varying sensitivities to the types and degrees of perturbations, data distributions, and augmentation methods used.
arXiv (2024-11-12T01:17:27Z)
Towards Effective User Attribution for Latent Diffusion Models via Watermark-Informed Blending [54.26862913139299]
We introduce a novel framework, Towards Effective user Attribution for latent diffusion models via Watermark-Informed Blending (TEAWIB). TEAWIB incorporates a unique ready-to-use configuration approach that allows seamless integration of user-specific watermarks into generative models. Experiments validate the effectiveness of TEAWIB, showing state-of-the-art performance in perceptual quality and attribution accuracy.
arXiv (2024-09-17T07:52:09Z)
Structural Representation Learning and Disentanglement for Evidential Chinese Patent Approval Prediction [19.287231890434718]
This paper presents a pioneering effort on evidential Chinese patent approval prediction, using a retrieval-based classification approach. We propose a novel framework called DiSPat, which focuses on structural representation learning and disentanglement. Our framework surpasses state-of-the-art baselines on patent approval prediction while also exhibiting enhanced evidentiality.
arXiv (2024-08-23T05:44:16Z)
Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning [18.534970504136254]
We propose a novel deep-metric-learning-based method to fuse hierarchical prior knowledge about image classes. Existing image classification methods that incorporate deep metric learning mainly exploit qualitative relativity between image classes. We also propose a new triplet loss term that exploits quantitative relativity and aligns distances in the model's latent space with those in the knowledge space, and we incorporate it into the proposed dual-modality fusion method.
arXiv (2024-07-30T07:24:33Z)
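The entry above combines a triplet loss with a term that aligns latent distances with knowledge-space distances, but the summary gives no formula. A minimal Python sketch of that general idea, assuming the knowledge-space distance for each pair is simply supplied as a number (for instance, derived from a class hierarchy), adds a squared-error alignment term to the standard hinge triplet loss. The function name, the squared-error form, and the `alpha` weight are hypothetical choices, not the paper's definition.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_with_alignment(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    knowledge_ap: float,
    knowledge_an: float,
    margin: float = 0.2,
    alpha: float = 1.0,
) -> float:
    """Hinge triplet loss plus a hypothetical squared-error term that pulls
    latent distances toward the given knowledge-space distances."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    # standard triplet term: the positive should be closer than the negative by `margin`
    triplet = max(0.0, d_ap - d_an + margin)
    # assumed alignment term: latent distances should match the knowledge distances
    align = (d_ap - knowledge_ap) ** 2 + (d_an - knowledge_an) ** 2
    return triplet + alpha * align
```

The triplet term only encodes an ordering (qualitative relativity); the alignment term additionally penalizes how far each latent distance is from its target value, which is one way to read "quantitative relativity".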
A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law grants creators exclusive rights to reproduce, distribute, and monetize their creative works. Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement. We introduce a novel pipeline that combines CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv (2024-01-04T11:14:01Z)
Learning Efficient Representations for Image-Based Patent Retrieval [16.323708969088557]
We present a simple and lightweight model for content-based patent retrieval. Our approach significantly outperforms competing methods on a large-scale benchmark. When scaled up, our model achieves a surprisingly high mAP of 93.5%.
arXiv (2023-08-26T03:19:14Z)
Classification of Visualization Types and Perspectives in Patents [9.123089032348311]
We adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives.
arXiv (2023-07-19T21:45:07Z)
Learning from Multi-Perception Features for Real-World Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images. Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information. We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv (2023-05-26T07:35:49Z)
A Survey on Sentence Embedding Models Performance for Patent Analysis [0.0]
We propose a standard library and dataset for assessing the accuracy of embedding models based on the PatentSBERTa approach. Results show that PatentSBERTa, Bert-for-patents, and TF-IDF weighted word embeddings achieve the best accuracy for computing sentence embeddings at the subclass level.
arXiv (2022-04-28T12:04:42Z)
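The entry above lists TF-IDF weighted word embeddings among the best-performing sentence representations without describing how they are built. One common recipe, sketched below in Python as an assumption rather than the survey's implementation, averages a sentence's word vectors weighted by term frequency times inverse document frequency, using idf(t) = log(n / df(t)) + 1 (the form scikit-learn's `TfidfVectorizer` uses with `smooth_idf=False`). The 2-D toy vectors in the usage note are hypothetical; in practice they would come from pretrained word embeddings.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def idf_table(corpus: List[List[str]]) -> Dict[str, float]:
    """idf(t) = log(n / df(t)) + 1, so terms present in every document keep weight 1."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_weighted_embedding(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    idf: Dict[str, float],
) -> List[float]:
    """Average word vectors weighted by TF-IDF; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for term, count in Counter(tokens).items():
        if term not in vectors or term not in idf:
            continue
        weight = (count / len(tokens)) * idf[term]
        acc = [a + weight * x for a, x in zip(acc, vectors[term])]
        total += weight
    # normalize by the total weight; an all-unknown sentence stays the zero vector
    return [a / total for a in acc] if total else acc


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

With a toy corpus such as `[["the", "battery", "cell"], ["the", "cell", "battery", "the"], ["the", "door", "hinge"]]`, the frequent word "the" gets the lowest weight, so the two battery sentences end up far more similar to each other than either is to the door-hinge sentence.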
Proactive Pseudo-Intervention: Causally Informed Contrastive Learning For Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI). PPI leverages proactive interventions to guard against image features with no causal relevance. We also devise a novel causally informed salience mapping module to identify the key image pixels on which to intervene, and show that it greatly facilitates model interpretability.
arXiv (2020-12-06T20:30:26Z)
Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method that jointly learns facial expression synthesis and recognition for effective facial expression recognition (FER). The method involves a two-stage learning procedure. First, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions. To alleviate the data bias between real and synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv (2020-02-06T10:56:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.