AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
- URL: http://arxiv.org/abs/2410.16820v1
- Date: Tue, 22 Oct 2024 08:48:41 GMT
- Title: AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
- Authors: Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yan Xu
- Abstract summary: This paper investigates the potential of object-level visual-language pre-training for zero-shot nuclei detection.
We propose an innovative auto-prompting pipeline, named AttriPrompter, to avoid subjective manual prompt design.
Our method exhibits remarkable performance in label-free nuclei detection, outperforming all existing unsupervised methods.
- Score: 20.071868757497278
- Abstract: Large-scale visual-language pre-trained models (VLPMs) have demonstrated exceptional performance in downstream object detection through text prompts for natural scenes. However, their application to zero-shot nuclei detection on histopathology images remains relatively unexplored, mainly due to the significant gap between the characteristics of medical images and the web-originated text-image pairs used for pre-training. This paper aims to investigate the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP), for zero-shot nuclei detection. Specifically, we propose an innovative auto-prompting pipeline, named AttriPrompter, comprising attribute generation, attribute augmentation, and relevance sorting, to avoid subjective manual prompt design. AttriPrompter utilizes VLPMs' text-to-image alignment to create semantically rich text prompts, which are then fed into GLIP for initial zero-shot nuclei detection. Additionally, we propose a self-trained knowledge distillation framework, where GLIP serves as the teacher with its initial predictions used as pseudo labels, to address the challenges posed by high nuclei density, including missed detections, false positives, and overlapping instances. Our method exhibits remarkable performance in label-free nuclei detection, outperforming all existing unsupervised methods and demonstrating excellent generality. Notably, this work highlights the astonishing potential of VLPMs pre-trained on natural image-text pairs for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/AttriPrompter.
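The abstract outlines the pipeline but not its interfaces, so the following is a minimal sketch of how an AttriPrompter-style auto-prompting flow could look: attribute generation, attribute augmentation, relevance sorting, and a final detection call. CLIP stands in here for the VLPM's text-to-image alignment, and `generate_attributes` and `glip_detect` are hypothetical placeholders; the authors' actual implementation is in the linked repository.

```python
# Sketch of the AttriPrompter-style flow described above: generate attribute
# words, augment them into candidate prompts, sort candidates by text-to-image
# relevance, and hand the ranked prompts to an object-level VLPM such as GLIP.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def generate_attributes(noun: str) -> list[str]:
    # Placeholder for the "attribute generation" step; in the paper these
    # come from the VLPM itself rather than a hard-coded list.
    return ["round", "dark-stained", "small", "oval", "densely packed"]

def augment_prompts(noun: str, attributes: list[str]) -> list[str]:
    # "Attribute augmentation": combine each attribute with the target noun.
    return [f"{attr} {noun}" for attr in attributes]

@torch.no_grad()
def sort_by_relevance(image: Image.Image, prompts: list[str]) -> list[str]:
    # "Relevance sorting": rank prompts by cosine similarity between the
    # text embeddings and the image embedding.
    img = model.encode_image(preprocess(image).unsqueeze(0).to(device))
    txt = model.encode_text(clip.tokenize(prompts).to(device))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    order = (txt @ img.T).squeeze(-1).argsort(descending=True)
    return [prompts[i] for i in order.tolist()]

patch = Image.open("patch.png")  # a histopathology image patch
candidates = augment_prompts("nuclei", generate_attributes("nuclei"))
ranked = sort_by_relevance(patch, candidates)
# detections = glip_detect(patch, ". ".join(ranked))  # hypothetical GLIP call
```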
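The abstract also describes a self-trained knowledge distillation stage in which GLIP's initial predictions serve as pseudo labels. Before a student can be trained on those labels, they plausibly need cleaning against the failure modes the abstract names (false positives, overlapping instances). Below is a minimal sketch of such a cleanup step; `torchvision.ops.nms` is a real operator, but the surrounding framing is an assumption, not the authors' code.

```python
# Confidence filtering plus NMS as a pseudo-label cleanup step for
# distillation: drop low-confidence boxes (likely false positives) and
# suppress duplicates among overlapping instances.
import torch
from torchvision.ops import nms

def clean_pseudo_labels(boxes: torch.Tensor, scores: torch.Tensor,
                        score_thresh: float = 0.5, iou_thresh: float = 0.5):
    """Filter GLIP-style detections into usable pseudo labels.

    boxes: (N, 4) tensor of (x1, y1, x2, y2); scores: (N,) confidences.
    """
    keep = scores > score_thresh           # suppress likely false positives
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)  # resolve overlapping instances
    return boxes[keep], scores[keep]

# Toy example: three detections, two of which overlap heavily.
boxes = torch.tensor([[10., 10., 30., 30.], [11., 11., 31., 31.], [50., 50., 70., 70.]])
scores = torch.tensor([0.9, 0.8, 0.4])
print(clean_pseudo_labels(boxes, scores))  # keeps one of the overlapping pair
```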
Related papers
- Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification [3.1029532920699934]
We introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP).
Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on the prior knowledge of multi-modal features.
Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts.
arXiv Detail & Related papers (2024-05-10T13:27:32Z) - LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors [58.75140338866403]
DVDet is a Descriptor-Enhanced Open Vocabulary Detector.
It transforms regional embeddings into image-like representations that can be directly integrated into general open vocabulary detection training.
Extensive experiments over multiple large-scale benchmarks show that DVDet outperforms the state-of-the-art consistently by large margins.
arXiv Detail & Related papers (2024-02-07T07:26:49Z) - Zero-shot Nuclei Detection via Visual-Language Pre-trained Models [19.526504045149895]
Large-scale visual-language pre-trained models have proven their excellent performance in downstream object detection for natural scenes.
However, zero-shot nuclei detection on H&E images via VLPMs remains underexplored.
In this paper, we attempt to explore the potential of the object-level VLPM, Grounded Language-Image Pre-training model, for zero-shot nuclei detection.
arXiv Detail & Related papers (2023-06-30T13:44:13Z) - DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose instead to extract deep intrinsic characteristics from texts generated by the black-box model.
arXiv Detail & Related papers (2023-05-21T17:26:16Z) - Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that the pre-trained diffusion model can be adapted faster to downstream visual perception tasks using the proposed VPD framework.
arXiv Detail & Related papers (2023-03-03T18:59:47Z) - Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection [87.39089806069707]
We propose a fine-grained Visual-Text Prompt-driven self-training paradigm for Open-Vocabulary Detection (VTP-OVD).
During the adapting stage, we enable the VLM to obtain fine-grained alignment by using learnable text prompts to resolve an auxiliary dense pixel-wise prediction task.
Experiments show that our method achieves state-of-the-art performance for open-vocabulary object detection, e.g., 31.5% mAP on unseen classes of COCO.
arXiv Detail & Related papers (2022-11-02T03:38:02Z) - PromptDet: Expand Your Detector Vocabulary with Uncurated Images [47.600059694034]
The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.
We propose a two-stage open-vocabulary object detector that categorises each box proposal by a classifier generated from the text encoder of a pre-trained visual-language model.
To scale up the learning procedure towards detecting a wider spectrum of objects, we exploit available online resources, iteratively updating the prompts and later self-training the proposed detector with pseudo labels generated on a large corpus of noisy, uncurated web images.
arXiv Detail & Related papers (2022-03-30T17:50:21Z) - Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model [34.85604521903056]
We introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection.
We assemble DetPro with ViLD, a recent state-of-the-art open-world object detector.
Experimental results show that our DetPro outperforms the baseline ViLD in all settings.
arXiv Detail & Related papers (2022-03-28T17:50:26Z) - Open-Vocabulary DETR with Conditional Matching [86.1530128487077]
OV-DETR is an open-vocabulary detector based on DETR.
It can detect any object given its class name or an exemplar image.
It achieves non-trivial improvements over the current state of the art.
arXiv Detail & Related papers (2022-03-22T16:54:52Z)