Learning Discriminative Visual-Text Representation for Polyp
Re-Identification
- URL: http://arxiv.org/abs/2307.10625v1
- Date: Thu, 20 Jul 2023 06:47:46 GMT
- Title: Learning Discriminative Visual-Text Representation for Polyp
Re-Identification
- Authors: Suncheng Xiang, Cang Liu, Sijia Du, Dahong Qian
- Abstract summary: Colonoscopic Polyp Re-Identification aims to match a specific polyp in a large gallery with different cameras and views.
Traditional methods mainly focus on visual representation learning, while neglecting to explore the potential of semantic features during training.
We propose VT-ReID, which can remarkably enrich the representation of polyp videos with the interchange of high-level semantic information.
- Score: 3.4269112703886955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Colonoscopic Polyp Re-Identification aims to match a specific polyp in a
large gallery across different cameras and views, which plays a key role in the
prevention and treatment of colorectal cancer in computer-aided diagnosis.
However, traditional methods mainly focus on visual representation learning
while neglecting the potential of semantic features during training, which can
easily lead to poor generalization when the pretrained model is adapted to new
scenarios. To relieve this dilemma, we propose a simple but effective training
method named VT-ReID, which can remarkably enrich the representation of polyp
videos through the interchange of high-level semantic information. Moreover, we
elaborately design a novel clustering mechanism to introduce prior knowledge
from textual data, which leverages contrastive learning to promote better
separation using abundant unlabeled text data. To the best of our knowledge,
this is the first attempt to employ visual-text features with a clustering
mechanism for colonoscopic polyp re-identification. Empirical results show that
our method significantly outperforms current state-of-the-art methods by a
clear margin.
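To make the described approach more concrete, below is a minimal sketch, assuming precomputed image and text embeddings, of how a CLIP-style image-text contrastive loss can be combined with k-means clustering of unlabeled text embeddings to supply cluster pseudo-labels for an additional contrastive term. This is not the authors' VT-ReID implementation; the function names, hyperparameters, and the choice of k-means are illustrative assumptions only.

```python
# Minimal sketch (not the VT-ReID code): a symmetric image-text contrastive
# loss plus a text-clustering prior used as pseudo-labels for a second
# contrastive term. All names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def image_text_contrastive_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE over matched (image, text) embedding pairs."""
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def text_cluster_pseudo_labels(txt_feats, num_clusters=16):
    """Cluster unlabeled text embeddings into prior pseudo-labels."""
    feats = F.normalize(txt_feats, dim=-1).detach().cpu().numpy()
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
    return torch.as_tensor(labels, dtype=torch.long)


def cluster_contrastive_loss(img_feats, pseudo_labels, temperature=0.07):
    """Pull together images whose paired texts share a cluster pseudo-label."""
    labels = pseudo_labels.to(img_feats.device)
    img = F.normalize(img_feats, dim=-1)
    sim = img @ img.t() / temperature
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(eye, float("-inf"))         # exclude self-similarity
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_counts = positives.sum(dim=1).clamp(min=1)
    return -(log_prob.masked_fill(~positives, 0.0).sum(dim=1) / pos_counts).mean()
```

In a full training loop the two losses would typically be combined with a weighting coefficient and back-propagated through the visual and text encoders, with the text clustering refreshed periodically as the embeddings evolve.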
Related papers
- Predicting Stroke through Retinal Graphs and Multimodal Self-supervised Learning [0.46835339362676565]
Early identification of stroke is crucial for intervention, requiring reliable models.
We propose an efficient retinal image representation, combined with clinical information, to capture a comprehensive overview of cardiovascular health.
arXiv Detail & Related papers (2024-11-08T14:40:56Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification [2.78481408391119]
Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras.
Traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset produce unsatisfactory retrieval performance.
We propose a simple but effective training method named Colo-ReID, which can help our model learn more general and discriminative knowledge.
arXiv Detail & Related papers (2023-08-02T04:10:14Z)
- Few-Shot Classification of Skin Lesions from Dermoscopic Images by Meta-Learning Representative Embeddings [1.957558771641347]
Annotated images and ground truth for diagnosis of rare and novel diseases are scarce.
Few-shot learning, and meta-learning in general, aim to overcome these issues by performing well in low-data regimes.
This paper focuses on improving meta-learning for the classification of dermoscopic images.
arXiv Detail & Related papers (2022-10-30T21:27:15Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Unsupervised Representation Learning from Pathology Images with Multi-directional Contrastive Predictive Coding [0.33148826359547523]
We present a modification to the CPC framework for use with digital pathology patches.
This is achieved by introducing an alternative mask for building the latent context.
We show that our proposed modification can yield improved deep classification of histology patches.
arXiv Detail & Related papers (2021-05-11T21:17:13Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire the future research of self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
- Joint Deep Learning of Facial Expression Synthesis and Recognition [97.19528464266824]
We propose a novel method for joint deep learning of facial expression synthesis and recognition for effective FER.
The proposed method involves a two-stage learning procedure. Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
In order to alleviate the problem of data bias between the real images and the synthetic images, we propose an intra-class loss with a novel real data-guided back-propagation (RDBP) algorithm.
arXiv Detail & Related papers (2020-02-06T10:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.