Weakly supervised cross-modal learning in high-content screening
- URL: http://arxiv.org/abs/2311.04678v2
- Date: Sun, 12 Nov 2023 13:31:14 GMT
- Title: Weakly supervised cross-modal learning in high-content screening
- Authors: Gabriel Watkinson, Ethan Cohen, Nicolas Bourriez, Ihab Bendidi,
Guillaume Bollot, Auguste Genovesio
- Abstract summary: We introduce a novel approach to learning cross-modal representations between image data and molecular representations for drug discovery.
We propose EMM and IMM, two innovative loss functions built on top of CLIP that leverage weak supervision.
We also present a preprocessing method for the JUMP-CP dataset that effectively reduces the required storage from 85 TB to a usable 7 TB.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the surge in available data from various modalities, there is a growing
need to bridge the gap between different data types. In this work, we introduce
a novel approach to learning cross-modal representations between image data and
molecular representations for drug discovery. We propose EMM and IMM, two
innovative loss functions built on top of CLIP that leverage weak supervision
from cross-site replicates in High-Content Screening. Evaluating our model
against known baselines on cross-modal retrieval, we show that our proposed
approach learns better representations and mitigates batch effects. In
addition, we present a preprocessing method for the JUMP-CP dataset that
effectively reduces the required storage from 85 TB to a usable 7 TB while
retaining all perturbations and most of the information content.
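Since the abstract names the EMM and IMM objectives without giving their formulas, the following is a minimal sketch of one plausible reading: a CLIP-style InfoNCE loss in which any image/molecule pair sharing a perturbation id, including cross-site replicates, counts as a positive. The function name, the `perturbation_ids` argument, and the multi-positive target construction are illustrative assumptions, not the paper's actual EMM/IMM definitions.

```python
# Hedged sketch: a CLIP-style cross-modal loss with cross-site replicates
# treated as extra positives. Not the paper's EMM/IMM; an illustrative guess.
import torch
import torch.nn.functional as F

def replicate_aware_clip_loss(img_emb, mol_emb, perturbation_ids, tau=0.07):
    """img_emb, mol_emb: (B, D) embeddings; perturbation_ids: (B,) int labels.

    Standard CLIP uses an identity target (each image matches only its own
    molecule). Here every image/molecule pair that shares a perturbation id
    is a positive, which is one way to turn replicates into weak supervision.
    """
    img = F.normalize(img_emb, dim=-1)
    mol = F.normalize(mol_emb, dim=-1)
    logits = img @ mol.t() / tau  # (B, B) cosine similarities over temperature

    # Multi-positive target: 1 where perturbation ids match, row-normalized.
    pos = (perturbation_ids[:, None] == perturbation_ids[None, :]).float()
    target = pos / pos.sum(dim=1, keepdim=True)

    # Symmetric cross-entropy against the soft targets (pos is symmetric,
    # so the same target matrix serves both retrieval directions).
    loss_i2m = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_m2i = -(target * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2m + loss_m2i)
```

When every perturbation in the batch is unique, the target reduces to the identity matrix and the loss falls back to standard CLIP.

The 85 TB to 7 TB preprocessing is likewise only summarized. One mechanism consistent with a roughly 12x reduction is percentile-based rescaling of 16-bit microscopy channels to 8 bits before compressed storage; the percentile bounds below are assumptions, and the paper's actual JUMP-CP pipeline may differ.

```python
# Hedged sketch: robust 16-bit -> 8-bit channel rescaling for microscopy.
import numpy as np

def compress_channel(img16: np.ndarray, lo_pct=0.1, hi_pct=99.9) -> np.ndarray:
    """Rescale one 16-bit channel to 8 bits using robust percentile bounds."""
    lo, hi = np.percentile(img16, [lo_pct, hi_pct])
    scaled = np.clip((img16.astype(np.float32) - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (scaled * 255.0).astype(np.uint8)  # halves raw size before any codec
```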
Related papers
- Multimodal Task Representation Memory Bank vs. Catastrophic Forgetting in Anomaly Detection [6.991692485111346]
Unsupervised Continuous Anomaly Detection (UCAD) faces significant challenges in multi-task representation learning.
We propose the Multimodal Task Representation Memory Bank (MTRMB) method through two key technical innovations.
Experiments on the MVTec AD and VisA datasets demonstrate MTRMB's superiority, achieving an average detection accuracy of 0.921 at the lowest forgetting rate.
arXiv Detail & Related papers (2025-02-10T06:49:54Z)
- USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation [24.90512145836643]
We introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation.
We show that our approach significantly outperforms the current state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2024-12-12T12:20:27Z)
- Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval [3.5314225883644945]
Cross-modal medical image-report retrieval task plays a significant role in clinical diagnosis and various medical generative tasks.
We propose an efficient framework named Masked Contrastive and Reconstruction (MCR), which takes masked data as the sole input for both tasks.
This enhances task connections, reducing information interference and competition between them, while also substantially decreasing the required GPU memory and training time.
arXiv Detail & Related papers (2023-12-26T01:14:10Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers [58.802352477207094]
We explore the great potential of a pre-trained vision Transformer (ViT) to bridge the vast distribution gap between two modalities.
We propose a mask modeling strategy that randomly masks the tokens of a specific modality to enforce proactive interaction between tokens from different modalities.
Experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate.
arXiv Detail & Related papers (2023-07-09T08:58:47Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages the learned representation to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- CLIP-Driven Fine-grained Text-Image Person Re-identification [50.94827165464813]
TIReID aims to retrieve the image corresponding to the given text query from a pool of candidate images.
We propose a CLIP-driven Fine-grained information excavation framework (CFine) to fully utilize the powerful knowledge of CLIP for TIReID.
arXiv Detail & Related papers (2022-10-19T03:43:12Z)
- Multi-Modal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing [73.29587731448345]
We propose a novel method, dubbed Cross-Modal Info-Max Hashing (CMIMH).
We learn informative representations that can preserve both intra- and inter-modal similarities.
The proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
arXiv Detail & Related papers (2021-12-13T08:58:03Z)
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.