Related papers: Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision

URL: http://arxiv.org/abs/2206.14719v1
Date: Wed, 29 Jun 2022 15:37:11 GMT
Title: Trial2Vec: Zero-Shot Clinical Trial Document Similarity Search using Self-Supervision
Authors: Zifeng Wang and Jimeng Sun
Abstract summary: We propose Trial2Vec, which learns through self-supervision without annotating similar clinical trials. The meta-structure of trial documents (e.g., title, eligibility criteria, target disease), along with clinical knowledge, is leveraged to automatically generate contrastive samples. We show through visualization that our method yields medically interpretable embeddings, and it achieves a 15% average improvement over the best baselines in precision/recall for trial retrieval.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. When designing a clinical trial, it is beneficial to study similar historical trials. However, lengthy trial documents and the lack of labeled data make trial similarity search difficult. We propose a zero-shot clinical trial retrieval method, Trial2Vec, which learns through self-supervision without annotating similar clinical trials. Specifically, the meta-structure of trial documents (e.g., title, eligibility criteria, target disease), along with clinical knowledge (e.g., the UMLS knowledge base, https://www.nlm.nih.gov/research/umls/index.html), is leveraged to automatically generate contrastive samples. In addition, Trial2Vec encodes trial documents with their meta-structure in mind, producing compact embeddings that aggregate multi-aspect information from the whole document. We show through visualization that our method yields medically interpretable embeddings, and it achieves a 15% average improvement over the best baselines in precision/recall for trial retrieval, evaluated on 1,600 trial pairs that we labeled. Finally, we demonstrate that the pre-trained embeddings benefit the downstream trial outcome prediction task on over 240k trials.
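The retrieval pipeline described in the abstract (multi-field encoding, contrastive training, similarity search scored by precision/recall) can be illustrated with a minimal sketch. It assumes each trial has already been turned into one vector per field (e.g., title, eligibility criteria, target disease). The mean-pooling aggregation, the "drop the title field" augmentation used to build positive pairs, the temperature of 0.1, the toy random data, and every function name below (`aggregate_fields`, `info_nce`, `top_k`, `precision_recall`) are hypothetical illustrations, not Trial2Vec's published encoder or code.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit L2 norm so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_fields(fields):
    """Mean-pool one trial's per-field vectors into a single unit vector.
    Hypothetical stand-in for a meta-structure-aware document encoder."""
    dim = len(fields[0])
    pooled = [sum(f[d] for f in fields) / len(fields) for d in range(dim)]
    return normalize(pooled)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: positives[i] is the positive view of anchors[i];
    the other positives in the batch act as negatives. Inputs are unit vectors."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # negative log-softmax of the positive pair
    return total / len(anchors)

def top_k(query, corpus, k):
    """Indices of the k corpus vectors with the highest cosine similarity to query."""
    ranked = sorted(range(len(corpus)), key=lambda j: dot(query, corpus[j]), reverse=True)
    return ranked[:k]

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved index list against a relevant set."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

# Toy data: 50 trials, 3 fields each, 8-dimensional random field vectors.
random.seed(0)
trials = [[[random.gauss(0, 1) for _ in range(8)] for _ in range(3)] for _ in range(50)]
emb = [aggregate_fields(t) for t in trials]
# Hypothetical meta-structure augmentation: drop the first (title) field
# to obtain a second "view" of each trial as its positive sample.
views = [aggregate_fields(t[1:]) for t in trials]
loss = info_nce(emb, views)
hits = top_k(emb[0], emb, k=5)
```

Because the query's own embedding is in the corpus, it ranks first; in a real evaluation the query trial would be excluded before computing precision/recall against labeled similar trials.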

Related papers

TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design. We provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv (2024-06-30T09:13:10Z)
Panacea: A foundation model for clinical trial search, summarization, design, and recruitment
We propose a clinical trial foundation model named Panacea. Panacea is designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers.
arXiv (2024-06-25T21:29:25Z)
Automatically Labeling Clinical Trial Outcomes: A Large-Scale Benchmark for Drug Development
Clinical Trial Outcome (CTO) benchmark is a fully reproducible, large-scale repository encompassing approximately 125,000 drug and biologics trials. We manually annotated a dataset of clinical trials conducted between 2020 and 2024 to enhance the quality and reliability of outcome labels.
arXiv (2024-06-13T04:23:35Z)
TrialDura: Hierarchical Attention Transformer for Interpretable Clinical Trial Duration Prediction
We propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data. We encode these inputs into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding. Compared with the other models, our proposed model achieved superior performance, with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE) of 1.39 years.
arXiv (2024-04-20T02:12:59Z)
Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
We conduct a systematic study on scaling clinical trial matching using large language models (LLMs). Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network.
arXiv (2023-08-04T07:51:15Z)
CliniDigest: A Case Study in Large Language Model Based Large-Scale Summarization of Clinical Trial Descriptions
In 2022, there were on average more than 100 clinical trials submitted to ClinicalTrials.gov every day. CliniDigest is, to our knowledge, the first tool able to provide real-time, truthful, and comprehensive summaries of clinical trials. For each field, CliniDigest generates summaries of $\mu = 153$, $\sigma = 69$ words, each of which utilizes $\mu = 54\%$, $\sigma = 30\%$ of the sources.
arXiv (2023-07-26T21:49:14Z)
AutoTrial: Prompting Language Models for Clinical Trial Design
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models. Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv (2023-05-19T01:04:16Z)
SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning
Clinical trials are essential to drug development but time-consuming, costly, and prone to failure. We propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT), which first identifies trial topics to cluster the multi-sourced trial data into relevant trial topics. Treating each trial sequence as a task, it uses a meta-learning strategy so that the model can rapidly adapt to new tasks with minimal updates.
arXiv (2023-04-07T23:04:27Z)
Clinical trial site matching with improved diversity using fair policy learning
We learn a model that maps a clinical trial description to a ranked list of potential trial sites. Unlike existing fairness frameworks, the group membership of each trial site is non-binary. We propose fairness criteria based on demographic parity to address such a multi-group membership scenario.
arXiv (2022-04-13T16:35:28Z)
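To make "non-binary group membership" concrete, here is a hedged sketch of one generic way to check demographic parity when each site serves a mix of demographic groups: compare the average group composition of the top-k recommended sites with that of the full candidate pool. The site data, the functions `group_share` and `parity_gap`, and the max-absolute-difference criterion are illustrative assumptions, not the fairness criteria defined in the site-matching paper.

```python
def group_share(sites, groups):
    """Average group composition over a list of sites. Each site maps a group
    name to the fraction of its patient population in that group."""
    return {g: sum(s.get(g, 0.0) for s in sites) / len(sites) for g in groups}

def parity_gap(ranked_sites, k):
    """Largest absolute difference, over all groups, between the group share
    among the top-k ranked sites and the share in the whole candidate pool.
    A gap of 0.0 means the top-k mirrors the pool's composition exactly."""
    groups = sorted({g for s in ranked_sites for g in s})
    pool = group_share(ranked_sites, groups)
    top = group_share(ranked_sites[:k], groups)
    return max(abs(top[g] - pool[g]) for g in groups)

# Hypothetical ranking of four sites with fractional membership in groups A and B.
sites = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.8, "B": 0.2},
    {"A": 0.2, "B": 0.8},
    {"A": 0.1, "B": 0.9},
]
gap = parity_gap(sites, k=2)
```

Here the top two sites are 85% group A against a pool that is 50% group A, so the gap is about 0.35; interleaving the sites (e.g., ranking the first and third sites at the top) shrinks it to about 0.05.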
Predicting Clinical Trial Results by Implicit Evidence Integration
We introduce a novel Clinical Trial Result Prediction (CTRP) task. In the CTRP framework, a model takes a PICO-formatted clinical trial proposal with its background as input and predicts the result. We exploit large-scale unstructured sentences from medical literature that implicitly contain PICOs and results as evidence.
arXiv (2020-10-12T12:25:41Z)
