Improving Biomedical Information Retrieval with Neural Retrievers
- URL: http://arxiv.org/abs/2201.07745v1
- Date: Wed, 19 Jan 2022 17:36:54 GMT
- Title: Improving Biomedical Information Retrieval with Neural Retrievers
- Authors: Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral
- Abstract summary: We propose a template-based question generation method that can be leveraged to train neural retriever models.
Second, we develop two novel pre-training tasks that are closely aligned to the downstream task of information retrieval.
Third, we introduce the Poly-DPR'' model which encodes each context into multiple context vectors.
- Score: 30.778569849542837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information retrieval (IR) is essential in search engines and dialogue
systems as well as natural language processing tasks such as open-domain
question answering. IR serve an important function in the biomedical domain,
where content and sources of scientific knowledge may evolve rapidly. Although
neural retrievers have surpassed traditional IR approaches such as TF-IDF and
BM25 in standard open-domain question answering tasks, they are still found
lacking in the biomedical domain. In this paper, we seek to improve information
retrieval (IR) using neural retrievers (NR) in the biomedical domain, and
achieve this goal using a three-pronged approach. First, to tackle the relative
lack of data in the biomedical domain, we propose a template-based question
generation method that can be leveraged to train neural retriever models.
Second, we develop two novel pre-training tasks that are closely aligned to the
downstream task of information retrieval. Third, we introduce the ``Poly-DPR''
model which encodes each context into multiple context vectors. Extensive
experiments and analysis on the BioASQ challenge suggest that our proposed
method leads to large gains over existing neural approaches and beats BM25 in
the small-corpus setting. We show that BM25 and our method can complement each
other, and a simple hybrid model leads to further gains in the large corpus
setting.
Related papers
- BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine [19.861178160437827]
Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains.
textscBiomedRAG attains superior performance across 5 biomedical NLP tasks.
textscBiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.
arXiv Detail & Related papers (2024-05-01T12:01:39Z) - Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z) - Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundations models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
Review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for
Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - Source-Free Collaborative Domain Adaptation via Multi-Perspective
Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state MRI functional (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
But acquiring source data is challenging due to concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - CorpusBrain: Pre-train a Generative Retrieval Model for
Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner.
We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Neural Retriever and Go Beyond: A Thesis Proposal [1.082365064737981]
Information Retriever (IR) aims to find the relevant documents to a given query at large scale.
Recent neural-based algorithms (termed as neural retrievers) have gained more attention which can mitigate the limitations of traditional methods.
arXiv Detail & Related papers (2022-05-31T17:59:30Z) - Recognising Biomedical Names: Challenges and Solutions [9.51284672475743]
We propose a transition-based NER model which can recognise discontinuous mentions.
We also develop a cost-effective approach that nominates the suitable pre-training data.
Our contributions have obvious practical implications, especially when new biomedical applications are needed.
arXiv Detail & Related papers (2021-06-23T08:20:13Z) - Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build the domain irrelevant latent space image representation and demonstrate this method to outperform existing approaches on ABIDE data.
arXiv Detail & Related papers (2020-10-14T16:50:50Z) - Multi-Perspective Semantic Information Retrieval in the Biomedical
Domain [0.0]
Information Retrieval (IR) is the task of obtaining pieces of data (such as documents) that are relevant to a particular query or need.
Modern neural approaches pose certain advantages compared to their classical counterparts.
This work presents contributions to several aspects of the Biomedical Semantic Information Retrieval domain.
arXiv Detail & Related papers (2020-07-17T21:05:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.