Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings
- URL: http://arxiv.org/abs/2401.15713v2
- Date: Thu, 30 May 2024 19:55:58 GMT
- Title: Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings
- Authors: Logan Hallee, Rohan Kapur, Arjun Patel, Jason P. Gleghorn, Bohdan Khomtchouk
- Abstract summary: This paper improves upon the vector embeddings of scientific literature by assembling niche datasets using co-citations as a similarity metric.
We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts.
Our MoE variants perform well over $N$ scientific domains with $N$ dedicated experts, whereas standard BERT models excel in only one domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, but they struggle with highly discriminative tasks and produce sub-optimal representations of important documents like scientific literature. With the increased reliance on retrieval augmentation and search, representing diverse documents as concise and descriptive vectors is crucial. This paper improves upon the vector embeddings of scientific literature by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We apply a novel Mixture of Experts (MoE) extension pipeline to pretrained BERT models, where every multi-layer perceptron section is enlarged and copied into multiple distinct experts. Our MoE variants perform well over $N$ scientific domains with $N$ dedicated experts, whereas standard BERT models excel in only one domain. Notably, extending just a single transformer block to MoE captures 85% of the benefit seen from full MoE extension at every layer. This holds promise for versatile and efficient One-Size-Fits-All transformer networks for numerically representing diverse inputs. Our methodology marks significant advancements in representing scientific text and holds promise for enhancing vector database search and compilation.
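As a rough illustration of the MoE extension described in the abstract, the sketch below copies one pretrained feed-forward (MLP) block into several independent experts. It is a toy in pure Python, not the authors' implementation: the names `MLP`, `MoEBlock`, and `domain_id` are hypothetical, the enlargement of the MLP is omitted, and routing each input to an expert by a known domain index is an assumption, since the abstract does not describe the routing rule.

```python
import copy
import math


def gelu(x):
    # Tanh approximation of the GELU activation used in BERT-style MLPs.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class MLP:
    """Toy feed-forward block: hidden -> intermediate -> hidden (biases omitted)."""

    def __init__(self, w_in, w_out):
        self.w_in = w_in    # rows: intermediate units, columns: hidden units
        self.w_out = w_out  # rows: hidden units, columns: intermediate units

    def __call__(self, x):
        h = [gelu(sum(w * xi for w, xi in zip(row, x))) for row in self.w_in]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w_out]


class MoEBlock:
    """Replaces one MLP with num_experts copies of it (hypothetical helper)."""

    def __init__(self, pretrained_mlp, num_experts):
        # Each expert starts from the pretrained weights but owns its own copy,
        # so later fine-tuning can specialize the experts independently.
        self.experts = [copy.deepcopy(pretrained_mlp) for _ in range(num_experts)]

    def __call__(self, x, domain_id):
        # Assumption: the domain of the input is known and selects the expert.
        return self.experts[domain_id](x)


# Stand-in for a pretrained MLP with hidden size 2 and intermediate size 3.
dense = MLP(
    w_in=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    w_out=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
moe = MoEBlock(dense, num_experts=3)
```

Right after the copy, every expert reproduces the dense block's output, so the extended model starts from the pretrained behavior; the experts only diverge once their weights are updated separately.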
Related papers
- Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges [4.588192657854766]
This survey focuses on approaches that apply pretrained transformer encoders like BERT to information retrieval (IR).
We group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion.
We find that for specific tasks, finely tuned BERT encoders still outperform, and at a lower deployment cost.
arXiv Detail & Related papers (2024-02-18T23:22:40Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel in handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- Hierarchical Transformer Model for Scientific Named Entity Recognition [0.20646127669654832]
We present a simple and effective approach for Named Entity Recognition.
The main idea of our approach is to encode the input subword sequence with a pre-trained transformer such as BERT.
We evaluate our approach on three benchmark datasets for scientific NER.
arXiv Detail & Related papers (2022-03-28T12:59:06Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to test across unseen domains without access to the target domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- Transferring BERT-like Transformers' Knowledge for Authorship Verification [8.443350618722562]
We study the effectiveness of several BERT-like transformers for the task of authorship verification.
We provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors.
We show that those splits can enhance the models' capability to transfer knowledge over a new, significantly different dataset.
arXiv Detail & Related papers (2021-12-09T18:57:29Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Transformer Based Multi-Source Domain Adaptation [53.24606510691877]
In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than the data it was trained on.
Here, we investigate the problem of unsupervised multi-source domain adaptation, where a model is trained on labelled data from multiple source domains and must make predictions on a domain for which no labelled data has been seen.
We show that the predictions of large pretrained transformer based domain experts are highly homogeneous, making it challenging to learn effective functions for mixing their predictions.
arXiv Detail & Related papers (2020-09-16T16:56:23Z)
- MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers [1.7403133838762446]
We consider the training of a slot tagger using multiple data sets covering different slot types as a multi-task learning problem.
The experimental results on the biomedical domain have shown that the proposed approach outperforms the previous state-of-the-art systems for slot tagging.
arXiv Detail & Related papers (2020-01-24T07:16:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.