BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker
based on Residual Convolutional Neural Networks
- URL: http://arxiv.org/abs/2109.02237v1
- Date: Mon, 6 Sep 2021 04:25:47 GMT
- Title: BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker
based on Residual Convolutional Neural Networks
- Authors: Tuan Lai, Heng Ji, and ChengXiang Zhai
- Abstract summary: We propose an efficient convolutional neural network with residual connections for biomedical entity linking.
Our model achieves comparable or even better linking accuracy than the state-of-the-art BERT-based models.
- Score: 41.528797439272175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biomedical entity linking is the task of linking entity mentions in a
biomedical document to referent entities in a knowledge base. Recently, many
BERT-based models have been introduced for the task. While these models have
achieved competitive results on many datasets, they are computationally
expensive and contain about 110M parameters. Little is known about the factors
contributing to their impressive performance and whether the
over-parameterization is needed. In this work, we shed light on the inner
workings of these large BERT-based models. Through a set of probing
experiments, we find that entity linking performance changes only
slightly when the input word order is shuffled or when the attention scope is
limited to a fixed window size. From these observations, we propose an
efficient convolutional neural network with residual connections for biomedical
entity linking. Because of the sparse connectivity and weight sharing
properties, our model has a small number of parameters and is highly efficient.
On five public datasets, our model achieves comparable or even better linking
accuracy than the state-of-the-art BERT-based models while having about 60
times fewer parameters.
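The abstract gives no implementation details, but the core idea can be sketched in a few lines: replace the large contextual encoder with a small stack of residual 1D convolutions that embeds a mention and each candidate entity name, then score candidates by dot product. Everything below (module names, embedding size, kernel width, number of blocks, max-pooling) is an illustrative assumption, not the authors' exact architecture.

```python
# Minimal sketch (PyTorch) of a residual-CNN encoder for entity linking.
# All hyperparameters and design details here are assumptions for illustration,
# not the configuration reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualConvBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2)      # length-preserving
        self.norm = nn.GroupNorm(1, channels)

    def forward(self, x):                                    # x: (batch, channels, seq_len)
        return F.relu(x + self.norm(self.conv(x)))           # residual connection

class CNNEncoder(nn.Module):
    """Embeds a token-id sequence into a single vector."""
    def __init__(self, vocab_size: int, dim: int = 128, num_blocks: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.blocks = nn.Sequential(*[ResidualConvBlock(dim) for _ in range(num_blocks)])

    def forward(self, token_ids):                            # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)            # -> (batch, dim, seq_len)
        return self.blocks(x).max(dim=2).values              # max-pool over positions

# Score one mention against a handful of candidate entity names by dot product.
encoder = CNNEncoder(vocab_size=30000)
mention = torch.randint(1, 30000, (1, 16))                   # mention (with context) token ids
candidates = torch.randint(1, 30000, (5, 16))                # candidate entity name token ids
scores = encoder(mention) @ encoder(candidates).T            # (1, 5)
print(scores.argmax(dim=1))                                  # index of the highest-scoring candidate
```

The sparse connectivity and weight sharing of the convolutions are what keep the parameter count far below that of a 110M-parameter BERT encoder.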
Related papers
- BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs [3.780924717521521] (arXiv, 2023-06-06)
We propose a modular framework for learning embeddings in knowledge graphs.
It allows encoding attribute data of different modalities while also supporting entities with missing attributes.
We train models using a biomedical KG containing approximately 2 million triples.
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805] (arXiv, 2022-12-08)
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
- ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking [5.382800665115746] (arXiv, 2022-07-08)
ReFinED is an efficient end-to-end entity linking model.
It performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass.
It surpasses state-of-the-art performance on standard entity linking datasets by an average of 3.7 F1.
- Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319] (arXiv, 2021-12-30)
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
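A minimal sketch of what "subgroup-wise" mixed precision can look like: each subgroup of a weight matrix is fake-quantized at its own bit-width. The row grouping and the bit-width assignment below are stand-ins chosen for the example, not the automatic search procedure described in the paper.

```python
# Illustrative sketch: per-subgroup fake quantization of one weight matrix.
# The row grouping and the bit-width assignment are assumptions made for the
# example, not the search framework described in the paper.
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to `bits` bits, followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(w).max(), 1e-12) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
weight = rng.normal(size=(768, 768))                    # one BERT-sized weight matrix

subgroups = np.array_split(weight, 4, axis=0)           # four row subgroups
bit_widths = [8, 4, 4, 2]                               # hypothetical per-subgroup precisions
quantized = np.vstack([fake_quantize(g, b) for g, b in zip(subgroups, bit_widths)])

print("mean absolute error:", np.abs(weight - quantized).mean())
```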
- Entity Linking and Discovery via Arborescence-based Supervised Clustering [35.93568319872986] (arXiv, 2021-09-02)
We present novel training and inference procedures that fully utilize mention-to-mention affinities.
We show that this method gracefully extends to entity discovery.
We evaluate our approach on the Zero-Shot Entity Linking dataset and MedMentions, the largest publicly available biomedical dataset.
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866] (arXiv, 2021-03-08)
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
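The dual-encoder pattern behind that speedup can be sketched independently of the specific model: entity names are embedded once offline, and every mention in a document is embedded in a single batch and matched with one matrix multiplication. The bag-of-words "encoder" below is only a toy stand-in for the BERT encoders used in the paper.

```python
# Dual-encoder sketch: entities are embedded once offline; all mentions in a
# document are resolved together with a single similarity lookup. The toy
# bag-of-words encoder is a placeholder for a trained mention/entity encoder.
import numpy as np

DIM = 64

def toy_encode(texts):
    """Deterministic toy text encoder (stand-in for a learned encoder)."""
    vecs = np.zeros((len(texts), DIM))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            vecs[i, hash(token) % DIM] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

# Offline: embed the knowledge-base entity names once and keep the index.
kb_entities = ["myocardial infarction", "heart failure", "migraine"]
entity_index = toy_encode(kb_entities)

# Online: embed every mention in the document in one batch, then one matmul.
mentions = ["myocardial infarction", "cardiac failure", "severe migraine attacks"]
scores = toy_encode(mentions) @ entity_index.T            # (num_mentions, num_entities)
for mention, best in zip(mentions, scores.argmax(axis=1)):
    print(f"{mention!r} -> {kb_entities[best]!r}")
```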
- A Lightweight Neural Model for Biomedical Entity Linking [1.8047694351309205] (arXiv, 2020-12-16)
We propose a lightweight neural method for biomedical entity linking.
Our method uses a simple alignment layer with attention mechanisms to capture the variations between mention and entity names.
Our model is competitive with previous work on standard evaluation benchmarks.
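Read literally, an "alignment layer with attention" can be pictured as a soft alignment between mention tokens and entity-name tokens, in the spirit of decomposable attention; the sketch below is a generic rendering under that assumption, not the authors' exact layer.

```python
# Sketch of an attention-based alignment layer between mention tokens and
# entity-name tokens (decomposable-attention style); the dimensions and the
# final scorer are illustrative assumptions.
import torch
import torch.nn as nn

class AlignmentScorer(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, mention: torch.Tensor, entity: torch.Tensor) -> torch.Tensor:
        # mention: (m_len, dim) token embeddings; entity: (e_len, dim)
        attn = torch.softmax(mention @ entity.T, dim=-1)   # (m_len, e_len) soft alignment
        aligned = attn @ entity                            # entity tokens aligned to each mention token
        features = torch.cat([mention, aligned], dim=-1)   # compare each token with its alignment
        return self.scorer(features).mean()                # pooled mention-entity matching score

scorer = AlignmentScorer()
mention_tokens = torch.randn(4, 128)     # e.g. embeddings of a mention's word pieces
entity_tokens = torch.randn(6, 128)      # e.g. embeddings of a candidate entity name
print(scorer(mention_tokens, entity_tokens))   # scalar compatibility score
```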
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679] (arXiv, 2020-10-02)
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets, respectively.
- ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082] (arXiv, 2020-08-06)
BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost.
We propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies.
These convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
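A much-reduced picture of a span-based dynamic convolution head: each position's convolution kernel is generated from a local span of the input and then applied as a softmax-weighted sum over a small window. The module below is a simplified sketch for intuition, with assumed dimensions and wiring, not the exact ConvBERT operator.

```python
# Simplified sketch of a span-based dynamic convolution head: per-position
# kernels are predicted from a local span summary and applied over a small
# window. This is a reduced illustration, not the exact ConvBERT operator.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanDynamicConv(nn.Module):
    def __init__(self, dim: int = 128, kernel_size: int = 5):
        super().__init__()
        self.kernel_size = kernel_size
        # Summarize a local span around each position, then predict its kernel.
        self.span_conv = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.to_kernel = nn.Linear(dim, kernel_size)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):                                      # x: (batch, seq_len, dim)
        b, n, d = x.shape
        k = self.kernel_size
        span = self.span_conv(x.transpose(1, 2)).transpose(1, 2)   # (b, n, d) span summary
        kernels = F.softmax(self.to_kernel(span), dim=-1)          # (b, n, k) one kernel per position
        v = self.value(x).transpose(1, 2)                          # (b, d, n)
        # Gather each position's window of neighbours and apply its kernel.
        windows = F.unfold(v.unsqueeze(3), kernel_size=(k, 1), padding=(k // 2, 0))
        windows = windows.view(b, d, k, n).permute(0, 3, 1, 2)     # (b, n, d, k)
        return torch.einsum("bndk,bnk->bnd", windows, kernels)     # (b, n, d)

layer = SpanDynamicConv()
tokens = torch.randn(2, 10, 128)
print(layer(tokens).shape)      # torch.Size([2, 10, 128])
```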
This list is automatically generated from the titles and abstracts of the papers in this site.