BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph
Transformer
- URL: http://arxiv.org/abs/2101.04158v1
- Date: Mon, 11 Jan 2021 19:34:55 GMT
- Title: BERT-GT: Cross-sentence n-ary relation extraction with BERT and Graph
Transformer
- Authors: Po-Ting Lai and Zhiyong Lu
- Abstract summary: We propose a novel architecture that combines Bidirectional Encoder Representations from Transformers with a Graph Transformer (BERT-GT).
Unlike the original Transformer architecture, which utilizes the whole sentence(s) to calculate the attention of the current token, the neighbor-attention mechanism in our method calculates its attention utilizing only its neighbor tokens.
Our results show improvements of 5.44% and 3.89% in accuracy and F1-measure over the state-of-the-art on the n-ary and chemical-protein relation datasets.
- Score: 7.262905275276971
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A biomedical relation statement is commonly expressed in multiple sentences
and consists of many concepts, including gene, disease, chemical, and mutation.
To automatically extract information from biomedical literature, existing
biomedical text-mining approaches typically formulate the problem as a
cross-sentence n-ary relation-extraction task that detects relations among n
entities across multiple sentences, and use either a graph neural network (GNN)
with long short-term memory (LSTM) or an attention mechanism. Recently,
Transformer has been shown to outperform LSTM on many natural language
processing (NLP) tasks. In this work, we propose a novel architecture that
combines Bidirectional Encoder Representations from Transformers with Graph
Transformer (BERT-GT), through integrating a neighbor-attention mechanism into
the BERT architecture. Unlike the original Transformer architecture, which
utilizes the whole sentence(s) to calculate the attention of the current token,
the neighbor-attention mechanism in our method calculates its attention
utilizing only its neighbor tokens. Thus, each token can pay attention to its
neighbor information with little noise. We show that this is critically
important when the text is very long, as in cross-sentence or abstract-level
relation-extraction tasks. Our benchmarking results show improvements of 5.44%
and 3.89% in accuracy and F1-measure over the state-of-the-art on n-ary and
chemical-protein relation datasets, suggesting BERT-GT is a robust approach
that is applicable to other biomedical relation extraction tasks or datasets.
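The neighbor-attention mechanism described above restricts each token to attending over its neighbor tokens rather than the whole input. Below is a minimal, illustrative sketch of that idea as scaled dot-product attention masked by an adjacency matrix; the toy chain adjacency, dimensions, and random weights are assumptions for illustration, not the authors' released implementation (in BERT-GT the neighbors would come from the input graph).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_attention(X, adjacency, W_q, W_k, W_v):
    """Scaled dot-product attention in which token i attends only to tokens j
    with adjacency[i, j] > 0, instead of the whole sequence."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_tokens, n_tokens)
    scores = np.where(adjacency > 0, scores, -1e9)  # mask out non-neighbor pairs
    return softmax(scores, axis=-1) @ V

# Toy example: 5 tokens, hidden size 8; each token is linked to itself and its
# immediate neighbors, standing in for edges supplied by the input graph.
rng = np.random.default_rng(0)
n_tokens, d_model = 5, 8
X = rng.normal(size=(n_tokens, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
adjacency = np.eye(n_tokens) + np.eye(n_tokens, k=1) + np.eye(n_tokens, k=-1)
print(neighbor_attention(X, adjacency, W_q, W_k, W_v).shape)  # (5, 8)
```

Because non-neighbor positions receive a large negative score before the softmax, their weights are effectively zero, which is how attention noise from distant tokens is suppressed in long, cross-sentence inputs.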
Related papers
- Benchmark on Drug Target Interaction Modeling from a Structure Perspective [48.60648369785105]
Drug-target interaction prediction is crucial to drug discovery and design.
Recent methods, such as those based on graph neural networks (GNNs) and Transformers, demonstrate exceptional performance across various datasets.
We conduct a comprehensive survey and benchmark of drug-target interaction modeling from a structure perspective, integrating dozens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms.
arXiv Detail & Related papers (2024-07-04T16:56:59Z)
- Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition [48.84506301960988]
Cued Speech (CS) is a purely visual coding method used by hearing-impaired people.
Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text.
arXiv Detail & Related papers (2024-01-31T05:20:29Z)
- Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
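A minimal sketch of this lagged cross-covariance idea appears after this related-papers list.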
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
- Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction [5.445390550440809]
Survival prediction is a complicated ordinal regression task that aims to predict the ranking risk of death.
Due to the large size of pathological images, it is difficult to represent gigapixel whole slide images (WSIs) effectively.
Interactions within the tumor microenvironment (TME) in histology are essential for survival analysis.
arXiv Detail & Related papers (2023-06-14T08:01:24Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Simple and Efficient Heterogeneous Graph Neural Network [55.56564522532328]
Heterogeneous graph neural networks (HGNNs) have a powerful capability to embed the rich structural and semantic information of a heterogeneous graph into node representations.
Existing HGNNs inherit many mechanisms from graph neural networks (GNNs) over homogeneous graphs, especially the attention mechanism and the multi-layer structure.
This paper conducts an in-depth and detailed study of these mechanisms and proposes the Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN).
arXiv Detail & Related papers (2022-07-06T10:01:46Z)
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
A key problem in protein sequence representation learning is capturing the co-evolutionary information reflected by inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., the Pairwise Masked Language Model (PMLM).
Our results show that the proposed method can effectively capture the inter-residue correlations and improves contact prediction performance by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
- Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction [2.5655440962401617]
We describe a novel Graph2SMILES model that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders.
As an end-to-end architecture, Graph2SMILES can be used as a drop-in replacement for the Transformer in any task involving molecule(s)-to-molecule(s) transformations.
arXiv Detail & Related papers (2021-10-19T01:23:15Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- A hybrid deep-learning approach for complex biochemical named entity recognition [9.657827522380712]
Named entity recognition (NER) of chemicals and drugs is a critical domain of information extraction in biochemical research.
Here, we propose a hybrid deep learning approach to improve the recognition accuracy of NER.
arXiv Detail & Related papers (2020-12-20T01:30:07Z)
- Hybrid Attention-Based Transformer Block Model for Distant Supervision Relation Extraction [20.644215991166902]
We propose a new framework that uses a hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task.
The proposed approach can outperform the state-of-the-art algorithms on the evaluation dataset.
arXiv Detail & Related papers (2020-03-10T13:05:52Z)
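As referenced above, the correlated-attention entry describes computing feature-wise cross-covariance between queries and lag-shifted keys. The sketch below is an illustrative reading of that description in plain NumPy; the lag set, scaling, and simple averaging over lags are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def correlated_attention(Q, K, V, lags=(0, 1, 2)):
    """For each lag, shift K along the time axis, form a (d x d) feature-wise
    cross-covariance score against Q, convert it to attention weights over key
    channels, and mix the value channels; outputs are averaged over lags."""
    T, d = Q.shape
    outputs = []
    for lag in lags:
        K_lag = np.roll(K, shift=lag, axis=0)         # lag-shifted keys
        scores = (Q.T @ K_lag) / T                    # (d, d) cross-covariance
        attn = softmax(scores / np.sqrt(d), axis=-1)  # weights over key channels
        outputs.append(V @ attn.T)                    # (T, d) channel mixture
    return np.mean(outputs, axis=0)

# Toy multivariate series: 64 time steps, 4 feature channels.
rng = np.random.default_rng(0)
series = rng.normal(size=(64, 4))
print(correlated_attention(series, series, series).shape)  # (64, 4)
```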