Roof-BERT: Divide Understanding Labour and Join in Work
- URL: http://arxiv.org/abs/2112.06736v1
- Date: Mon, 13 Dec 2021 15:40:54 GMT
- Title: Roof-BERT: Divide Understanding Labour and Join in Work
- Authors: Wei-Lin Liao, Wei-Yun Ma
- Abstract summary: Roof-BERT is a model with two underlying BERTs and a fusion layer on them.
One of the underlying BERTs encodes the knowledge resources and the other one encodes the original input sentences.
Experimental results on a QA task demonstrate the effectiveness of the proposed model.
- Score: 7.523253052992842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work on enhancing BERT-based language representation models with
knowledge graphs (KGs) has promising results on multiple NLP tasks.
State-of-the-art approaches typically integrate the original input sentences
with triples in KGs, and feed the combined representation into a BERT model.
However, as the sequence length of a BERT model is limited, the framework cannot
accommodate much knowledge beyond the original input sentences and is thus
forced to discard some of it. The problem is especially severe for
downstream tasks whose input is a long paragraph or even a document, such as QA
or reading comprehension tasks. To address the problem, we propose Roof-BERT, a
model with two underlying BERTs and a fusion layer on them. One of the
underlying BERTs encodes the knowledge resources and the other one encodes the
original input sentences, and the fusion layer, like a roof, integrates both
BERTs' encodings. Experimental results on a QA task demonstrate the effectiveness
of the proposed model.
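The abstract does not spell out the fusion design, but a minimal PyTorch sketch of the two-BERT-plus-roof idea might look as follows, assuming HuggingFace `transformers` and a single Transformer encoder layer standing in for the fusion layer (the paper's actual fusion architecture may differ):

```python
# Minimal sketch of the Roof-BERT idea: two BERT encoders joined by a fusion
# ("roof") layer. The fusion used here (one Transformer encoder layer over the
# concatenated token sequences) is an assumption, not the paper's exact design.
import torch
import torch.nn as nn
from transformers import BertModel

class RoofBERT(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_labels=2):
        super().__init__()
        self.sentence_bert = BertModel.from_pretrained(model_name)   # encodes the input text
        self.knowledge_bert = BertModel.from_pretrained(model_name)  # encodes the knowledge resources
        hidden = self.sentence_bert.config.hidden_size
        roof = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(roof, num_layers=1)      # the "roof" over both encoders
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, sent_inputs, know_inputs):
        # Each argument is a dict with input_ids / attention_mask tensors.
        sent_enc = self.sentence_bert(**sent_inputs).last_hidden_state    # (B, T1, H)
        know_enc = self.knowledge_bert(**know_inputs).last_hidden_state   # (B, T2, H)
        fused = self.fusion(torch.cat([sent_enc, know_enc], dim=1))       # joint sequence (B, T1+T2, H)
        return self.classifier(fused[:, 0])                               # predict from the fused [CLS]
```

Keeping the knowledge text in its own encoder means it no longer competes with the input sentences for a single BERT's sequence-length budget, which is the bottleneck the abstract describes.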
Related papers
- Make BERT-based Chinese Spelling Check Model Enhanced by Layerwise Attention and Gaussian Mixture Model [33.446533426654995]
We design a heterogeneous knowledge-infused framework to strengthen BERT-based CSC models.
We propose a novel form of n-gram-based layerwise self-attention to generate a multilayer representation.
Experimental results show that our proposed framework yields a stable performance boost over four strong baseline models.
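As a rough illustration of layerwise attention (ignoring the n-gram and Gaussian-mixture components the paper adds), a learned softmax weighting over BERT's hidden layers yields a multilayer representation:

```python
# Illustrative layerwise attention: a learned softmax weighting over all of
# BERT's hidden layers. The paper's n-gram-based variant is more elaborate;
# this only shows the generic multilayer-representation pattern.
import torch
import torch.nn as nn
from transformers import BertModel

class LayerwiseBert(nn.Module):
    def __init__(self, model_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, output_hidden_states=True)
        num_states = self.bert.config.num_hidden_layers + 1   # embedding output + all encoder layers
        self.layer_weights = nn.Parameter(torch.zeros(num_states))

    def forward(self, **inputs):
        hidden_states = self.bert(**inputs).hidden_states        # tuple of (B, T, H), one per layer
        stacked = torch.stack(hidden_states, dim=0)               # (L, B, T, H)
        weights = torch.softmax(self.layer_weights, dim=0)        # learned attention over layers
        return (weights[:, None, None, None] * stacked).sum(0)    # (B, T, H) multilayer representation
```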
arXiv Detail & Related papers (2023-12-27T16:11:07Z)
- Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi [0.0]
We introduce Tri-Distil-BERT, a multilingual model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, a model fine-tuned on code-mixed data.
Our two-tiered pre-training approach offers efficient alternatives for multilingual and code-mixed language understanding.
arXiv Detail & Related papers (2023-09-19T02:59:41Z)
- Deep Bidirectional Language-Knowledge Graph Pretraining [159.9645181522436]
DRAGON is a self-supervised approach to pretraining a deeply joint language-knowledge foundation model from text and KG at scale.
Our model takes pairs of text segments and relevant KG subgraphs as input and bidirectionally fuses information from both modalities.
arXiv Detail & Related papers (2022-10-17T18:02:52Z)
- Incorporating BERT into Parallel Sequence Decoding with Adapters [82.65608966202396]
We propose to take two different BERT models as the encoder and decoder respectively, and fine-tune them by introducing simple and lightweight adapter modules.
We obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models.
Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, which suits the bidirectional and conditionally independent nature of BERT.
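A lightweight adapter of the kind referred to here is commonly a bottleneck MLP with a residual connection inserted into each Transformer layer and trained while the BERT weights stay frozen; the sketch below shows that generic pattern, not the paper's exact configuration:

```python
# Generic bottleneck adapter: a small down-project / up-project MLP with a
# residual connection. Inserting such modules into BERT layers and training
# only them is the lightweight fine-tuning idea; the sizes and placement used
# in the paper may differ.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual around the bottleneck
```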
arXiv Detail & Related papers (2020-10-13T03:25:15Z)
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
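Conceptually, early exiting attaches a small classifier ("off-ramp") to every Transformer layer and stops as soon as the prediction is confident enough; the following is a schematic sketch of that loop with an entropy threshold, not the released DeeBERT implementation:

```python
# Schematic early exiting: an off-ramp classifier after every Transformer
# layer; inference stops at the first layer whose prediction entropy falls
# below a threshold. A sketch of the idea only, not the DeeBERT code.
import torch
import torch.nn as nn

def entropy(logits):
    # Shannon entropy of the predictive distribution; low entropy = confident.
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

class EarlyExitEncoder(nn.Module):
    def __init__(self, num_layers=12, hidden_size=768, num_labels=2, threshold=0.1):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(hidden_size, nhead=12, batch_first=True)
             for _ in range(num_layers)]
        )
        self.ramps = nn.ModuleList([nn.Linear(hidden_size, num_labels) for _ in range(num_layers)])
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, hidden):            # hidden: (1, T, H) embeddings of one example
        for depth, (layer, ramp) in enumerate(zip(self.layers, self.ramps), start=1):
            hidden = layer(hidden)
            logits = ramp(hidden[:, 0])   # classify from the [CLS] position
            if entropy(logits).item() < self.threshold:
                return logits, depth      # confident enough: exit early
        return logits, depth              # otherwise use the final layer
```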
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
- DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [90.85913515409275]
Recent studies on open-domain question answering have achieved notable performance improvements using pre-trained language models such as BERT.
We propose DC-BERT, a contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings.
On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance.
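The speedup comes from encoding every document once offline and caching the result, so that only the short question is encoded at query time; a simplified sketch of that caching pattern follows, with dot-product scoring standing in for DC-BERT's full interaction component:

```python
# Simplified decoupled encoding: documents are encoded once and cached offline;
# at query time only the question passes through BERT. Dot-product scoring over
# [CLS] vectors is an illustrative stand-in for DC-BERT's interaction layers.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
offline_bert = BertModel.from_pretrained("bert-base-uncased").eval()
online_bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode(model, texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]     # [CLS] vectors, shape (N, H)

documents = [
    "BERT is a Transformer encoder pre-trained on large corpora.",
    "DC-BERT decouples question encoding from document encoding.",
]
doc_cache = encode(offline_bert, documents)           # pre-computed once, offline

def retrieve(question, top_k=1):
    q_vec = encode(online_bert, [question])           # online: one BERT pass for the question
    scores = q_vec @ doc_cache.T                      # (1, N) similarity scores
    return scores.topk(top_k, dim=-1).indices[0].tolist()

print(retrieve("What does DC-BERT decouple?"))
```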
arXiv Detail & Related papers (2020-02-28T08:18:37Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
- SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models [43.18970770343777]
The contextualized word representations produced by BERT achieve state-of-the-art performance on a wide range of NLP tasks.
Yet, it remains an open problem to generate a high-quality sentence representation from BERT-based word models.
We propose a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation.
arXiv Detail & Related papers (2020-02-16T19:02:52Z)
- TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval [11.923682816611716]
We present the TwinBERT model for effective and efficient retrieval.
It has twin-structured BERT-like encoders that represent the query and the document respectively.
It allows document embeddings to be pre-computed offline and cached in memory.
arXiv Detail & Related papers (2020-02-14T22:44:36Z)
- BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT [53.63288887672302]
Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks.
We find, surprisingly, that the output layer of BERT can reconstruct the input sentence when it directly takes each hidden layer of BERT as input.
We propose a quite simple method to boost the performance of BERT.
arXiv Detail & Related papers (2020-01-25T13:35:34Z)