Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
- URL: http://arxiv.org/abs/2105.07148v1
- Date: Sat, 15 May 2021 06:13:39 GMT
- Title: Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter
- Authors: Wei Liu, Xiyan Fu, Yue Zhang and Wenming Xiao
- Abstract summary: Existing methods solely fuse lexicon features via a shallow, randomly initialized sequence layer and do not integrate them into the bottom layers of BERT.
In this paper, we propose Lexicon Enhanced BERT (LEBERT) for Chinese sequence labelling.
Compared with existing methods, our model achieves deep lexicon knowledge fusion at the lower layers of BERT.
- Score: 15.336753753889035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexicon information and pre-trained models, such as BERT, have been combined
to explore Chinese sequence labelling tasks due to their respective strengths.
However, existing methods solely fuse lexicon features via a shallow, randomly
initialized sequence layer and do not integrate them into the bottom layers of
BERT. In this paper, we propose Lexicon Enhanced BERT (LEBERT) for Chinese
sequence labelling, which integrates external lexicon knowledge into BERT
layers directly via a Lexicon Adapter layer. Compared with existing methods,
our model facilitates deep lexicon knowledge fusion at the lower layers of
BERT. Experiments on ten Chinese datasets covering three tasks, namely Named Entity
Recognition, Word Segmentation, and Part-of-Speech tagging, show that LEBERT
achieves state-of-the-art results.
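As a rough illustration of the idea (a minimal sketch, not the paper's released implementation), the PyTorch module below shows one way a lexicon adapter could fuse character-level BERT hidden states with the embeddings of lexicon words matched at each character position; the class name, dimensions, and the bilinear char-to-word attention are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    """Hypothetical sketch: fuse character-level BERT hidden states with
    matched lexicon-word embeddings via a char-to-word attention, then add
    the result back to the character stream (residual + layer norm)."""

    def __init__(self, hidden_size: int = 768, word_embed_size: int = 200):
        super().__init__()
        self.word_proj = nn.Linear(word_embed_size, hidden_size)   # word -> BERT space
        self.attn_w = nn.Linear(hidden_size, hidden_size, bias=False)
        self.fuse = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
        )
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, char_hidden, word_embeds, word_mask):
        # char_hidden: (B, L, H)      hidden states from one BERT layer
        # word_embeds: (B, L, W, D)   embeddings of lexicon words matched at each char
        # word_mask:   (B, L, W)      1 for real matches, 0 for padding
        words = self.word_proj(word_embeds)                               # (B, L, W, H)
        scores = torch.einsum("blh,blwh->blw", self.attn_w(char_hidden), words)
        scores = scores.masked_fill(word_mask == 0, float("-inf"))
        attn = torch.nan_to_num(torch.softmax(scores, dim=-1))            # chars with no match -> 0
        word_ctx = torch.einsum("blw,blwh->blh", attn, words)             # weighted word vector
        return self.layer_norm(char_hidden + self.fuse(word_ctx))

# Usage sketch: the fused output would be passed on to the next BERT layer.
adapter = LexiconAdapter()
h = torch.randn(2, 16, 768)           # character-level hidden states
w = torch.randn(2, 16, 5, 200)        # up to 5 matched lexicon words per character
m = torch.randint(0, 2, (2, 16, 5))   # match mask
fused = adapter(h, w, m)              # (2, 16, 768)
```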
Related papers
- Make BERT-based Chinese Spelling Check Model Enhanced by Layerwise
Attention and Gaussian Mixture Model [33.446533426654995]
We design a heterogeneous knowledge-infused framework to strengthen BERT-based CSC models.
We propose a novel form of n-gram-based layerwise self-attention to generate a multilayer representation.
Experimental results show that our proposed framework yields a stable performance boost over four strong baseline models.
arXiv Detail & Related papers (2023-12-27T16:11:07Z) - BEST: BERT Pre-Training for Sign Language Recognition with Coupling
Tokenization [135.73436686653315]
We are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition (SLR) model.
Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone.
Pre-training is performed via reconstructing the masked triplet unit from the corrupted input sequence.
It adaptively extracts the discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state.
arXiv Detail & Related papers (2023-02-10T06:23:44Z) - Unsupervised Boundary-Aware Language Model Pretraining for Chinese
Sequence Labeling [25.58155857967128]
Boundary information is critical for various Chinese language processing tasks, such as word segmentation, part-of-speech tagging, and named entity recognition.
We propose an architecture to encode the boundary information directly into pre-trained language models, resulting in Boundary-Aware BERT (BABERT).
Experimental results on ten benchmarks of Chinese sequence labeling demonstrate that BABERT can provide consistent improvements on all datasets.
arXiv Detail & Related papers (2022-10-27T07:38:50Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - MarkBERT: Marking Word Boundaries Improves Chinese BERT [67.53732128091747]
MarkBERT keeps the vocabulary as Chinese characters and inserts boundary markers between contiguous words.
Compared to previous word-based BERT models, MarkBERT achieves better accuracy on text classification, keyword recognition, and semantic similarity tasks.
arXiv Detail & Related papers (2022-03-12T08:43:06Z) - DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling [49.3379730319246]
We propose DyLex, a plug-in lexicon incorporation approach for BERT based sequence labeling tasks.
We adopt word-agnostic tag embeddings to avoid re-training the representation while updating the lexicon.
Finally, we introduce a column-wise attention-based knowledge fusion mechanism to guarantee the pluggability of the proposed framework.
arXiv Detail & Related papers (2021-09-18T03:15:49Z) - Lex-BERT: Enhancing BERT based NER with lexicons [1.6884834576352221]
We present Lex-BERT, which incorporates lexicon information into Chinese BERT for named entity recognition tasks.
Our model does not introduce any new parameters and is more efficient than FLAT.
arXiv Detail & Related papers (2021-01-02T07:43:21Z) - It's not Greek to mBERT: Inducing Word-Level Translations from
Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z) - SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word
Models [43.18970770343777]
A contextualized word representation, called BERT, achieves state-of-the-art performance in quite a few NLP tasks.
Yet, it is an open problem to generate a high quality sentence representation from BERT-based word models.
We propose a new sentence embedding method by dissecting BERT-based word models through geometric analysis of the space spanned by the word representation.
arXiv Detail & Related papers (2020-02-16T19:02:52Z) - BERT's output layer recognizes all hidden layers? Some Intriguing
Phenomena and a simple way to boost BERT [53.63288887672302]
Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks.
We find, surprisingly, that the output layer of BERT can reconstruct the input sentence when directly given each hidden layer of BERT as input.
We propose a quite simple method to boost the performance of BERT.
arXiv Detail & Related papers (2020-01-25T13:35:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.