DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling
- URL: http://arxiv.org/abs/2109.08818v2
- Date: Wed, 22 Sep 2021 04:19:04 GMT
- Title: DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling
- Authors: Baojun Wang, Zhao Zhang, Kun Xu, Guang-Yuan Hao, Yuyang Zhang, Lifeng Shang, Linlin Li, Xiao Chen, Xin Jiang and Qun Liu
- Abstract summary: We propose DyLex, a plug-in lexicon incorporation approach for BERT based sequence labeling tasks.
We adopt word-agnostic tag embeddings to avoid re-training the representation while updating the lexicon.
Finally, we introduce a col-wise attention-based knowledge fusion mechanism to guarantee the pluggability of the proposed framework.
- Score: 49.3379730319246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incorporating lexical knowledge into deep learning models has proven highly effective for sequence labeling tasks. However, previous works commonly struggle with large-scale dynamic lexicons, which often cause excessive matching noise and require frequent updates. In this paper, we propose DyLex, a plug-in lexicon incorporation approach for BERT-based sequence labeling tasks. Instead of leveraging embeddings of the words in the lexicon, as conventional methods do, we adopt word-agnostic tag embeddings so that the representation need not be re-trained when the lexicon is updated. Moreover, we employ an effective supervised lexical knowledge denoising method to smooth out matching noise. Finally, we introduce a col-wise attention-based knowledge fusion mechanism to guarantee the pluggability of the proposed framework. Experiments on ten datasets across three tasks show that the proposed framework achieves new state-of-the-art results, even with very large-scale lexicons.
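To make the mechanism concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: word-agnostic tag embeddings and col-wise attention fusion. The module structure, dimensions, and multi-head attention form are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class TagFusion(nn.Module):
    """Hedged sketch of DyLex-style fusion: word-agnostic tag embeddings
    attended against BERT token states. All shapes are assumptions."""

    def __init__(self, hidden: int = 768, num_tags: int = 20):
        super().__init__()
        # One embedding per lexicon *tag* (e.g. PER, LOC), never per word,
        # so adding words to the lexicon needs no embedding re-training.
        self.tag_emb = nn.Embedding(num_tags, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.denoise = nn.Linear(hidden, 1)  # stand-in for the supervised denoiser

    def forward(self, token_states, match_tags):
        # token_states: (B, T, H) from BERT; match_tags: (B, M) tag ids of
        # the lexicon matches found for this sentence.
        tags = self.tag_emb(match_tags)                 # (B, M, H)
        keep = torch.sigmoid(self.denoise(tags))        # down-weight noisy matches
        tags = tags * keep
        # Attention over matches: each token position queries the match set.
        fused, _ = self.attn(token_states, tags, tags)  # (B, T, H)
        return token_states + fused                     # plug-in residual fusion
```

Because only tag (not word) embeddings are learned, swapping or growing the lexicon changes the matches fed into `match_tags` but leaves every trained parameter untouched, which is the pluggability the abstract claims.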
Related papers
- Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer [4.944761231728674]
We present a novel framework called "Lexicon-Syntax Enhanced Multilingual BERT".
We use Multilingual BERT as the base model and employ two techniques to enhance its learning capabilities.
Our experimental results demonstrate that this framework consistently outperforms all zero-shot cross-lingual transfer baselines.
arXiv Detail & Related papers (2024-04-25T14:10:52Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method outperforms multiple baselines by clear margins across a broad range of noise levels and scales well.
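As a rough illustration of how the two label estimates can be combined, the sketch below (not the authors' code; the dense adjacency, hyperparameters, and agreement rule are all assumptions) derives prototype pseudo-labels and propagated labels, keeping a corrected label only where the two agree.

```python
import torch
import torch.nn.functional as F

def denoised_labels(z, y_noisy, adj, num_classes, steps=10, alpha=0.9):
    """Illustrative sketch of combining prototype pseudo-labels with
    propagated labels. Assumes every class appears in y_noisy and that
    adj is a dense (N, N) non-negative adjacency matrix."""
    y = F.one_hot(y_noisy, num_classes).float()
    # Prototype pseudo-labels: nearest class mean in embedding space.
    protos = torch.stack([z[y_noisy == c].mean(0) for c in range(num_classes)])
    proto_lab = (F.normalize(z, dim=1) @ F.normalize(protos, dim=1).T).argmax(1)
    # Label propagation: smooth one-hot labels over the row-normalized graph.
    a = adj / adj.sum(1, keepdim=True).clamp(min=1)
    p = y.clone()
    for _ in range(steps):
        p = alpha * (a @ p) + (1 - alpha) * y
    prop_lab = p.argmax(1)
    keep = proto_lab == prop_lab           # error-resilient: trust agreement
    return torch.where(keep, prop_lab, y_noisy), keep
```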
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Enhancing Contrastive Learning with Noise-Guided Attack: Towards Continual Relation Extraction in the Wild [57.468184469589744]
We develop a noise-resistant contrastive framework named Noise-guided Attack in Contrastive Learning (NaCL).
Instead of directly discarding noisy instances or relying on noise relabeling that is inaccessible in practice, we propose modifying the feature space to match the given noisy labels via attacking.
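One plausible reading of that "attack", sketched below as a single FGSM-style targeted step that nudges inputs so their features fit the given (possibly wrong) label. The one-step form, the `eps` value, and the `encoder`/`classifier` callables are all assumptions, not the paper's actual procedure.

```python
import torch

def noise_guided_attack(encoder, classifier, x, y_noisy, eps=0.01):
    """Hedged sketch: perturb x so the feature space matches the noisy
    label, rather than discarding or relabeling it."""
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(classifier(encoder(x)), y_noisy)
    loss.backward()
    # Step *against* the gradient: make the given label fit better.
    return (x - eps * x.grad.sign()).detach()
```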
arXiv Detail & Related papers (2023-05-11T18:48:18Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Generative Prompt Tuning for Relation Classification [21.027631157115135]
We propose a novel generative prompt tuning method to reformulate relation classification as an infilling problem.
In addition, we design entity-guided decoding and discriminative relation scoring to generate and align relations effectively and efficiently during inference.
arXiv Detail & Related papers (2022-10-22T12:40:23Z)
- Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter [15.336753753889035]
Existing methods solely fuse lexicon features via a shallow and random sequence layer and do not integrate them into the bottom layers of BERT.
In this paper, we propose Lexicon Enhanced BERT (LEBERT) for Chinese sequence labelling.
Compared with existing methods, our model achieves deep lexicon knowledge fusion at the lower layers of BERT.
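A hedged sketch of what "deep fusion at the lower layers" can look like: a small adapter placed between early BERT layers that attends over the lexicon words matched at each position. Shapes, the attention form, and the tanh projection are simplified assumptions, not LEBERT's exact module.

```python
import torch
import torch.nn as nn

class LexiconAdapter(nn.Module):
    """Minimal sketch of injecting lexicon word features inside the
    BERT stack instead of concatenating them on top."""

    def __init__(self, hidden: int = 768, word_dim: int = 200):
        super().__init__()
        self.proj = nn.Linear(word_dim, hidden)   # align lexicon space to BERT
        self.score = nn.Linear(hidden, 1)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, h, word_embs):
        # h: (B, T, H) hidden states after an early BERT layer
        # word_embs: (B, T, K, word_dim) lexicon words matched per position
        w = torch.tanh(self.proj(word_embs))       # (B, T, K, H)
        a = torch.softmax(self.score(w), dim=2)    # attend over the K matches
        lex = (a * w).sum(2)                       # (B, T, H)
        return self.norm(h + lex)                  # fuse, then continue BERT
```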
arXiv Detail & Related papers (2021-05-15T06:13:39Z)
- Learn from Syntax: Improving Pair-wise Aspect and Opinion Terms Extraction with Rich Syntactic Knowledge [17.100366742363803]
We propose to enhance the pair-wise aspect and opinion terms extraction (PAOTE) task by incorporating rich syntactic knowledge.
We first build a syntax fusion encoder for encoding syntactic features, including a label-aware graph convolutional network (LAGCN) for modeling the dependency edges and labels.
During pairing, we adopt biaffine and triaffine scoring for high-order aspect-opinion term pairing, while re-harnessing the syntax-enriched representations from the LAGCN for syntax-aware scoring.
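The biaffine pairing step is compact enough to sketch: score every (aspect, opinion) representation pair with a bilinear term plus a linear term. The triaffine extension and the LAGCN features are omitted, and all dimensions are assumed.

```python
import torch
import torch.nn as nn

class BiaffinePairScorer(nn.Module):
    """Sketch of biaffine scoring over candidate aspect/opinion spans."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.U = nn.Parameter(torch.randn(dim, dim) * 0.01)  # bilinear weights
        self.w = nn.Linear(2 * dim, 1)                       # linear term

    def forward(self, aspects, opinions):
        # aspects: (B, Na, D), opinions: (B, No, D)
        bilinear = torch.einsum("bad,de,boe->bao", aspects, self.U, opinions)
        Na, No = aspects.size(1), opinions.size(1)
        pairs = torch.cat(
            [aspects.unsqueeze(2).expand(-1, -1, No, -1),
             opinions.unsqueeze(1).expand(-1, Na, -1, -1)], dim=-1)
        linear = self.w(pairs).squeeze(-1)     # (B, Na, No)
        return bilinear + linear               # logit per candidate pair
```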
arXiv Detail & Related papers (2021-05-06T08:45:40Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
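One plausible, simplified reading of the "learnable virtual answer words" idea, sketched below: initialize one trainable vector per relation from the mean embedding of that relation label's tokens, then optimize it jointly with the model. The initialization scheme shown is an assumption, not KnowPrompt's exact recipe.

```python
import torch
import torch.nn as nn

def init_virtual_words(label_token_embs):
    """Sketch: one learnable virtual answer word per relation, seeded
    from the embeddings of the tokens in that relation's label."""
    # label_token_embs: list of (n_tokens_i, H) tensors, one per relation
    init = torch.stack([e.mean(0) for e in label_token_embs])  # (R, H)
    return nn.Parameter(init.clone())  # trained jointly with the PLM
```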
arXiv Detail & Related papers (2021-04-15T17:57:43Z)
- Enhanced word embeddings using multi-semantic representation through lexical chains [1.8199326045904998]
We propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II.
These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks forming a single system.
Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems.
arXiv Detail & Related papers (2021-01-22T09:43:33Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)