AT-BERT: Adversarial Training BERT for Acronym Identification Winning
Solution for SDU@AAAI-21
- URL: http://arxiv.org/abs/2101.03700v2
- Date: Tue, 12 Jan 2021 08:38:45 GMT
- Title: AT-BERT: Adversarial Training BERT for Acronym Identification Winning
Solution for SDU@AAAI-21
- Authors: Danqing Zhu, Wangli Lin, Yang Zhang, Qiwei Zhong, Guanxiong Zeng,
Weilin Wu, Jiayu Tang
- Abstract summary: Acronym identification focuses on finding the acronyms and the phrases that have been abbreviated.
Recent breakthroughs in language models pre-trained on large corpora clearly show that unsupervised pre-training can vastly improve performance on downstream tasks.
We present an Adversarial Training BERT method named AT-BERT, our winning solution to the acronym identification task of the AAAI 2021 Scientific Document Understanding (SDU) Challenge.
- Score: 5.478126869836199
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Acronym identification focuses on finding the acronyms and the phrases
that have been abbreviated, which is crucial for scientific document understanding
tasks. However, the limited size of manually annotated datasets hinders further
improvement on the problem. Recent breakthroughs in language models pre-trained
on large corpora clearly show that unsupervised pre-training can vastly improve
performance on downstream tasks. In this paper, we present an Adversarial
Training BERT method named AT-BERT, our winning solution to the acronym
identification task of the AAAI 2021 Scientific Document Understanding (SDU)
Challenge. Specifically, a pre-trained BERT is adopted to capture better
semantic representations. We then incorporate the FGM adversarial training
strategy into the fine-tuning of BERT, which makes the model more robust and
generalizable. Furthermore, an ensemble mechanism is devised to combine the
representations learned by multiple BERT variants. With all these components
assembled, experimental results on the SciAI dataset show that our proposed
approach outperforms all other competitive state-of-the-art methods.
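
For illustration, the sketch below shows how FGM-style adversarial fine-tuning is typically wired into a BERT training loop in PyTorch. It is a minimal sketch under assumptions: the epsilon value, the embedding-layer name, and the training-loop details are illustrative choices, not settings reported in the paper, and the batch is assumed to include labels so the model returns a loss.

```python
# Minimal sketch of FGM (Fast Gradient Method) adversarial fine-tuning for a
# BERT-style token-classification model. Hyperparameters are assumptions.
import torch


class FGM:
    """Perturbs the word-embedding weights along the gradient direction."""

    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon      # illustrative perturbation size
        self.emb_name = emb_name    # substring identifying the embedding layer
        self.backup = {}

    def attack(self):
        # Add r_adv = epsilon * g / ||g|| to the embedding matrix.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Remove the perturbation before the optimizer step.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}


def train_step(model, batch, optimizer, fgm):
    optimizer.zero_grad()
    loss = model(**batch).loss      # clean forward/backward pass
    loss.backward()                 # gradients w.r.t. embeddings now exist
    fgm.attack()                    # perturb the embedding weights
    adv_loss = model(**batch).loss  # adversarial forward/backward pass
    adv_loss.backward()             # adversarial gradients accumulate with clean ones
    fgm.restore()                   # undo the perturbation
    optimizer.step()                # update on the combined gradients
    return loss.item(), adv_loss.item()
```

For the ensemble step, one common choice (not necessarily the paper's exact scheme) is to average the per-token logits of several BERT variants fine-tuned this way before decoding the acronym/long-form tags.
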
Related papers
- Test-Time Training on Graphs with Large Language Models (LLMs) [68.375487369596]
Test-Time Training (TTT) has been proposed as a promising approach to train Graph Neural Networks (GNNs).
Inspired by the great annotation ability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance the test-time training on graphs with LLMs as annotators.
A two-stage training strategy is designed to tailor the test-time model with the limited and noisy labels.
arXiv Detail & Related papers (2024-04-21T08:20:02Z) - BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input
Representation [92.75908003533736]
We propose a framework-level robust sequence-to-sequence learning approach, named BLISS, via self-supervised input representation.
We conduct comprehensive experiments to validate the effectiveness of BLISS on various tasks, including machine translation, grammatical error correction, and text summarization.
arXiv Detail & Related papers (2022-04-16T16:19:47Z) - SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z) - MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal
Emotion Recognition [118.73025093045652]
We propose a pre-training model MEmoBERT for multimodal emotion recognition.
Unlike the conventional "pre-train, finetune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction.
Our proposed MEmoBERT significantly enhances emotion recognition performance.
arXiv Detail & Related papers (2021-10-27T09:57:00Z) - Training ELECTRA Augmented with Multi-word Selection [53.77046731238381]
We present a new text encoder pre-training method that improves ELECTRA based on multi-task learning.
Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets.
arXiv Detail & Related papers (2021-05-31T23:19:00Z) - BERT-based Acronym Disambiguation with Multiple Training Strategies [8.82012912690778]
The acronym disambiguation (AD) task aims to find the correct expansion of an ambiguous acronym in a given sentence.
We propose a binary classification model incorporating BERT and several training strategies including dynamic negative sample selection.
Experiments on SciAD show the effectiveness of our proposed model and our score ranks 1st in SDU@AAAI-21 shared task 2: Acronym Disambiguation.
arXiv Detail & Related papers (2021-02-25T05:40:21Z) - Using Prior Knowledge to Guide BERT's Attention in Semantic Textual
Matching Tasks [13.922700041632302]
We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT).
We obtain better understanding of what task-specific knowledge BERT needs the most and where it is most needed.
Experiments demonstrate that the proposed knowledge-enhanced BERT is able to consistently improve semantic textual matching performance.
arXiv Detail & Related papers (2021-02-22T12:07:16Z) - Primer AI's Systems for Acronym Identification and Disambiguation [0.0]
We introduce new methods for acronym identification and disambiguation.
Our systems achieve significant performance gains over previously suggested methods.
Both of our systems perform competitively on the SDU@AAAI-21 shared task leaderboard.
arXiv Detail & Related papers (2020-12-14T23:59:05Z) - GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight
Gated Injection Method [29.352569563032056]
We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into a pre-trained BERT.
Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model.
arXiv Detail & Related papers (2020-10-23T17:00:26Z) - Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)