DSC IIT-ISM at SemEval-2020 Task 6: Boosting BERT with Dependencies for
Definition Extraction
- URL: http://arxiv.org/abs/2009.08180v1
- Date: Thu, 17 Sep 2020 09:48:59 GMT
- Title: DSC IIT-ISM at SemEval-2020 Task 6: Boosting BERT with Dependencies for
Definition Extraction
- Authors: Aadarsh Singh, Priyanshu Kumar and Aman Sinha
- Abstract summary: We explore the performance of Bidirectional Encoder Representations from Transformers (BERT) at definition extraction.
We propose a joint model of BERT and a Text-Level Graph Convolutional Network to incorporate dependencies into the model.
- Score: 9.646922337783133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore the performance of Bidirectional Encoder Representations from
Transformers (BERT) at definition extraction. We further propose a joint model
of BERT and a Text-Level Graph Convolutional Network to incorporate
dependencies into the model. Our proposed model produces better results than
BERT alone and achieves results comparable to BERT with a fine-tuned language model in
DeftEval (Task 6 of SemEval 2020), a shared task of classifying whether a
sentence contains a definition or not (Subtask 1).
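As a rough illustration of how such a joint model could be wired up, here is a minimal PyTorch sketch using HuggingFace's transformers. The class name JointBertGCN, the layer sizes, the plain adjacency-matrix message passing, and the mean-pooled readout are illustrative assumptions, not the authors' exact architecture (a text-level GCN typically also learns per-edge weights):

```python
# Minimal sketch of a joint BERT + text-level GCN classifier (illustrative, not the authors' code).
import torch
import torch.nn as nn
from transformers import BertModel

class JointBertGCN(nn.Module):
    def __init__(self, hidden=768, gcn_layers=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.gcn = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(gcn_layers))
        self.classifier = nn.Linear(2 * hidden, 2)  # definition vs. non-definition

    def forward(self, input_ids, attention_mask, adj):
        # adj: (batch, seq, seq) float dependency adjacency matrix,
        # with self-loops added and rows normalized
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        h = out.last_hidden_state                      # token representations from BERT
        for layer in self.gcn:
            h = torch.relu(layer(adj @ h))             # propagate along dependency edges
        mask = attention_mask.unsqueeze(-1).float()
        graph_repr = (h * mask).sum(1) / mask.sum(1)   # mean-pool GCN states over real tokens
        joint = torch.cat([out.pooler_output, graph_repr], dim=-1)
        return self.classifier(joint)                  # logits for Subtask 1
```

The adjacency matrix adj would be built offline from a dependency parser, so message passing averages each token's state with those of its syntactic neighbors before the graph summary is concatenated with BERT's pooled output.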
Related papers
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
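For readers unfamiliar with adapters, here is a minimal sketch of the bottleneck module such architectures insert after each transformer layer; the dimensions are illustrative assumptions, not this paper's configuration:

```python
# Hedged sketch of a bottleneck adapter (dimensions are illustrative).
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project down to a small bottleneck
        self.up = nn.Linear(bottleneck, dim)     # project back up to the model width
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps the pretrained path intact
```

Only the adapter (and fusion) parameters are trained, which is what makes the approach efficient relative to full fine-tuning.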
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
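A generic top-1-routed Mixture-of-Experts feed-forward block looks roughly like the sketch below; MoEBERT's importance-guided adaptation of a pretrained FFN into experts is more involved, so treat the routing scheme and sizes here as assumptions:

```python
# Illustrative Mixture-of-Experts feed-forward block (hypothetical dims; not MoEBERT's code).
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    def __init__(self, dim=768, ffn_dim=3072, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (batch, seq, dim)
        # Route each token to its top-1 expert; only that expert runs for the token.
        idx = self.router(x).argmax(dim=-1)     # (batch, seq)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            sel = idx == e                      # boolean mask of tokens sent to expert e
            if sel.any():
                out[sel] = expert(x[sel])
        return out
```

Because each token activates only one expert, total capacity grows with the number of experts while per-token compute stays close to that of a single feed-forward network.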
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches [3.1219977244201056]
The Bidirectional Encoder Representations from Transformers (BERT) model has radically improved the performance of many Natural Language Processing (NLP) tasks.
It is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size.
We successfully optimize a Query-Title Relevance (QTR) classifier for deployment via a compact model, which we name BERT Bidirectional Long Short-Term Memory (BertBiLSTM).
BertBiLSTM exceeds the off-the-shelf BERT model's performance in terms of accuracy and efficiency for the aforementioned real-world production task.
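One plausible reading of such a compact model is a BiLSTM head over BERT token states, sketched below; the actual production BertBiLSTM likely involves additional compression or distillation steps that this sketch omits, and all layer sizes are guesses:

```python
# Rough sketch of a BERT + BiLSTM relevance classifier (sizes are guesses, not the deployed config).
import torch.nn as nn
from transformers import BertModel

class BertBiLSTM(nn.Module):
    def __init__(self, hidden=768, lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, 2)   # relevant / not relevant

    def forward(self, input_ids, attention_mask):
        # Query and title are assumed packed into one sequence ("query [SEP] title") upstream.
        h = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        seq, _ = self.bilstm(h)
        return self.classifier(seq[:, 0])                 # read off the [CLS] position
```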
arXiv Detail & Related papers (2021-08-23T14:28:23Z)
- Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with siamese and triplet network structures called Sentence-BERT (SBERT) and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT).
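The inference side of an SBERT-style model reduces to mean pooling over token states. The sketch below uses the public bert-base-uncased checkpoint as a stand-in (swapping in an ALBERT checkpoint would approximate SALBERT) and omits the siamese fine-tuning on NLI pairs that gives these models their quality:

```python
# Sketch of SBERT-style sentence embeddings via mean pooling (checkpoint is a stand-in).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        h = enc(**batch).last_hidden_state               # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (h * mask).sum(1) / mask.sum(1)               # mean over real (non-pad) tokens

a, b = embed(["A man is eating food.", "Someone is having a meal."])
print(torch.cosine_similarity(a, b, dim=0))              # higher = more similar
```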
arXiv Detail & Related papers (2021-01-26T09:14:06Z)
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
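The core of weight binarization can be shown with a toy straight-through estimator; BinaryBERT's actual procedure (ternary initialization plus weight splitting) is considerably more involved than this sketch:

```python
# Toy weight binarization with a straight-through estimator (not BinaryBERT's full method).
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        alpha = w.abs().mean()          # per-tensor scale preserves weight magnitude
        return alpha * torch.sign(w)    # weights collapse to {-alpha, +alpha}

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                 # straight-through: pass gradients unchanged

w = torch.randn(4, 4, requires_grad=True)
w_bin = BinarizeSTE.apply(w)            # use w_bin in the layer's matmul during training
```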
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
- Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT [2.1574781022415364]
We describe the systems developed for the WNUT-2020 shared task 2, identification of informative COVID-19 English Tweets.
BERT is a highly performant model for Natural Language Processing tasks.
We improved BERT's performance on this classification task by fine-tuning it and concatenating its embeddings with Tweet-specific features.
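Concatenating BERT's pooled output with hand-crafted features is straightforward to sketch; the feature list below (hashtag, URL, and mention counts) is invented for illustration and need not match the paper's features:

```python
# Hedged sketch: BERT pooled output concatenated with hand-crafted tweet features.
import torch
import torch.nn as nn
from transformers import BertModel

class BertWithTweetFeatures(nn.Module):
    def __init__(self, n_features=3, hidden=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.classifier = nn.Linear(hidden + n_features, 2)  # informative / uninformative

    def forward(self, input_ids, attention_mask, features):
        # features: (batch, n_features), e.g. counts of hashtags, URLs, user mentions
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(torch.cat([pooled, features], dim=-1))
```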
arXiv Detail & Related papers (2020-12-07T07:55:31Z)
- UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction [0.17188280334580194]
This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks.
We use various pretrained language models to solve each of the three subtasks of the competition.
Our best-performing model, evaluated on the DeftEval dataset, places 32nd on the first subtask and 37th on the second subtask.
arXiv Detail & Related papers (2020-09-11T18:36:22Z)
- GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction [134.5580003327839]
We introduce a generative transformer-based encoder-decoder framework (GRIT) to model context at the document level.
We evaluate our approach on the MUC-4 dataset, and show that our model performs substantially better than prior work.
arXiv Detail & Related papers (2020-08-21T01:07:36Z)
- ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082]
BERT relies heavily on the global self-attention block and thus suffers from a large memory footprint and high computation cost.
We propose a novel span-based dynamic convolution to replace these self-attention heads and directly model local dependencies.
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
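A heavily simplified dynamic-convolution head is sketched below: each token predicts its own local mixing kernel over a small window. ConvBERT's span-based variant conditions the kernel on a span rather than a single token, so this sketch only conveys the core idea:

```python
# Simplified dynamic convolution: per-token kernels over a local window (not ConvBERT's exact head).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvHead(nn.Module):
    def __init__(self, dim=64, kernel_size=5):
        super().__init__()
        self.k = kernel_size
        self.kernel_gen = nn.Linear(dim, kernel_size)      # each token predicts its own kernel

    def forward(self, x):                                  # x: (batch, seq, dim)
        kernels = torch.softmax(self.kernel_gen(x), -1)    # (batch, seq, k)
        pad = self.k // 2
        xp = F.pad(x, (0, 0, pad, pad))                    # pad along the sequence axis
        windows = xp.unfold(1, self.k, 1)                  # (batch, seq, dim, k) local neighborhoods
        return torch.einsum("btdk,btk->btd", windows, kernels)  # mix each window with its kernel
```

Unlike self-attention, the mixing weights here depend only on a fixed-size local window, which is what saves memory and compute on local patterns.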
arXiv Detail & Related papers (2020-08-06T07:43:19Z)
- Yseop at SemEval-2020 Task 5: Cascaded BERT Language Model for Counterfactual Statement Analysis [0.0]
We use a BERT base model for the classification task and build a hybrid BERT Multi-Layer Perceptron system to handle the sequence identification task.
Our experiments show that while introducing syntactic and semantic features does little to improve the system on the classification task, using these features as cascaded linear inputs to fine-tune the model's sequence-delimiting ability ensures that it outperforms other similar-purpose complex systems, such as BiLSTM-CRF, on the second task.
arXiv Detail & Related papers (2020-05-18T08:19:18Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.