SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition
using Large Pre-trained Language Models
- URL: http://arxiv.org/abs/2205.14660v1
- Date: Sun, 29 May 2022 13:40:14 GMT
- Title: SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition
using Large Pre-trained Language Models
- Authors: Changyu Hou, Jun Wang, Yixuan Qiao, Peng Jiang, Peng Gao, Guotong Xie,
Qizhi Lin, Xiaopeng Wang, Xiandi Jiang, Benqi Wang, Qifeng Xiao
- Abstract summary: This paper describes our NER system for SemEval-2022 Task 11: MultiCoNER.
A Transformer layer assigns input-dependent weights to each model, integrating the advantages of diverse models effectively.
Experimental results show that our method achieves superior performance on Farsi and Dutch.
- Score: 14.94542859759424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale pre-trained models have been widely used in named entity
recognition (NER) tasks. However, model ensembling through parameter averaging or
voting cannot fully exploit the complementary strengths of different models,
especially in the open domain. This paper describes our NER system for
SemEval-2022 Task 11: MultiCoNER. We propose an effective system that adaptively
ensembles pre-trained language models through a Transformer layer. By assigning
different weights to each model for different inputs, the Transformer layer
integrates the advantages of diverse models effectively. Experimental results show
that our method achieves superior performance on Farsi and Dutch.
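
The abstract gives no implementation details, but the core idea can be sketched: a small Transformer layer looks at the token representations produced by several pre-trained encoders and assigns each model an input-dependent weight before a token-level NER classifier. The sketch below is an illustration only, not the authors' code; the softmax-normalized per-token weighting, the projection layers, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveEnsembleNER(nn.Module):
    """Minimal sketch: a Transformer encoder layer mixes the token
    representations of several pre-trained models with input-dependent
    weights, and the weighted mixture feeds a token-level NER classifier."""

    def __init__(self, encoder_dims, hidden_size, num_labels):
        super().__init__()
        # One projection per base model so all representations share a width.
        self.proj = nn.ModuleList([nn.Linear(d, hidden_size) for d in encoder_dims])
        # The Transformer layer that "adaptively ensembles" the models.
        self.mixer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=8, batch_first=True
        )
        # One scalar weight per model, computed per token from the mixed states.
        self.weight_head = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, encoder_outputs):
        # encoder_outputs: list of tensors, each (batch, seq_len, dim_i),
        # produced by the (frozen) pre-trained models outside this module.
        states = torch.stack(
            [p(h) for p, h in zip(self.proj, encoder_outputs)], dim=2
        )                                                     # (B, T, M, H)
        batch, seq_len, n_models, hidden = states.shape
        # Treat the M model representations of each token as a short sequence
        # so every model's view can attend to the others.
        mixed = self.mixer(states.reshape(batch * seq_len, n_models, hidden))
        # Softmax over models gives the per-input ensemble weights.
        weights = torch.softmax(self.weight_head(mixed), dim=1)   # (B*T, M, 1)
        fused = (weights * mixed).sum(dim=1).reshape(batch, seq_len, hidden)
        return self.classifier(fused)                             # (B, T, num_labels)


# Toy call with random tensors standing in for real encoder outputs
# (e.g. from two different multilingual encoders).
model = AdaptiveEnsembleNER(encoder_dims=[768, 1024], hidden_size=256, num_labels=9)
logits = model([torch.randn(2, 16, 768), torch.randn(2, 16, 1024)])  # (2, 16, 9)
```

The point of routing through a Transformer layer rather than plain averaging or voting is that the mixture weights can differ per token and per input, which is what lets the ensemble exploit each model's strengths.
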
Related papers
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, with its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
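
As a rough illustration of the mixture-of-rank-1-experts idea named above, the sketch below wraps a frozen linear layer with several rank-1 experts mixed by a router; the routing, initialization, and how any "intuition" signal is injected are assumptions here, not the paper's formulation.

```python
import torch
import torch.nn as nn

class Rank1ExpertLinear(nn.Module):
    """Sketch: a frozen pretrained linear layer plus a mixture of rank-1
    experts (u_i v_i^T) whose contributions are gated per input."""

    def __init__(self, base: nn.Linear, num_experts=8):
        super().__init__()
        self.base = base.requires_grad_(False)           # frozen pretrained weight
        d_out, d_in = base.weight.shape
        self.u = nn.Parameter(torch.zeros(num_experts, d_out))
        self.v = nn.Parameter(torch.randn(num_experts, d_in) * 0.01)
        self.router = nn.Linear(d_in, num_experts)

    def forward(self, x):                                # x: (batch, d_in)
        gate = torch.softmax(self.router(x), dim=-1)     # (batch, E) mixture weights
        # Each expert contributes (x . v_i) * u_i, i.e. a rank-1 update.
        expert_out = torch.einsum("bi,ei,eo->beo", x, self.v, self.u)
        return self.base(x) + (gate.unsqueeze(-1) * expert_out).sum(1)
```
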
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- Fisher Mask Nodes for Language Model Merging [0.0]
We introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning.
Our method exhibits a consistent and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging at a fraction of the computational cost.
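
For readers unfamiliar with Fisher-weighted averaging, the following sketch shows the plain element-wise variant it builds on: parameters from several fine-tuned models are averaged with weights given by diagonal Fisher information estimates. It is not the paper's mask-node method; the squared-gradient Fisher estimate and the normalization are the standard recipe and an assumption here.

```python
import torch

def fisher_weighted_merge(state_dicts, fishers, eps=1e-8):
    """Element-wise Fisher-weighted average of several models' parameters.

    state_dicts: list of {name: tensor} with identical keys and shapes.
    fishers:     list of {name: tensor} diagonal Fisher estimates, e.g. the
                 accumulated squared gradients on a small calibration set.
    """
    merged = {}
    for name in state_dicts[0]:
        fisher = torch.stack([f[name].float() for f in fishers])        # (M, ...)
        params = torch.stack([sd[name].float() for sd in state_dicts])  # (M, ...)
        # Parameters that the models agree are "important" dominate the average.
        merged[name] = (fisher * params).sum(0) / (fisher.sum(0) + eps)
    return merged
```
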
arXiv Detail & Related papers (2024-03-14T21:52:26Z)
- Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent [2.3967405016776384]
Jack of All Trades (JAT) is a transformer-based model with a unique design optimized for handling sequential decision-making tasks.
JAT is the first model of its kind to be fully open-sourced at https://huggingface.co/jat-project/jat, including a pioneering general-purpose dataset.
arXiv Detail & Related papers (2024-02-15T10:01:55Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting language models with perception.
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
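
The recipe in that summary (freeze nearly everything, train one linear projection, prepend one trainable token) can be sketched as follows; the pooling of visual features and where the prefix is inserted are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PerceptualPrefix(nn.Module):
    """Sketch: everything stays frozen except a single linear projection from
    the vision encoder to the LM embedding space and one trainable token."""

    def __init__(self, vision_dim, lm_embed_dim):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_embed_dim)              # the only trained layer
        self.prefix = nn.Parameter(torch.zeros(1, 1, lm_embed_dim))  # one trainable token

    def forward(self, visual_feats, text_embeds):
        # visual_feats: (B, D_v) pooled features from a frozen vision encoder
        # text_embeds:  (B, T, D_lm) token embeddings from the frozen LM
        vis = self.proj(visual_feats).unsqueeze(1)                   # (B, 1, D_lm)
        prefix = self.prefix.expand(text_embeds.size(0), -1, -1)
        # Prepend [trainable token; projected visual token] to the text tokens.
        return torch.cat([prefix, vis, text_embeds], dim=1)
```
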
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
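
A minimal sketch of that combination, assuming a bottleneck adapter and a gated layer-wise fusion (the paper's exact fusion layers may differ):

```python
import torch
import torch.nn as nn

class AdapterFusionBlock(nn.Module):
    """Sketch of one layer: a bottleneck adapter adjusts the frozen BERT hidden
    states, and a gated fusion injects audio-visual features at that layer."""

    def __init__(self, hidden, av_dim, bottleneck=64):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(hidden, bottleneck), nn.ReLU(), nn.Linear(bottleneck, hidden)
        )
        self.av_proj = nn.Linear(av_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, text_h, av_feats):
        # text_h:   (B, T, H) hidden states of one frozen BERT layer
        # av_feats: (B, T, D_av) aligned audio-visual features
        text_h = text_h + self.adapter(text_h)                # adapter with residual
        av = self.av_proj(av_feats)
        g = torch.sigmoid(self.gate(torch.cat([text_h, av], dim=-1)))
        return text_h + g * av                                # layer-wise gated fusion
```
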
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
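
The Mixture-of-Experts feed-forward block that such a model substitutes for the dense FFN can be sketched as below; top-1 routing and the number of experts are illustrative choices, not MoEBERT's importance-guided adaptation.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """One MoE feed-forward block of the kind that replaces a dense FFN inside
    a Transformer layer: a router sends each token to one expert, so capacity
    grows with the expert count while per-token compute stays roughly flat."""

    def __init__(self, hidden, ffn_dim, num_experts=4):
        super().__init__()
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden))
             for _ in range(num_experts)]
        )

    def forward(self, x):                          # x: (batch, seq_len, hidden)
        top1 = self.router(x).argmax(dim=-1)       # (batch, seq_len) expert ids
        out = torch.zeros_like(x)
        # For clarity every expert runs on all tokens here; real systems
        # dispatch only the routed tokens to each expert.
        for i, expert in enumerate(self.experts):
            mask = (top1 == i).unsqueeze(-1).to(x.dtype)
            out = out + mask * expert(x)
        return out
```
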
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime that utilizes unlabeled samples in the support set, further improving few-shot performance.
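
A rough sketch of the weight-generation idea, assuming the Transformer only emits the final linear classifier of the small target network rather than all of its convolutional weights:

```python
import torch
import torch.nn as nn

class WeightGenerator(nn.Module):
    """Sketch: a Transformer encoder reads the embedded support samples
    (features plus label embeddings) and emits the parameters of a small
    task-specific classifier head."""

    def __init__(self, feat_dim, n_classes):           # feat_dim divisible by nhead
        super().__init__()
        self.label_embed = nn.Embedding(n_classes, feat_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Emit one row of the classifier weight matrix (plus a bias) per class.
        self.to_weights = nn.Linear(feat_dim, feat_dim + 1)
        self.n_classes = n_classes

    def forward(self, support_feats, support_labels):
        # support_feats: (N, D) features of the support set; support_labels: (N,)
        tokens = (support_feats + self.label_embed(support_labels)).unsqueeze(0)
        enc = self.encoder(tokens).squeeze(0)                       # (N, D)
        # Pool per class, then map to that class's weight row and bias.
        rows = [self.to_weights(enc[support_labels == c].mean(0))
                for c in range(self.n_classes)]
        wb = torch.stack(rows)                                      # (C, D+1)
        return wb[:, :-1], wb[:, -1]                                # weights, biases

    def classify(self, query_feats, support_feats, support_labels):
        w, b = self(support_feats, support_labels)
        return query_feats @ w.t() + b                              # (Q, C) logits
```
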
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- Scalable and Efficient MoE Training for Multitask Multilingual Models [55.987536562357086]
We develop a system capable of scaling MoE models efficiently to trillions of parameters.
We also present new training methods to improve MoE sample efficiency and leverage an expert pruning strategy to improve time efficiency.
A model trained with 10 billion parameters on 50 languages can achieve state-of-the-art performance in Machine Translation (MT) and multilingual natural language generation tasks.
arXiv Detail & Related papers (2021-09-22T00:57:46Z)
- TEASEL: A Transformer-Based Speech-Prefixed Language Model [4.014524824655106]
Multimodal language analysis aims to simultaneously model a speaker's words, acoustical annotations, and facial expressions.
Lexicon features usually outperform other modalities because they are pre-trained on large corpora via Transformer-based models.
Despite their strong performance, training a new self-supervised learning (SSL) Transformer on any modality is not usually attainable due to insufficient data.
arXiv Detail & Related papers (2021-09-12T14:08:57Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney and JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)