BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking
Framework for Definition Modeling
- URL: http://arxiv.org/abs/2204.07701v1
- Date: Sat, 16 Apr 2022 02:33:28 GMT
- Title: BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking
Framework for Definition Modeling
- Authors: Cunliang Kong, Yujie Wang, Ruining Chong, Liner Yang, Hengyuan Zhang,
Erhong Yang, Yaping Huang
- Abstract summary: This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings.
We propose a transformer-based multitasking framework to explore the task.
- Score: 16.794041736487323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1
Comparing Dictionaries and Word Embeddings, the Definition Modeling subtrack,
achieving 1st on Italian, 2nd on Spanish and Russian, and 3rd on English and
French. We propose a transformer-based multitasking framework to explore the
task. The framework integrates multiple embedding architectures through the
cross-attention mechanism, and captures the structure of glosses through a
masked language model objective. Additionally, we investigate a simple
but effective model ensembling strategy to further improve robustness. The
evaluation results show the effectiveness of our solution. We release our code
at: https://github.com/blcuicall/SemEval2022-Task1-DM.
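The repository linked above is the authoritative implementation. As a rough illustration of the ideas named in the abstract, the sketch below fuses embeddings from several architectures into the decoder via per-source cross-attention, combines the generation loss with an auxiliary masked-language-model loss over gloss tokens, and averages next-token distributions across checkpoints as one plausible reading of the ensembling strategy. All names, shapes, and the loss weight are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch (PyTorch); module/function names, shapes, and the
# loss weight are assumptions, not the released BLCU-ICALL implementation.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse decoder states with embeddings from several embedding
    architectures, running one cross-attention pass per source."""

    def __init__(self, d_model: int, n_heads: int, n_sources: int):
        super().__init__()
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_sources)
        )
        self.norm = nn.LayerNorm(d_model)

    def forward(self, decoder_states, source_embeddings):
        # decoder_states: (batch, tgt_len, d_model)
        # source_embeddings: list of (batch, src_len, d_model), one per source
        fused = decoder_states
        for attn, emb in zip(self.attns, source_embeddings):
            out, _ = attn(query=fused, key=emb, value=emb)
            fused = self.norm(fused + out)  # residual + layer norm
        return fused

def multitask_loss(gen_logits, gen_labels, mlm_logits, mlm_labels, alpha=0.5):
    """Definition-generation cross-entropy plus an auxiliary masked-LM
    loss over masked gloss tokens; alpha is an assumed weight."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 = ignored position
    return (ce(gen_logits.transpose(1, 2), gen_labels)
            + alpha * ce(mlm_logits.transpose(1, 2), mlm_labels))

@torch.no_grad()
def ensemble_step(models, input_ids):
    # One simple ensembling scheme: average next-token probabilities
    # across fine-tuned checkpoints (assumes HuggingFace-style outputs).
    probs = torch.stack([m(input_ids).logits[:, -1].softmax(-1) for m in models])
    return probs.mean(0).argmax(-1)
```

Fusing sources sequentially, as above, is only one way to integrate multiple embedding architectures; concatenation followed by a projection is an equally plausible design, so consult the released code for the actual choice.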
Related papers
- Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks [1.8124328823188356]
We present an automated framework to decompose trajectory data into temporally bounded and natural language-based descriptive sub-tasks.
Our framework provides both time-based and language-based descriptions for lower-level sub-tasks that comprise full trajectories.
The metrics measure the temporal alignment and semantic fidelity of language descriptions between two sub-task decompositions.
arXiv Detail & Related papers (2024-03-25T22:39:20Z)
- Multitask Multimodal Prompted Training for Interactive Embodied Task Completion [48.69347134411864]
Embodied MultiModal Agent (EMMA) is a unified encoder-decoder model that reasons over images and trajectories.
By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks.
arXiv Detail & Related papers (2023-11-07T15:27:52Z)
- SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation [2.4469484645516837]
SimpleMTOD recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks.
We introduce both local and de-localized tokens for objects within a scene.
The model does not rely on task-specific architectural changes such as classification heads.
arXiv Detail & Related papers (2023-07-10T21:16:46Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Existing methods typically encode task information as a simple dataset-name prefix to the encoder input.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks [87.6494641931349]
We introduce a general-purpose multimodal foundation model BEiT-3.
It achieves state-of-the-art transfer performance on both vision and vision-language tasks.
arXiv Detail & Related papers (2022-08-22T16:55:04Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem where each task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework [83.82026345508334]
We propose OFA, a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.).
OFA achieves new state-of-the-art results on a series of multimodal tasks, including image captioning (COCO test CIDEr: 149.6), text-to-image generation (COCO test FID: 10.5), VQA (test-std acc.: 80.02), and SNLI-VE (test acc.: 90.
arXiv Detail & Related papers (2022-02-07T10:38:21Z)
- MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers [1.869621561196521]
We present our approach for solving SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC).
The goal is to detect whether a given word common to both sentences evokes the same meaning.
We submit systems for both settings: Multilingual and Cross-lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z)
- ReCAM@IITK at SemEval-2021 Task 4: BERT and ALBERT based Ensemble for Abstract Word Prediction [2.482368922343792]
We fine-tuned the pre-trained masked language models BERT and ALBERT.
We tried multiple approaches and found that a Masked Language Modeling (MLM)-based approach works best.
arXiv Detail & Related papers (2021-04-04T08:22:19Z)
- Unifying Vision-and-Language Tasks via Text Generation [81.3910771082967]
We propose a unified framework that learns different tasks in a single architecture.
Our models learn to generate labels in text based on the visual and textual inputs.
Our generative approach shows better generalization ability on answering questions that have rare answers.
arXiv Detail & Related papers (2021-02-04T17:59:30Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning; a minimal sketch of this family of objectives appears after this list.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
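As referenced in the InfoXLM entry above, here is a minimal sketch of the contrastive, InfoNCE-style objective family that such cross-lingual pre-training builds on: aligned translation pairs serve as positives and other in-batch sentences as negatives. The symmetric formulation and temperature value are illustrative assumptions, not InfoXLM's exact recipe.

```python
# Minimal InfoNCE-style sketch; a hedged illustration, not InfoXLM's code.
import torch
import torch.nn.functional as F

def contrastive_loss(src_repr, tgt_repr, temperature=0.05):
    # src_repr, tgt_repr: (batch, dim) encodings of aligned translation
    # pairs; row i of each tensor forms the positive pair.
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature  # (batch, batch) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric: each side must identify its own translation among
    # the other in-batch sentences, which act as negatives.
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2
```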
This list is automatically generated from the titles and abstracts of the papers on this site.