RoBLEURT Submission for the WMT2021 Metrics Task
- URL: http://arxiv.org/abs/2204.13352v1
- Date: Thu, 28 Apr 2022 08:49:40 GMT
- Title: RoBLEURT Submission for the WMT2021 Metrics Task
- Authors: Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing
Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao
- Abstract summary: We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations on 8 out of 10 to-English language pairs.
- Score: 72.26898579202076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present our submission to the Shared Metrics Task:
RoBLEURT (Robustly Optimizing the training of BLEURT). After investigating
recent advances in trainable metrics, we identify several aspects of vital
importance for obtaining a well-performing metric model: 1) jointly leveraging
the advantages of the source-included model and the reference-only model, 2)
continuously pre-training the model with massive synthetic data pairs, and 3)
fine-tuning the model with a data denoising strategy. Experimental results show
that our model reaches state-of-the-art correlations with the WMT2020 human
annotations on 8 out of 10 to-English language pairs.
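To make the recipe concrete, below is a minimal sketch of ingredient 1), combining a source-included scorer with a reference-only scorer. The checkpoint names, the joint-encoding setup, and the plain averaging are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def load_scorer(name: str):
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)
    model.eval()
    return tok, model

@torch.no_grad()
def segment_score(tok, model, text_a: str, text_b: str) -> float:
    # Encode the two segments jointly; the regression head emits one quality score.
    batch = tok(text_a, text_b, return_tensors="pt", truncation=True)
    return model(**batch).logits.squeeze().item()

# Placeholder checkpoints: in practice each scorer would first be trained as in
# steps 2) and 3) of the abstract.
src_tok, src_model = load_scorer("xlm-roberta-base")
ref_tok, ref_model = load_scorer("xlm-roberta-base")

def combined_score(source: str, reference: str, hypothesis: str) -> float:
    s_src = segment_score(src_tok, src_model, source, hypothesis)     # source-included
    s_ref = segment_score(ref_tok, ref_model, reference, hypothesis)  # reference-only
    return 0.5 * (s_src + s_ref)  # simple average; the paper's scheme may differ
```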
Related papers
- Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in an intermediate retraining stage to learn a model that can be easily adapted to unseen tasks.
In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters.
We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
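As a rough illustration of the summarized idea, the sketch below meta-trains LoRA adapters with a first-order (Reptile-style) update during an intermediate retraining stage. The LoRA settings, the Reptile update, and the batch format are assumptions, not the paper's algorithm.

```python
import copy
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model = get_peft_model(base, LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16,
                                        target_modules=["query", "value"]))
meta_lr, inner_lr, inner_steps = 0.1, 1e-4, 5

def adapt(task_batches):
    """Inner loop: fine-tune a copy of the model on one sampled task."""
    clone = copy.deepcopy(model)
    opt = torch.optim.AdamW(clone.parameters(), lr=inner_lr)
    for batch in task_batches[:inner_steps]:  # each batch dict includes "labels"
        clone(**batch).loss.backward()
        opt.step()
        opt.zero_grad()
    return clone

def meta_step(task_batches):
    """Outer loop (Reptile): nudge shared weights toward the task-adapted ones."""
    adapted = adapt(task_batches)
    with torch.no_grad():
        for p, q in zip(model.parameters(), adapted.parameters()):
            p.add_(meta_lr * (q - p))
```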
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
- SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection [51.99159169107426]
We present our novel systems developed for the SemEval-2024 hallucination detection task.
Our investigation spans a range of strategies to compare model predictions with reference standards.
We introduce three distinct methods that exhibit strong performance metrics.
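One simple instance of comparing model predictions with reference standards, sketched under assumptions (an off-the-shelf NLI checkpoint and an arbitrary 0.5 threshold), is to flag outputs that the reference only weakly entails.

```python
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def looks_hallucinated(reference: str, hypothesis: str, threshold: float = 0.5) -> bool:
    # Score how strongly the reference entails the model output.
    scores = nli({"text": reference, "text_pair": hypothesis}, top_k=None)
    entail = next(s["score"] for s in scores if s["label"].upper() == "ENTAILMENT")
    return entail < threshold  # weak entailment -> possible hallucination
```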
arXiv Detail & Related papers (2024-04-09T09:03:44Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
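A minimal sketch of the consistency idea, assuming cosine similarities and a Spearman correlation over entity pairs (the paper's multi-head posterior formulation is more involved): a higher score means the model's entity geometry agrees more closely with the meta-features.

```python
import numpy as np
from scipy.stats import spearmanr

def consistency(embeddings: np.ndarray, meta_features: np.ndarray) -> float:
    """Correlate pairwise entity similarities in the two spaces."""
    def cos_sim(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T
    iu = np.triu_indices(len(embeddings), k=1)  # unique entity pairs
    return spearmanr(cos_sim(embeddings)[iu], cos_sim(meta_features)[iu]).correlation
```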
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- Cross-Modal Fine-Tuning: Align then Refine [83.37294254884446]
ORCA is a cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities.
We show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities.
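A hedged sketch of align-then-refine: stage one trains only a new input embedder so target-modality features land near the pretrained model's embedding distribution, and stage two fine-tunes everything on the task. The moment-matching loss below is a stand-in for ORCA's actual alignment objective.

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    """Maps raw target-modality features into the pretrained model's space."""
    def __init__(self, in_dim: int, model_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, model_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def alignment_loss(mapped: torch.Tensor, lm_embeds: torch.Tensor) -> torch.Tensor:
    # Stage 1 ("align"): match the first two moments of the two embedding clouds.
    mean_gap = (mapped.mean(0) - lm_embeds.mean(0)).pow(2).sum()
    var_gap = (mapped.var(0) - lm_embeds.var(0)).pow(2).sum()
    return mean_gap + var_gap

# Stage 2 ("refine"): fine-tune the embedder plus the pretrained body end-to-end
# on the downstream objective once alignment has converged.
```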
arXiv Detail & Related papers (2023-02-11T16:32:28Z)
- Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task [61.34108034582074]
We build our system based on the core idea of UNITE (Unified Translation Evaluation).
During the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre-train UNITE.
During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions.
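The two phases can be pictured with one reusable regression loop; the MSE objective, the per-example updates, and the learning rates are assumptions for illustration.

```python
import torch

def train_stage(model, tokenizer, examples, lr=1e-5):
    """One pass of regression training over (text_a, text_b, score) triples."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for text_a, text_b, score in examples:
        batch = tokenizer(text_a, text_b, return_tensors="pt", truncation=True)
        pred = model(**batch).logits.squeeze()
        loss = torch.nn.functional.mse_loss(pred, torch.tensor(float(score)))
        loss.backward()
        opt.step()
        opt.zero_grad()

# Phase 1: continued pre-training on pseudo-labeled synthetic pairs.
#   train_stage(model, tokenizer, pseudo_labeled_pairs)
# Phase 2: fine-tuning on human DA/MQM judgments from past WMT years.
#   train_stage(model, tokenizer, da_mqm_pairs, lr=5e-6)
```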
arXiv Detail & Related papers (2022-10-18T08:51:25Z)
- UNIMIB at TREC 2021 Clinical Trials Track [2.840363325289377]
This contribution summarizes the participation of the UNIMIB team in the TREC 2021 Clinical Trials Track.
We investigated the effect of different query representations, combined with several retrieval models, on retrieval performance.
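As an illustration of this kind of comparison, the sketch below queries one BM25 index with two different query representations; the rank_bm25 package, the toy corpus, and the keyword-only representation are assumptions.

```python
from rank_bm25 import BM25Okapi

trials = [
    "phase ii trial of metformin in adults with type 2 diabetes",
    "observational study of inhaled corticosteroids in children with asthma",
]
bm25 = BM25Okapi([t.split() for t in trials])  # whitespace-tokenized index

def retrieve(query_tokens, k=2):
    return bm25.get_top_n(query_tokens, trials, n=k)

# Representation 1: the raw patient description, tokenized as-is.
print(retrieve("adult patient with type 2 diabetes on metformin".split()))
# Representation 2: a keyword-only view of the same topic.
print(retrieve(["diabetes", "metformin"]))
```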
arXiv Detail & Related papers (2022-07-27T13:39:30Z)
- FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming Comments [0.0]
We describe the methods we used for our submissions to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments.
For all three subtasks we fine-tuned freely available transformer-based models from the Huggingface model hub.
We evaluated the performance of various pre-trained models after fine-tuning on 80% of the training data and submitted predictions from the two best-performing models.
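A minimal sketch of the described setup, with toy data standing in for the GermEval 2021 corpus and a hypothetical choice of German checkpoint.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy stand-in data; the real task uses the GermEval 2021 comment corpus.
data = Dataset.from_dict({
    "text": [f"Kommentar {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],
})
split = data.train_test_split(test_size=0.2, seed=42)  # the 80/20 split

name = "deepset/gbert-base"  # one plausible hub checkpoint for German text
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=64)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=split["train"].map(encode, batched=True),
    eval_dataset=split["test"].map(encode, batched=True),
)
trainer.train()
print(trainer.evaluate())  # held-out 20% guides model selection
```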
arXiv Detail & Related papers (2021-09-07T09:46:27Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back-translation, knowledge distillation, multi-feature reranking, and transductive fine-tuning.
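As one example of the listed strategies, here is a hedged sketch of back-translation: a reverse-direction model turns target-side monolingual text into synthetic parallel pairs. The Marian checkpoint is illustrative, not the system's actual model.

```python
from transformers import MarianMTModel, MarianTokenizer

# Illustrative reverse (target -> source) model; pick the pair matching your task.
reverse = "Helsinki-NLP/opus-mt-en-de"
tok = MarianTokenizer.from_pretrained(reverse)
model = MarianMTModel.from_pretrained(reverse)

def back_translate(target_sentences):
    """Turn monolingual target-side text into synthetic (source, target) pairs."""
    batch = tok(target_sentences, return_tensors="pt", padding=True)
    synthetic_sources = tok.batch_decode(model.generate(**batch),
                                         skip_special_tokens=True)
    return list(zip(synthetic_sources, target_sentences))

pairs = back_translate(["The committee approved the proposal."])
```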
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
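A toy sketch of the pseudo-labeling behind this kind of unsupervised, dataset-specific transfer, assuming a naive sentence splitter: use an article's lead sentences as the "summary" and the remainder as the "document", with the lead length chosen to match the target dataset's statistics.

```python
def make_pseudo_example(article: str, summary_sents: int = 3) -> dict:
    # Naive period-based splitting; a real pipeline would use a sentence tokenizer.
    sents = [s.strip() for s in article.split(".") if s.strip()]
    summary = ". ".join(sents[:summary_sents]) + "."
    document = ". ".join(sents[summary_sents:]) + "."
    # Pick summary_sents to mimic the target dataset's compression ratio.
    return {"document": document, "summary": summary}
```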
arXiv Detail & Related papers (2020-10-24T08:36:49Z)