Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language
Model Erlangshen with Propensity-Corrected Loss
- URL: http://arxiv.org/abs/2208.02959v1
- Date: Fri, 5 Aug 2022 02:52:29 GMT
- Title: Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language
Model Erlangshen with Propensity-Corrected Loss
- Authors: Junjie Wang, Yuxiang Zhang, Ping Yang, Ruyi Gan
- Abstract summary: This report describes a pre-trained language model Erlangshen with propensity-corrected loss.
We construct a dynamic masking strategy based on knowledge in Masked Language Modeling (MLM) with whole word masking.
Overall, we achieve 72.54 points in F1 Score and 78.90 points in Accuracy on the test set.
- Score: 12.034243662298035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This report describes the pre-trained language model Erlangshen with
propensity-corrected loss, which ranked No.1 in the CLUE Semantic Matching Challenge. In the
pre-training stage, we construct a dynamic masking strategy based on knowledge
in Masked Language Modeling (MLM) with whole word masking. Furthermore, motivated by
the specific structure of the dataset, the pre-trained Erlangshen
applies a propensity-corrected loss (PCL) in the fine-tuning phase. Overall, we
achieve 72.54 points in F1 Score and 78.90 points in Accuracy on the test set.
Our code is publicly available at:
https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.
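As a rough illustration of the two ingredients named above, the sketch below shows (a) whole word masking, where all sub-tokens of a sampled word are masked together, and (b) a propensity-weighted cross-entropy, under the assumption that PCL reweights examples by inverse label propensity. The knowledge-based dynamic masking rules and the exact PCL formulation are defined by the paper and the linked repository, not by this sketch; the `word_spans` input, the `MASK_ID` constant, and the `propensity` estimate are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

MASK_ID = 103  # assumed [MASK] id; depends on the actual tokenizer

def whole_word_mask(token_ids, word_spans, mask_prob=0.15):
    """Mask all sub-tokens of a sampled word together (whole word masking).

    token_ids: list[int] for one sequence; word_spans: list of (start, end)
    index ranges, one per word. Both are illustrative inputs.
    """
    ids = list(token_ids)
    labels = [-100] * len(ids)           # -100 = position ignored by the MLM loss
    for start, end in word_spans:
        if random.random() < mask_prob:  # sample at the word level
            for i in range(start, end):
                labels[i] = ids[i]       # predict the original sub-token
                ids[i] = MASK_ID         # mask every sub-token of the word
    return ids, labels

def propensity_weighted_ce(logits, targets, propensity):
    """Cross-entropy reweighted by inverse label propensity (one plausible
    reading of a propensity-corrected loss; the paper's PCL may differ).

    logits: (N, C); targets: (N,); propensity: (C,) estimated label frequencies.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-example loss
    weights = 1.0 / propensity[targets].clamp(min=1e-6)      # rarer labels weigh more
    return (weights * ce).mean()
```

In a dynamic setup, the masking would be re-sampled on the fly at each epoch rather than fixed at preprocessing time, and the propensity estimates would presumably be derived from the label statistics of the CLUE semantic-matching training data.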
Related papers
- Team "better_call_claude": Style Change Detection using a Sequential Sentence Pair Classifier [5.720553544629197]
At PAN 2025, the shared task challenges participants to detect style changes at the most fine-grained level: individual sentences. We propose to address this problem by modeling the content of each instance with a Sequential Sentence Pair Classifier (SSPC) architecture. The model achieves strong macro scores of 0.92328 and 0.724 on the EASY, MEDIUM, and HARD data.
arXiv Detail & Related papers (2025-08-01T14:48:17Z) - Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - Underspecification in Language Modeling Tasks: A Causality-Informed
Study of Gendered Pronoun Resolution [0.0]
We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations.
Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods.
arXiv Detail & Related papers (2022-09-30T23:10:11Z) - uChecker: Masked Pretrained Language Models as Unsupervised Chinese
Spelling Checkers [23.343006562849126]
We propose a framework named uChecker to conduct unsupervised spelling error detection and correction.
Masked pretrained language models such as BERT are introduced as the backbone model.
Benefiting from the various and flexible MASKing operations, we propose a Confusionset-guided masking strategy to fine-train the masked language model.
arXiv Detail & Related papers (2022-09-15T05:57:12Z) - IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation
Identification Through a Prompt-based Few-shot Approach [3.4423596432619754]
We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs).
We follow a prompt-based prediction approach for fine-tuning LMs, in which the CRI task is treated as a masked language modeling (MLM) problem.
We compare the performance of this method against ensemble techniques trained on the entire dataset.
arXiv Detail & Related papers (2022-09-08T16:03:50Z) - Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese
Grammatical Error Correction [49.25830718574892]
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It exploits the fact that most tokens are correct and can be conveyed directly from source to target, while the error positions are estimated and corrected.
Experimental results on standard datasets, especially on the variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
arXiv Detail & Related papers (2021-06-03T05:56:57Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: pre-trained models succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z) - CAPT: Contrastive Pre-Training for Learning Denoised Sequence
Representations [42.86803751871867]
We present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations.
CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals.
arXiv Detail & Related papers (2020-10-13T13:08:34Z) - Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets (Adult, and Communities and Crime), as well as on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the proposed F-measure objective yields improved micro-F1 scores, with absolute improvements of up to 8% compared to models trained with the cross-entropy loss function (a generic soft-F1 sketch is given after this list).
arXiv Detail & Related papers (2020-08-08T03:02:27Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
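The "Deep F-measure Maximization for End-to-End Speech Understanding" entry above trains with a differentiable approximation of the F-measure; the sketch below shows one common generic soft-F1 surrogate, which is not necessarily the exact approximation used in that paper.

```python
import torch

def soft_f1_loss(probs, targets, eps=1e-8):
    """Differentiable (soft) F1 surrogate for binary labels.

    probs: (N,) predicted probabilities in [0, 1], e.g. torch.sigmoid(logits);
    targets: (N,) floats in {0, 1}. Hard counts are replaced by their
    expected values so gradients flow through probs.
    """
    tp = (probs * targets).sum()            # expected true positives
    fp = (probs * (1 - targets)).sum()      # expected false positives
    fn = ((1 - probs) * targets).sum()      # expected false negatives
    soft_f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1.0 - soft_f1                    # minimize 1 - F1
```

Replacing hard counts with expected counts keeps the objective differentiable, so it can be minimized with standard backpropagation as described in that entry.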
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.