Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training
- URL: http://arxiv.org/abs/2404.05560v1
- Date: Mon, 8 Apr 2024 14:32:52 GMT
- Title: Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training
- Authors: Longhui Zhang, Dingkun Long, Meishan Zhang, Yanzhao Zhang, Pengjun Xie, Min Zhang
- Abstract summary: Current pre-trained language models rarely explicitly incorporate boundary information into the modeling process.
BABERT incorporates unsupervised statistical boundary information into Chinese BERT's pre-training objectives.
We introduce a novel "Boundary Information Metric" that is both simple and effective.
- Score: 45.40634271936031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chinese sequence labeling tasks are heavily reliant on accurate word boundary demarcation. Although current pre-trained language models (PLMs) have achieved substantial gains on these tasks, they rarely explicitly incorporate boundary information into the modeling process. An exception to this is BABERT, which incorporates unsupervised statistical boundary information into Chinese BERT's pre-training objectives. Building upon this approach, we incorporate supervised, high-quality boundary information to enhance BABERT's learning, yielding a semi-supervised boundary-aware PLM. To assess PLMs' ability to encode boundaries, we introduce a novel "Boundary Information Metric" that is both simple and effective. This metric allows comparison of different PLMs without task-specific fine-tuning. Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks. Additionally, our proposed metric offers a convenient and accurate means of evaluating PLMs' boundary awareness.
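The abstract does not spell out how the "Boundary Information Metric" is computed. As one illustrative reading, boundary awareness can be probed without fine-tuning by checking whether adjacent-character representations are more similar inside words than across gold word boundaries. The sketch below is a hypothetical probe of that kind; the one-token-per-character assumption, the cosine-similarity criterion, and the AUC-style score are all assumptions of this sketch, not the paper's definition.

```python
# Hypothetical boundary-awareness probe (NOT the paper's exact metric).
# Assumption: a boundary-aware PLM gives adjacent characters *within* a word
# more similar representations than adjacent characters *across* a boundary.
import torch
from transformers import AutoTokenizer, AutoModel

def boundary_probe_score(model_name: str, segmented_sentences: list) -> float:
    """segmented_sentences: gold word-segmented sentences, e.g. [["我们", "喜欢", "自然语言"]].
    Returns an AUC-style value in [0, 1]; higher means boundaries are better encoded."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()

    sims, labels = [], []  # label 1 = pair inside a word, 0 = pair straddles a boundary
    for words in segmented_sentences:
        chars = [c for w in words for c in w]
        boundary_after, pos = set(), 0
        for w in words:
            pos += len(w)
            boundary_after.add(pos - 1)          # boundary falls after this character index
        enc = tok("".join(chars), return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0, 1:-1]   # drop [CLS]/[SEP]
        if hidden.size(0) != len(chars):
            continue                              # skip if tokenization is not one token per character
        for i in range(len(chars) - 1):
            sims.append(torch.cosine_similarity(hidden[i], hidden[i + 1], dim=0).item())
            labels.append(0 if i in boundary_after else 1)

    # AUC without sklearn: probability a within-word pair outranks a cross-boundary pair
    pos_sims = [s for s, l in zip(sims, labels) if l == 1]
    neg_sims = [s for s, l in zip(sims, labels) if l == 0]
    wins = sum(p > n for p in pos_sims for n in neg_sims)
    return wins / (len(pos_sims) * len(neg_sims))

# Example usage (model name illustrative):
# score = boundary_probe_score("bert-base-chinese", gold_segmented_dev_set)
```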
Related papers
- End-to-end Planner Training for Language Modeling [22.555504014437915]
A successful approach to enhancing language modeling uses a separate planning module to predict abstract labels of future sentences.
We propose an effective method to improve this approach by enabling joint fine-tuning of the planner and the LM.
arXiv Detail & Related papers (2024-10-16T12:14:29Z) - Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z) - Zero-shot prompt-based classification: topic labeling in times of foundation models in German Tweets [1.734165485480267]
We propose a new tool for automatically annotating text using written guidelines without providing training samples.
Our results show that the prompt-based approach is comparable with the fine-tuned BERT but without any annotated training data.
Our findings emphasize the ongoing paradigm shift in the NLP landscape, i.e., the unification of downstream tasks and elimination of the need for pre-labeled training data.
arXiv Detail & Related papers (2024-06-26T10:44:02Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Noise-Robust Fine-Tuning of Pretrained Language Models via External
Guidance [61.809732058101304]
We introduce an innovative approach for fine-tuning PLMs using noisy labels.
This approach incorporates the guidance of Large Language Models (LLMs) like ChatGPT.
This guidance assists in accurately distinguishing between clean and noisy samples.
arXiv Detail & Related papers (2023-11-02T09:20:38Z) - Unsupervised Boundary-Aware Language Model Pretraining for Chinese
Sequence Labeling [25.58155857967128]
Boundary information is critical for various Chinese language processing tasks, such as word segmentation, part-of-speech tagging, and named entity recognition.
We propose an architecture to encode the information directly into pre-trained language models, resulting in Boundary-Aware BERT (BABERT).
Experimental results on ten benchmarks of Chinese sequence labeling demonstrate that BABERT can provide consistent improvements on all datasets.
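As context for the "unsupervised statistical boundary information" mentioned above, one commonly used signal is the pointwise mutual information (PMI) of adjacent characters in a raw corpus: low PMI between two neighbouring characters suggests a likely word boundary. The sketch below illustrates that statistic only; it is not BABERT's actual feature extraction or pre-training objective.

```python
# Minimal sketch of an unsupervised statistical boundary signal: PMI of
# adjacent characters over a raw corpus. Illustrative only; BABERT's features
# and objective are defined in the paper.
import math
from collections import Counter

def adjacent_pmi(corpus):
    """PMI(c1, c2) = log p(c1 c2) / (p(c1) p(c2)) over adjacent character pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    pmi = {}
    for (c1, c2), freq in bigrams.items():
        p_bi = freq / n_bi
        p1, p2 = unigrams[c1] / n_uni, unigrams[c2] / n_uni
        pmi[(c1, c2)] = math.log(p_bi / (p1 * p2))
    return pmi

# Usage: adjacent pairs with low PMI are candidate boundary positions.
scores = adjacent_pmi(["我们喜欢自然语言处理", "自然语言处理很有趣"])
```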
arXiv Detail & Related papers (2022-10-27T07:38:50Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite this success, we make the empirical observation that there is a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences.
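As one way to picture such a contrastive consistency term, an in-batch InfoNCE loss can treat the two representations of a parallel pair as positives and all other pairings in the batch as negatives. The sketch below illustrates only that general idea; the pooling, temperature, and symmetric form are assumptions of this sketch, not CACR's definition.

```python
# Illustrative in-batch contrastive consistency loss over parallel pairs.
# src_h, tgt_h: (batch, hidden) pooled representations of parallel sequences.
import torch
import torch.nn.functional as F

def parallel_consistency_loss(src_h, tgt_h, temperature: float = 0.1):
    src = F.normalize(src_h, dim=-1)
    tgt = F.normalize(tgt_h, dim=-1)
    logits = src @ tgt.T / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(src.size(0))         # i-th source pairs with i-th target
    # Symmetric InfoNCE: source-to-target and target-to-source directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = parallel_consistency_loss(torch.randn(8, 768), torch.randn(8, 768))
```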
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning [19.682704309037653]
Masked language models (MLMs) have revolutionized the field of Natural Language Understanding.
We propose TaCL (Token-aware Contrastive Learning), a novel continual pre-training approach that encourages BERT to learn an isotropic and discriminative distribution of token representations.
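In the spirit of TaCL's description, a token-aware contrastive objective can pull each student token representation toward a frozen teacher's representation of the same position while pushing it away from other positions. The following is a minimal sketch under that assumption; the temperature and pairing scheme are illustrative, not taken from the paper.

```python
# Hedged sketch of a token-level contrastive loss between a student model's
# hidden states and a frozen teacher's hidden states for the same sequence.
import torch
import torch.nn.functional as F

def token_contrastive_loss(student_h, teacher_h, temperature: float = 0.05):
    """student_h, teacher_h: (seq_len, hidden) representations of one sequence."""
    s = F.normalize(student_h, dim=-1)
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / temperature              # (seq_len, seq_len) similarity matrix
    targets = torch.arange(s.size(0))           # positive = same position in the teacher
    return F.cross_entropy(logits, targets)

# Usage with random tensors standing in for BERT hidden states:
loss = token_contrastive_loss(torch.randn(16, 768), torch.randn(16, 768))
```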
arXiv Detail & Related papers (2021-11-07T22:54:23Z) - Incorporating BERT into Parallel Sequence Decoding with Adapters [82.65608966202396]
We propose to take two different BERT models as the encoder and decoder respectively, and fine-tune them by introducing simple and lightweight adapter modules.
We obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models.
Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, which suits the bidirectional and conditionally independent nature of BERT.
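For reference, the sketch below shows a generic bottleneck adapter of the kind commonly inserted into frozen Transformer layers (down-projection, nonlinearity, up-projection, residual); the paper's exact adapter design and placement may differ.

```python
# Generic bottleneck adapter: only these parameters are trained while the
# two BERT backbones stay frozen. Sizes here are illustrative defaults.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen backbone's representation intact.
        return x + self.up(self.act(self.down(x)))

adapter = Adapter()
out = adapter(torch.randn(2, 16, 768))   # (batch, seq_len, hidden)
```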
arXiv Detail & Related papers (2020-10-13T03:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.