Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models
- URL: http://arxiv.org/abs/2104.07204v1
- Date: Thu, 15 Apr 2021 02:36:49 GMT
- Title: Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models
- Authors: Yuxuan Lai, Yijia Liu, Yansong Feng, Songfang Huang and Dongyan Zhao
- Abstract summary: We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
- Score: 62.41139712595334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chinese pre-trained language models usually process text as a sequence of characters while ignoring coarser granularities, e.g., words. In this work, we propose a novel pre-training paradigm for Chinese -- Lattice-BERT, which explicitly incorporates word representations along with characters and can thus model a sentence in a multi-granularity manner. Specifically, we construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers. We design a lattice position attention mechanism to exploit the lattice structures in the self-attention layers. We further propose a masked segment prediction task to push the model to learn from the rich but redundant information inherent in lattices, while avoiding learning unexpected tricks. Experiments on 11 Chinese natural language understanding tasks show that our model brings an average improvement of 1.5% under the 12-layer setting, achieving a new state of the art among base-size models on the CLUE benchmarks. Further analysis shows that Lattice-BERT can harness the lattice structures, and that the improvement comes from the exploration of redundant information and multi-granularity representations. Our code will be available at https://github.com/alibaba/pretrained-language-models/LatticeBERT.
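To make the lattice idea concrete, the following is a minimal, hedged sketch of the construction the abstract describes: every character and every in-vocabulary word becomes a text unit annotated with the character span it covers, and those spans (rather than a flat token index) are what a lattice position attention mechanism would condition on. The LatticeUnit class, the toy word vocabulary, and the position_relation rule are illustrative assumptions for this sketch, not the released Lattice-BERT implementation.

from dataclasses import dataclass
from typing import List, Set

@dataclass
class LatticeUnit:
    text: str   # the character or word
    start: int  # index of the first character it covers (inclusive)
    end: int    # index of the last character it covers (inclusive)

def build_lattice(sentence: str, word_vocab: Set[str], max_word_len: int = 4) -> List[LatticeUnit]:
    """Collect every character plus every in-vocabulary word as lattice units."""
    units = [LatticeUnit(ch, i, i) for i, ch in enumerate(sentence)]
    for i in range(len(sentence)):
        for j in range(i + 2, min(i + max_word_len, len(sentence)) + 1):
            candidate = sentence[i:j]
            if candidate in word_vocab:
                units.append(LatticeUnit(candidate, i, j - 1))
    return units

def position_relation(a: LatticeUnit, b: LatticeUnit) -> str:
    """Coarse relation between two units' character spans; a model in the spirit
    of the paper would map such span relations to learned attention biases."""
    if a.start == b.start and a.end == b.end:
        return "same-span"
    if a.end < b.start:
        return "precedes"
    if b.end < a.start:
        return "follows"
    return "overlaps"  # e.g. a word and one of the characters it contains

if __name__ == "__main__":
    toy_vocab = {"北京", "大学", "北京大学"}  # hypothetical dictionary
    lattice = build_lattice("北京大学", toy_vocab)
    for unit in lattice:
        print(unit.text, unit.start, unit.end)
    print(position_relation(lattice[0], lattice[-1]))  # character vs. later word: "precedes"

In the paper, all of these units are fed to the transformer together and whole lattice segments are masked for the masked segment prediction objective; the sketch above only illustrates the unit construction and the span-based relative positions it enables.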
Related papers
- Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction [23.45902601618188]
Language models have demonstrated impressive ability in context understanding and generative performance.
We propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task as a question-answering problem.
We show that the language-based model can be a powerful pedestrian trajectory predictor and outperforms existing numerical-based predictor methods.
arXiv Detail & Related papers (2024-03-27T11:06:44Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Hidden Schema Networks [3.4123736336071864]
We introduce a novel neural language model that enforces, via inductive biases, explicit relational structures.
The model encodes sentences into sequences of symbols, which correspond to nodes visited by biased random walkers.
We show that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences.
arXiv Detail & Related papers (2022-07-08T09:26:19Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under-represented languages.
We show that the use of noisy web-crawled data instead of structured data is more suitable for such a non-standardized language.
Our best-performing TunBERT model reaches or improves the state of the art on all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching [29.318730227080675]
We introduce HowNet as an external knowledge base and propose a Linguistic knowledge Enhanced graph Transformer (LET) to deal with word ambiguity.
Experimental results on two Chinese datasets show that our models outperform various typical text matching approaches.
arXiv Detail & Related papers (2021-02-25T04:01:51Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- KR-BERT: A Small-Scale Korean-Specific Language Model [0.0]
We trained a Korean-specific model, KR-BERT, utilizing a smaller vocabulary and dataset.
Our model performs comparably to, and even better than, other existing pre-trained models, using a corpus about 1/10 the size.
arXiv Detail & Related papers (2020-08-10T09:26:00Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)