Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models
- URL: http://arxiv.org/abs/2104.07204v1
- Date: Thu, 15 Apr 2021 02:36:49 GMT
- Title: Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models
- Authors: Yuxuan Lai, Yijia Liu, Yansong Feng, Songfang Huang and Dongyan Zhao
- Abstract summary: We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
- Score: 62.41139712595334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chinese pre-trained language models usually process text as a sequence of characters while ignoring coarser granularities, e.g., words. In this work, we propose a novel pre-training paradigm for Chinese -- Lattice-BERT, which explicitly incorporates word representations along with characters and can thus model a sentence in a multi-granularity manner. Specifically, we construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers. We design a lattice position attention mechanism to exploit the lattice structures in the self-attention layers. We further propose a masked segment prediction task to push the model to learn from the rich but redundant information inherent in lattices, while avoiding learning unexpected tricks. Experiments on 11 Chinese natural language understanding tasks show that our model brings an average improvement of 1.5% under the 12-layer setting, achieving a new state of the art among base-size models on the CLUE benchmarks. Further analysis shows that Lattice-BERT can harness the lattice structures, and that the improvement comes from the exploration of redundant information and multi-granularity representations. Our code will be available at https://github.com/alibaba/pretrained-language-models/LatticeBERT.
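To make the lattice idea concrete, the following is a minimal, hedged sketch of the construction the abstract describes: every character and every in-vocabulary word becomes a text unit annotated with the character span it covers, and those spans (rather than a flat token index) are what a lattice position attention mechanism would condition on. The LatticeUnit class, the toy word vocabulary, and the position_relation rule are illustrative assumptions for this sketch, not the released Lattice-BERT implementation.

from dataclasses import dataclass
from typing import List, Set

@dataclass
class LatticeUnit:
    text: str   # the character or word
    start: int  # index of the first character it covers (inclusive)
    end: int    # index of the last character it covers (inclusive)

def build_lattice(sentence: str, word_vocab: Set[str], max_word_len: int = 4) -> List[LatticeUnit]:
    """Collect every character plus every in-vocabulary word as lattice units."""
    units = [LatticeUnit(ch, i, i) for i, ch in enumerate(sentence)]
    for i in range(len(sentence)):
        for j in range(i + 2, min(i + max_word_len, len(sentence)) + 1):
            candidate = sentence[i:j]
            if candidate in word_vocab:
                units.append(LatticeUnit(candidate, i, j - 1))
    return units

def position_relation(a: LatticeUnit, b: LatticeUnit) -> str:
    """Coarse relation between two units' character spans; a model in the spirit
    of the paper would map such span relations to learned attention biases."""
    if a.start == b.start and a.end == b.end:
        return "same-span"
    if a.end < b.start:
        return "precedes"
    if b.end < a.start:
        return "follows"
    return "overlaps"  # e.g. a word and one of the characters it contains

if __name__ == "__main__":
    toy_vocab = {"北京", "大学", "北京大学"}  # hypothetical dictionary
    lattice = build_lattice("北京大学", toy_vocab)
    for unit in lattice:
        print(unit.text, unit.start, unit.end)
    print(position_relation(lattice[0], lattice[-1]))  # character vs. later word: "precedes"

In the paper, all of these units are fed to the transformer together and whole lattice segments are masked for the masked segment prediction objective; the sketch above only illustrates the unit construction and the span-based relative positions it enables.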
Related papers
- Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction [23.45902601618188]
Language models have demonstrated impressive ability in context understanding and generative performance.
We propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task as a question-answering problem.
We show that the language-based model can be a powerful pedestrian trajectory predictor and outperforms existing numerical-based predictor methods.
arXiv Detail & Related papers (2024-03-27T11:06:44Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Hidden Schema Networks [3.4123736336071864]
We introduce a novel neural language model that enforces, via inductive biases, explicit relational structures.
The model encodes sentences into sequences of symbols, which correspond to nodes visited by biased random walkers.
We show that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences.
arXiv Detail & Related papers (2022-07-08T09:26:19Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for under-represented languages.
We show that the use of noisy web-crawled data instead of structured data is more suitable for such a non-standardized language.
Our best-performing TunBERT model reaches or improves the state of the art on all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching [29.318730227080675]
We introduce HowNet as an external knowledge base and propose a Linguistic knowledge Enhanced graph Transformer (LET) to deal with word ambiguity.
Experimental results on two Chinese datasets show that our models outperform various typical text matching approaches.
arXiv Detail & Related papers (2021-02-25T04:01:51Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- KR-BERT: A Small-Scale Korean-Specific Language Model [0.0]
We trained a Korean-specific model, KR-BERT, utilizing a smaller vocabulary and dataset.
Our model performs comparably to, and even better than, other existing pre-trained models, using a corpus about 1/10 the size.
arXiv Detail & Related papers (2020-08-10T09:26:00Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)