Exploiting Word Semantics to Enrich Character Representations of Chinese
Pre-trained Models
- URL: http://arxiv.org/abs/2207.05928v1
- Date: Wed, 13 Jul 2022 02:28:08 GMT
- Title: Exploiting Word Semantics to Enrich Character Representations of Chinese
Pre-trained Models
- Authors: Wenbiao Li, Rui Sun, Yunfang Wu
- Abstract summary: We propose a new method to exploit word structure and integrate lexical semantics into character representations of pre-trained models.
We show that our approach achieves superior performance over the basic pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks.
- Score: 12.0190584907439
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the Chinese pre-trained models adopt characters as basic units for
downstream tasks. However, these models ignore the information carried by words,
which leads to the loss of some important semantics. In this paper, we
propose a new method to exploit word structure and integrate lexical semantics
into character representations of pre-trained models. Specifically, we project
a word's embedding into its internal characters' embeddings according to the
similarity weight. To strengthen the word boundary information, we mix the
representations of the internal characters within a word. After that, we apply
a word-to-character alignment attention mechanism to emphasize important
characters by masking unimportant ones. Moreover, in order to reduce the error
propagation caused by word segmentation, we present an ensemble approach to
combine segmentation results given by different tokenizers. The experimental
results show that our approach achieves superior performance over the basic
pre-trained models BERT, BERT-wwm and ERNIE on different Chinese NLP tasks:
sentiment classification, sentence pair matching, natural language inference
and machine reading comprehension. We conduct further analysis to verify the
effectiveness of each component of our model.
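The abstract describes the pipeline only in prose, so below is a minimal, hedged sketch of what the similarity-weighted word-to-character projection, character mixing, and alignment-attention masking could look like in PyTorch. The cosine-similarity weighting, the mean-mixing rule, and the 1/n masking threshold are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of similarity-weighted word-to-character fusion.
import torch
import torch.nn.functional as F

def fuse_word_into_chars(char_states, word_emb):
    """char_states: (n, d) contextual character vectors of one word's span.
    word_emb: (d,) embedding of that word, e.g. from a static word-vector table."""
    n = char_states.size(0)
    # 1) Similarity weight between the word and each of its internal characters
    #    (cosine similarity is an assumed choice).
    sim = F.cosine_similarity(char_states, word_emb.unsqueeze(0), dim=-1)   # (n,)
    weight = torch.softmax(sim, dim=-1)                                     # (n,)
    # 2) Project the word embedding onto its characters according to the weight.
    enriched = char_states + weight.unsqueeze(-1) * word_emb                # (n, d)
    # 3) Mix the internal characters to strengthen the word-boundary signal
    #    (mean mixing is an assumed rule).
    mixed = enriched + enriched.mean(dim=0, keepdim=True)                   # (n, d)
    # 4) Word-to-character alignment attention: mask characters whose weight
    #    falls below the uniform level 1/n (assumed masking rule), then
    #    re-normalise attention over the remaining, more important characters.
    masked_sim = sim.masked_fill(weight < 1.0 / n, float("-inf"))
    attn = torch.softmax(masked_sim, dim=-1)                                # (n,)
    return n * attn.unsqueeze(-1) * mixed                                   # (n, d)

chars = torch.randn(3, 768)    # e.g. BERT hidden states of a 3-character word
word = torch.randn(768)        # its word embedding
print(fuse_word_into_chars(chars, word).shape)    # torch.Size([3, 768])
```

One plausible reading of the segmentation-ensemble step, not spelled out here, is to repeat this fusion once per tokenizer's segmentation and combine (e.g., average) the resulting character states.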
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbation methods such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
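A minimal sketch of the kind of perturbations mentioned in the Pixel Sentence Representation Learning entry above (typos and word-order shuffling); the specific edit operations and rates are assumptions, not the authors' implementation.

```python
# Simple, seedable text perturbations: adjacent-character typos and mild word shuffling.
import random

def typo(word: str, rng: random.Random) -> str:
    """Swap two adjacent characters to simulate a keyboard typo."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb(sentence: str, typo_rate: float = 0.15, shuffle: bool = True, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = [typo(w, rng) if rng.random() < typo_rate else w for w in sentence.split()]
    if shuffle and len(words) > 2:
        middle = words[1:-1]          # shuffle only interior words to keep the edit mild
        rng.shuffle(middle)
        words = [words[0]] + middle + [words[-1]]
    return " ".join(words)

print(perturb("character level noise keeps sentence meaning largely intact"))
```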
- Assessing Word Importance Using Models Trained for Semantic Tasks [0.0]
We derive word significance from models trained to solve semantic tasks: Natural Language Inference and Paraphrase Identification.
We evaluate their relevance using a so-called cross-task evaluation.
Our method can be used to identify important words in sentences without any explicit word importance labeling in training.
arXiv Detail & Related papers (2023-05-31T09:34:26Z)
- Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training [36.19870483966741]
We develop a causal intervention framework to learn robust and interpretable character representations inside subword-based language models.
Our method treats each character as a typed variable in a causal model and learns such causal structures.
We additionally introduce a suite of character-level tasks that systematically vary in their dependence on meaning and sequence-level context.
arXiv Detail & Related papers (2022-12-19T22:37:46Z)
- Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens [22.55706811131828]
We probe the embedding layer of pretrained language models.
We show that models learn the internal character composition of whole word and subword tokens.
arXiv Detail & Related papers (2021-08-25T11:48:05Z)
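The probing result in the Spelling Bee entry above can be sketched as a linear probe over frozen input embeddings: a classifier predicts, from a token's static embedding alone, whether its surface form contains a given character. The model name "bert-base-uncased", the target letter "a", and the train/test split below are arbitrary illustrative choices, not the paper's exact setup.

```python
# Linear probe for character composition on a frozen embedding table.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
emb = AutoModel.from_pretrained(name).get_input_embeddings().weight.detach().numpy()

tokens = sorted(tok.get_vocab().items(), key=lambda kv: kv[1])        # (token, id) by id
X = np.stack([emb[i] for _, i in tokens])
y = np.array([int("a" in t.lstrip("#")) for t, _ in tokens])          # label: contains "a"?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))   # well above chance if character
                                                    # composition is encoded
```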
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
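One simple way to realize the merged-token idea from the collocation entry above is PMI-based bigram merging before running the topic model. The thresholds and the `merge_collocations` helper are assumptions for illustration, not the paper's procedure or its proposed metric.

```python
# Illustrative sketch: merge high-PMI bigrams into single collocation tokens.
import math
from collections import Counter

def merge_collocations(docs, min_count=2, pmi_threshold=3.0):
    unigrams, bigrams, total = Counter(), Counter(), 0
    for doc in docs:
        unigrams.update(doc)
        bigrams.update(zip(doc, doc[1:]))
        total += len(doc)

    def pmi(a, b):
        # Pointwise mutual information of the bigram (a, b).
        return math.log((bigrams[(a, b)] / total) /
                        ((unigrams[a] / total) * (unigrams[b] / total)))

    keep = {bg for bg, c in bigrams.items() if c >= min_count and pmi(*bg) >= pmi_threshold}
    merged = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in keep:
                out.append(doc[i] + "_" + doc[i + 1])   # merged collocation token
                i += 2
            else:
                out.append(doc[i])
                i += 1
        merged.append(out)
    return merged

docs = [["new", "york", "city", "traffic"], ["new", "york", "subway"], ["city", "traffic"]]
print(merge_collocations(docs, pmi_threshold=1.0))
# [['new_york', 'city_traffic'], ['new_york', 'subway'], ['city_traffic']]
```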
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model brings an average performance increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
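The lattice construction described in the Lattice-BERT entry above can be sketched as follows: every character is a unit, and every lexicon word matched in the sentence is added as an extra unit over the same span. The toy lexicon and the (text, start, end) encoding are assumptions; the real model additionally needs lattice-aware positions and attention inside the transformer.

```python
# Toy lattice construction over characters and dictionary-matched words.
def build_lattice(sentence, lexicon, max_word_len=4):
    units = [(ch, i, i + 1) for i, ch in enumerate(sentence)]         # character units
    for i in range(len(sentence)):
        for j in range(i + 2, min(i + max_word_len, len(sentence)) + 1):
            if sentence[i:j] in lexicon:
                units.append((sentence[i:j], i, j))                   # word units
    return units   # (text, start, end): the spans tell attention which units overlap

lexicon = {"北京", "北京大学", "大学", "生活"}
print(build_lattice("北京大学生活", lexicon))
# [('北', 0, 1), ('京', 1, 2), ('大', 2, 3), ('学', 3, 4), ('生', 4, 5), ('活', 5, 6),
#  ('北京', 0, 2), ('北京大学', 0, 4), ('大学', 2, 4), ('生活', 4, 6)]
```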
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
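A generic prototypical-classification sketch in the spirit of the relation-prototype entry above: each relation's prototype is the mean of its instance embeddings, and a query is assigned to the most similar prototype. This is an assumption-level illustration of the general idea, not the paper's training objective; the relation names and random embeddings are placeholders.

```python
# Prototype = mean instance embedding per relation; classify by cosine similarity.
import numpy as np

def prototypes(embeddings, labels):
    return {r: embeddings[labels == r].mean(axis=0) for r in np.unique(labels)}

def classify(query, protos):
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(protos, key=lambda r: cos(query, protos[r]))

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))                      # stand-in relation-instance embeddings
lab = np.array(["founder_of"] * 3 + ["born_in"] * 3)
protos = prototypes(emb, lab)
print(classify(emb[0] + 0.1 * rng.normal(size=8), protos))   # most likely "founder_of"
```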
- LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching [29.318730227080675]
We introduce HowNet as an external knowledge base and propose a Linguistic knowledge Enhanced graph Transformer (LET) to deal with word ambiguity.
Experimental results on two Chinese datasets show that our models outperform various typical text matching approaches.
arXiv Detail & Related papers (2021-02-25T04:01:51Z)
- CharBERT: Character-aware Pre-trained Language Model [36.9333890698306]
We propose a character-aware pre-trained language model named CharBERT.
We first construct the contextual word embedding for each token from the sequential character representations.
We then fuse the representations of characters and the subword representations by a novel heterogeneous interaction module.
arXiv Detail & Related papers (2020-11-03T07:13:06Z)
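The two-channel construction described in the CharBERT entry above (a character-derived token vector fused with the subword vector) can be sketched with a small GRU and a concat-projection fusion. This simplification only stands in for CharBERT's heterogeneous interaction module and does not reproduce it; the dimensions and vocabulary size are assumptions.

```python
# Character-channel token vector + subword vector, fused by concatenation and projection.
import torch
import torch.nn as nn

class CharWordFusion(nn.Module):
    def __init__(self, char_vocab=5000, char_dim=64, hidden=768):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_rnn = nn.GRU(char_dim, hidden // 2, bidirectional=True, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, subword_vec, char_ids):
        # subword_vec: (batch, hidden) contextual subword representation
        # char_ids:    (batch, n_chars) character ids spelling out each token
        _, h = self.char_rnn(self.char_emb(char_ids))      # h: (2, batch, hidden // 2)
        char_vec = torch.cat([h[0], h[1]], dim=-1)         # (batch, hidden)
        return self.fuse(torch.cat([subword_vec, char_vec], dim=-1))

m = CharWordFusion()
out = m(torch.randn(4, 768), torch.randint(0, 5000, (4, 12)))
print(out.shape)   # torch.Size([4, 768])
```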
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
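The sentence-unshuffling objective in the SLM entry above implies training pairs of (shuffled sentences, original order). The sketch below only builds such pairs; it is an assumed data-construction step, and the permutation-prediction model itself is omitted.

```python
# Build sentence-unshuffling training pairs: shuffled sentences + original order target.
import random

def make_unshuffling_example(sentences, seed=0):
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # Target: the original index of each sentence in the shuffled input, which
    # the model must recover to reconstruct the discourse.
    return shuffled, order

doc = ["She opened the door.", "The room was dark.", "A candle flickered in the corner."]
inputs, target = make_unshuffling_example(doc)
print(inputs)
print(target)   # original position of each shuffled sentence
```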
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference datasets and word/phrase-level representations from paraphrasing datasets.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
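A generic Siamese mean-pooling encoder sketch for the twin-structure idea in the BURT entry above: both inputs share one encoder, each is reduced to a fixed-size vector, and a pair is scored by cosine similarity. The architecture, vocabulary size, and dimensions are assumptions, not BURT's actual model.

```python
# Shared (Siamese) encoder producing fixed-size vectors scored by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, vocab=30000, dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)

    def encode(self, ids):                                 # ids: (batch, seq_len)
        return self.proj(self.emb(ids).mean(dim=1))        # fixed-size (batch, dim)

    def forward(self, ids_a, ids_b):
        a, b = self.encode(ids_a), self.encode(ids_b)
        return F.cosine_similarity(a, b, dim=-1)           # similarity per pair

m = SiameseEncoder()
score = m(torch.randint(0, 30000, (2, 7)), torch.randint(0, 30000, (2, 9)))
print(score.shape)   # torch.Size([2])
```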
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.