AxBERT: An Interpretable Chinese Spelling Correction Method Driven by Associative Knowledge Network
- URL: http://arxiv.org/abs/2503.02255v1
- Date: Tue, 04 Mar 2025 04:09:10 GMT
- Title: AxBERT: An Interpretable Chinese Spelling Correction Method Driven by Associative Knowledge Network
- Authors: Fanyu Wang, Hangyu Zhu, Zhenping Xie
- Abstract summary: AxBERT is proposed for Chinese spelling correction by aligning with an associative knowledge network (AKN). Our interpretable analysis, together with qualitative reasoning, can effectively illustrate the interpretability of AxBERT.
- Score: 7.114174944371803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has shown promising performance on various machine learning tasks. Nevertheless, the uninterpretability of deep learning models severely restricts their use in domains that require feature explanations, such as text correction. Therefore, a novel interpretable deep learning model (named AxBERT) is proposed for Chinese spelling correction by aligning with an associative knowledge network (AKN). The AKN is constructed from the co-occurrence relations among Chinese characters and represents interpretable statistical logic, in contrast with the uninterpretable logic of BERT. A translator matrix between BERT and AKN is introduced to align and regulate the attention component in BERT, and a weight regulator is designed to adjust the attention distributions in BERT so that sentence semantics are modeled appropriately. Experimental results on the SIGHAN datasets demonstrate that AxBERT achieves extraordinary performance, especially in precision, compared with baselines. Our interpretable analysis, together with qualitative reasoning, effectively illustrates the interpretability of AxBERT.
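To make the mechanism in the abstract concrete, below is a minimal PyTorch-style sketch of the two ingredients it names: a character co-occurrence AKN and a translator matrix that maps BERT's attention heads into the AKN space so the two can be compared. This is an illustrative assumption, not the authors' released implementation; the names build_akn, akn_target, and AttentionAligner, the KL-based alignment loss, and the vocabulary size are all hypothetical choices.

```python
# Sketch (assumption): co-occurrence AKN + translator-based attention alignment.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 21128  # vocabulary size of bert-base-chinese (assumed here)

def build_akn(corpus_token_ids, vocab_size=VOCAB_SIZE):
    """Sentence-level co-occurrence counts between character ids."""
    akn = torch.zeros(vocab_size, vocab_size)
    for sent in corpus_token_ids:  # each sent is a list of token ids
        for i, j in itertools.permutations(set(sent), 2):
            akn[i, j] += 1.0
    # Row-normalise so each row is a co-occurrence distribution.
    return akn / akn.sum(dim=-1, keepdim=True).clamp(min=1.0)

def akn_target(akn, input_ids):
    """Pairwise AKN strengths for the characters in a batch: (B, S, S)."""
    rows = akn[input_ids]                                    # (B, S, V)
    cols = input_ids.unsqueeze(1).expand(-1, input_ids.size(1), -1)
    target = torch.gather(rows, 2, cols)                     # (B, S, S)
    return F.normalize(target + 1e-8, p=1, dim=-1)           # rows sum to 1

class AttentionAligner(nn.Module):
    """Translator that mixes BERT attention heads into the AKN space."""
    def __init__(self, num_heads=12):
        super().__init__()
        self.translator = nn.Linear(num_heads, 1, bias=False)

    def forward(self, bert_attention, target):
        # bert_attention: (B, heads, S, S) from one BERT layer.
        mixed = self.translator(bert_attention.permute(0, 2, 3, 1)).squeeze(-1)
        mixed = mixed.softmax(dim=-1)                         # (B, S, S)
        # Alignment loss: discrepancy between BERT attention and AKN statistics.
        return F.kl_div(mixed.clamp(min=1e-8).log(), target, reduction="batchmean")
```

In training, an alignment term of this kind would be added to the spelling-correction objective; the weight regulator mentioned in the abstract, which rescales the regulated attention distributions, is not sketched here.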
Related papers
- Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation [55.73341401764367]
We introduce ADSQE, a novel framework for alleviating distribution shift in synthetic QE data.
ADSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes.
Experiments demonstrate that ADSQE outperforms SOTA baselines like COMET in both supervised and unsupervised settings.
arXiv Detail & Related papers (2025-02-27T10:11:53Z) - Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z) - Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z) - Syntactic Knowledge via Graph Attention with BERT in Machine Translation [0.0]
We propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios.
Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to interpret how translation quality is improved.
Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores.
arXiv Detail & Related papers (2023-05-22T18:56:14Z) - A Unified Neural Network Model for Readability Assessment with Feature Projection and Length-Balanced Loss [17.213602354715956]
We propose a BERT-based model with feature projection and length-balanced loss for readability assessment.
Our model achieves state-of-the-art performances on two English benchmark datasets and one dataset of Chinese textbooks.
arXiv Detail & Related papers (2022-10-19T05:33:27Z) - Interpretable Mixture of Experts [71.55701784196253]
Interpretable Mixture of Experts (IME) is an inherently interpretable modeling framework.
IME is demonstrated to be more accurate than single interpretable models and to perform comparably to existing state-of-the-art Deep Neural Networks (DNNs).
IME's explanations are compared to commonly used post-hoc explanation methods through a user study.
arXiv Detail & Related papers (2022-06-05T06:40:15Z) - Improving Contextual Representation with Gloss Regularized Pre-training [9.589252392388758]
We propose adding an auxiliary gloss regularizer module to BERT pre-training (GR-BERT) to enhance word semantic similarity.
By predicting masked words and aligning contextual embeddings to corresponding glosses simultaneously, the word similarity can be explicitly modeled.
Experimental results show that the gloss regularizer benefits BERT in word-level and sentence-level semantic representation.
arXiv Detail & Related papers (2022-05-13T12:50:32Z) - KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on the fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
arXiv Detail & Related papers (2021-09-09T12:39:17Z) - Exploring the Role of BERT Token Representations to Explain Sentence Probing Results [15.652077779677091]
We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
arXiv Detail & Related papers (2021-04-03T20:40:42Z) - InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.