Syntactic Knowledge via Graph Attention with BERT in Machine Translation
- URL: http://arxiv.org/abs/2305.13413v1
- Date: Mon, 22 May 2023 18:56:14 GMT
- Title: Syntactic Knowledge via Graph Attention with BERT in Machine Translation
- Authors: Yuqian Dai, Serge Sharoff, Marc de Kamps
- Abstract summary: We propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios.
Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to interpret the improvement in translation quality.
Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the Transformer model can effectively acquire context features via its self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate this problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. A Graph Attention Network (GAT) and BERT jointly represent syntactic dependency features as explicit knowledge of the source language to enrich source language representations and guide target language generation. Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to interpret how syntactic knowledge improves translation quality, without being limited to a BLEU score. Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate which source sentence lengths benefit the most and which dependencies are better identified by the SGB engines. We also find that GAT's learning of specific dependency relations is reflected in the translation quality of sentences containing such relations, and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT.
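The abstract describes GAT and BERT jointly encoding the source dependency graph. Below is a minimal, illustrative PyTorch sketch of one way such a fusion could look; it is not the authors' released implementation. The single-head GAT layer, the concatenation-plus-projection fusion, and the random tensors standing in for BERT outputs and a parsed dependency graph are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention restricted to dependency-graph edges."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_tokens, in_dim); adj: (num_tokens, num_tokens) 0/1 dependency mask
        z = self.proj(h)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)   # z_i repeated along columns
        zj = z.unsqueeze(0).expand(n, n, -1)   # z_j repeated along rows
        e = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only along dependency arcs
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ z)


class SyntaxEnrichedEncoder(nn.Module):
    """Fuses BERT token states with GAT states computed over the dependency graph."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.gat = GraphAttentionLayer(hidden, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, bert_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        syntax_states = self.gat(bert_states, adj)
        return torch.tanh(self.fuse(torch.cat([bert_states, syntax_states], dim=-1)))


# Toy usage: 5 source tokens, undirected dependency arcs (0,1), (1,2), (1,3), (3,4).
num_tokens = 5
bert_states = torch.randn(num_tokens, 768)   # placeholder for real BERT outputs
adj = torch.eye(num_tokens)                  # self-loops keep every softmax row finite
for head, dep in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    adj[head, dep] = adj[dep, head] = 1.0
enriched = SyntaxEnrichedEncoder()(bert_states, adj)
print(enriched.shape)                        # torch.Size([5, 768])
```

In a full SGB engine the enriched source representations would then condition the translation decoder; this sketch stops at the source-encoding side.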
Related papers
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Leveraging Language Identification to Enhance Code-Mixed Text Classification [0.7340017786387767]
Existing deep-learning models do not take advantage of the implicit language information in code-mixed text.
Our study aims to improve the performance of BERT-based models on low-resource code-mixed Hindi-English datasets.
arXiv Detail & Related papers (2023-06-08T06:43:10Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- BERT4GCN: Using BERT Intermediate Layers to Augment GCN for Aspect-based Sentiment Classification [2.982218441172364]
Graph-based Aspect-based Sentiment Classification (ABSC) approaches have yielded state-of-the-art results, especially when equipped with contextual word embeddings from pre-trained language models (PLMs).
We propose a novel model, BERT4GCN, which integrates the grammatical sequential features from BERT with the syntactic knowledge from dependency graphs (a rough sketch of this idea follows this entry).
arXiv Detail & Related papers (2021-10-01T02:03:43Z)
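The sketch referenced in the BERT4GCN entry above: it assumes hidden states from one intermediate BERT layer are propagated by a two-layer GCN over the sentence's dependency adjacency matrix and mean-pooled over the aspect-term positions. The dimensions, layer choice, and classifier head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Average each token's neighbours (row-normalised adjacency), then transform.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj / deg) @ h))


class AspectGCNClassifier(nn.Module):
    def __init__(self, dim: int = 768, num_classes: int = 3):
        super().__init__()
        self.gcn1 = GCNLayer(dim)
        self.gcn2 = GCNLayer(dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, bert_layer_states, adj, aspect_mask):
        h = self.gcn2(self.gcn1(bert_layer_states, adj), adj)
        # Mean-pool only the hidden states of the aspect-term tokens.
        aspect = (h * aspect_mask.unsqueeze(-1)).sum(dim=0) / aspect_mask.sum()
        return self.out(aspect)


# Toy usage: 6 tokens in a chain-shaped dependency graph, aspect term at positions 2-3.
states = torch.randn(6, 768)        # placeholder for an intermediate BERT layer's states
adj = torch.eye(6)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
aspect_mask = torch.tensor([0., 0., 1., 1., 0., 0.])
print(AspectGCNClassifier()(states, adj, aspect_mask))   # logits over 3 sentiment classes
```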
- KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on the fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
arXiv Detail & Related papers (2021-09-09T12:39:17Z)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding [0.0]
We propose a technique to infuse knowledge context from knowledge graphs for conceptual and ambiguous entities into models based on the transformer architecture.
Our technique projects knowledge graph embeddings into a homogeneous vector space, introduces new token types for entities, aligns entity position ids, and adds a selective attention mechanism.
We take BERT as a baseline model and implement "Knowledge Infused BERT" (KI-BERT) by infusing knowledge context from ConceptNet and WordNet.
arXiv Detail & Related papers (2021-04-09T16:15:31Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA for the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
However, GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn dependencies between words at different syntactic distances (a toy version of this idea is sketched below).
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
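The toy sketch referenced in the GATE entry above: self-attention is made aware of how far apart two words are in the dependency tree by adding a learned bias per clipped tree distance to the attention logits. The bucketing scheme and single-head formulation are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class SyntaxDistanceAttention(nn.Module):
    def __init__(self, dim: int = 768, max_distance: int = 4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One learned scalar bias per clipped dependency-tree distance 0..max_distance.
        self.distance_bias = nn.Parameter(torch.zeros(max_distance + 1))
        self.max_distance = max_distance
        self.scale = dim ** -0.5

    def forward(self, h: torch.Tensor, tree_dist: torch.Tensor) -> torch.Tensor:
        # h: (N, dim); tree_dist: (N, N) shortest-path lengths in the dependency tree.
        logits = (self.q(h) @ self.k(h).T) * self.scale
        buckets = tree_dist.clamp(max=self.max_distance).long()
        logits = logits + self.distance_bias[buckets]   # bias attention by tree distance
        return torch.softmax(logits, dim=-1) @ self.v(h)


# Toy usage: 4 tokens with precomputed pairwise dependency-tree distances.
h = torch.randn(4, 768)
tree_dist = torch.tensor([[0, 1, 2, 3],
                          [1, 0, 1, 2],
                          [2, 1, 0, 1],
                          [3, 2, 1, 0]])
print(SyntaxDistanceAttention()(h, tree_dist).shape)   # torch.Size([4, 768])
```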
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM (a toy version of this objective is sketched after this entry).
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
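A toy version of the distillation objective summarized in the entry above, assuming the teacher distribution over a masked word comes from a syntactic language model and the student is a BERT-style masked LM; the vocabulary size and the random teacher/student logits are placeholders.

```python
import torch
import torch.nn.functional as F

vocab_size = 30522                            # BERT-base WordPiece vocabulary size
student_logits = torch.randn(1, vocab_size)   # student MLM logits for one masked slot
teacher_logits = torch.randn(1, vocab_size)   # placeholder for the syntactic LM's marginals

# KL(teacher || student) over the masked position drives the distillation.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),    # student log-probabilities
    F.softmax(teacher_logits, dim=-1),        # teacher probabilities
    reduction="batchmean",
)
print(loss)
```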