Syntactic Knowledge via Graph Attention with BERT in Machine Translation
- URL: http://arxiv.org/abs/2305.13413v1
- Date: Mon, 22 May 2023 18:56:14 GMT
- Title: Syntactic Knowledge via Graph Attention with BERT in Machine Translation
- Authors: Yuqian Dai, Serge Sharoff, Marc de Kamps
- Abstract summary: We propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios.
Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to interpret the improvement in translation quality.
Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the Transformer model can effectively acquire context features via its self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate this problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. A Graph Attention Network (GAT) and BERT jointly represent syntactic dependency features as explicit knowledge of the source language to enrich source language representations and guide target language generation. Our experiments use gold syntax-annotated sentences and a Quality Estimation (QE) model to interpret how syntactic knowledge improves translation quality, without being limited to a BLEU score. Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate which source sentence lengths benefit the most and which dependencies are better identified by the SGB engines. We also find that GAT's learning of specific dependency relations is reflected in the translation quality of sentences containing such relations, and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT.
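The abstract describes GAT and BERT jointly encoding the source dependency graph. Below is a minimal, illustrative PyTorch sketch of one way such a fusion could look; it is not the authors' released implementation. The single-head GAT layer, the concatenation-plus-projection fusion, and the random tensors standing in for BERT outputs and a parsed dependency graph are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention restricted to dependency-graph edges."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_tokens, in_dim); adj: (num_tokens, num_tokens) 0/1 dependency mask
        z = self.proj(h)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)   # z_i repeated along columns
        zj = z.unsqueeze(0).expand(n, n, -1)   # z_j repeated along rows
        e = F.leaky_relu(self.attn(torch.cat([zi, zj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))   # attend only along dependency arcs
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ z)


class SyntaxEnrichedEncoder(nn.Module):
    """Fuses BERT token states with GAT states computed over the dependency graph."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.gat = GraphAttentionLayer(hidden, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, bert_states: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        syntax_states = self.gat(bert_states, adj)
        return torch.tanh(self.fuse(torch.cat([bert_states, syntax_states], dim=-1)))


# Toy usage: 5 source tokens, undirected dependency arcs (0,1), (1,2), (1,3), (3,4).
num_tokens = 5
bert_states = torch.randn(num_tokens, 768)   # placeholder for real BERT outputs
adj = torch.eye(num_tokens)                  # self-loops keep every softmax row finite
for head, dep in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    adj[head, dep] = adj[dep, head] = 1.0
enriched = SyntaxEnrichedEncoder()(bert_states, adj)
print(enriched.shape)                        # torch.Size([5, 768])
```

In a full SGB engine the enriched source representations would then condition the translation decoder; this sketch stops at the source-encoding side.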
Related papers
- Injecting linguistic knowledge into BERT for Dialogue State Tracking [60.42231674887294]
This paper proposes a method that extracts linguistic knowledge via an unsupervised framework.
We then utilize this knowledge to augment BERT's performance and interpretability in Dialogue State Tracking (DST) tasks.
We benchmark this framework on various DST tasks and observe a notable improvement in accuracy.
arXiv Detail & Related papers (2023-11-27T08:38:42Z)
- Leveraging Language Identification to Enhance Code-Mixed Text Classification [0.7340017786387767]
Existing deep-learning models do not take advantage of the implicit language information in code-mixed text.
Our study aims to improve the performance of BERT-based models on low-resource code-mixed Hindi-English datasets.
arXiv Detail & Related papers (2023-06-08T06:43:10Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- BERT4GCN: Using BERT Intermediate Layers to Augment GCN for Aspect-based Sentiment Classification [2.982218441172364]
Graph-based Aspect-based Sentiment Classification (ABSC) approaches have yielded state-of-the-art results, especially when equipped with contextual word embeddings from pre-trained language models (PLMs).
We propose a novel model, BERT4GCN, which integrates the grammatical sequential features from BERT with the syntactic knowledge from dependency graphs (a rough sketch of this idea follows this entry).
arXiv Detail & Related papers (2021-10-01T02:03:43Z)
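The sketch referenced in the BERT4GCN entry above: it assumes hidden states from one intermediate BERT layer are propagated by a two-layer GCN over the sentence's dependency adjacency matrix and mean-pooled over the aspect-term positions. The dimensions, layer choice, and classifier head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Average each token's neighbours (row-normalised adjacency), then transform.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((adj / deg) @ h))


class AspectGCNClassifier(nn.Module):
    def __init__(self, dim: int = 768, num_classes: int = 3):
        super().__init__()
        self.gcn1 = GCNLayer(dim)
        self.gcn2 = GCNLayer(dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, bert_layer_states, adj, aspect_mask):
        h = self.gcn2(self.gcn1(bert_layer_states, adj), adj)
        # Mean-pool only the hidden states of the aspect-term tokens.
        aspect = (h * aspect_mask.unsqueeze(-1)).sum(dim=0) / aspect_mask.sum()
        return self.out(aspect)


# Toy usage: 6 tokens in a chain-shaped dependency graph, aspect term at positions 2-3.
states = torch.randn(6, 768)        # placeholder for an intermediate BERT layer's states
adj = torch.eye(6)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0
aspect_mask = torch.tensor([0., 0., 1., 1., 0., 0.])
print(AspectGCNClassifier()(states, adj, aspect_mask))   # logits over 3 sentiment classes
```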
- KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs [26.557447199727758]
We propose a novel knowledge-aware language model framework based on the fine-tuning process.
Our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT.
arXiv Detail & Related papers (2021-09-09T12:39:17Z)
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding [0.0]
We propose a technique to infuse knowledge context from knowledge graphs for conceptual and ambiguous entities into models based on the transformer architecture.
Our technique projects knowledge graph embeddings into a homogeneous vector space, introduces new token types for entities, aligns entity position ids, and adds a selective attention mechanism.
We take BERT as a baseline model and implement "Knowledge Infused BERT" (KI-BERT) by infusing knowledge context from ConceptNet and WordNet.
arXiv Detail & Related papers (2021-04-09T16:15:31Z)
- ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning [97.10875695679499]
We propose a novel contrastive learning framework named ERICA for the pre-training phase to obtain a deeper understanding of the entities and their relations in text.
Experimental results demonstrate that our proposed ERICA framework achieves consistent improvements on several document-level language understanding tasks.
arXiv Detail & Related papers (2020-12-30T03:35:22Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
However, GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn dependencies between words at different syntactic distances (a toy version of this idea is sketched below).
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
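The toy sketch referenced in the GATE entry above: self-attention is made aware of how far apart two words are in the dependency tree by adding a learned bias per clipped tree distance to the attention logits. The bucketing scheme and single-head formulation are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class SyntaxDistanceAttention(nn.Module):
    def __init__(self, dim: int = 768, max_distance: int = 4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One learned scalar bias per clipped dependency-tree distance 0..max_distance.
        self.distance_bias = nn.Parameter(torch.zeros(max_distance + 1))
        self.max_distance = max_distance
        self.scale = dim ** -0.5

    def forward(self, h: torch.Tensor, tree_dist: torch.Tensor) -> torch.Tensor:
        # h: (N, dim); tree_dist: (N, N) shortest-path lengths in the dependency tree.
        logits = (self.q(h) @ self.k(h).T) * self.scale
        buckets = tree_dist.clamp(max=self.max_distance).long()
        logits = logits + self.distance_bias[buckets]   # bias attention by tree distance
        return torch.softmax(logits, dim=-1) @ self.v(h)


# Toy usage: 4 tokens with precomputed pairwise dependency-tree distances.
h = torch.randn(4, 768)
tree_dist = torch.tensor([[0, 1, 2, 3],
                          [1, 0, 1, 2],
                          [2, 1, 0, 1],
                          [3, 2, 1, 0]])
print(SyntaxDistanceAttention()(h, tree_dist).shape)   # torch.Size([4, 768])
```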
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM (a toy version of this objective is sketched after this entry).
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
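A toy version of the distillation objective summarized in the entry above, assuming the teacher distribution over a masked word comes from a syntactic language model and the student is a BERT-style masked LM; the vocabulary size and the random teacher/student logits are placeholders.

```python
import torch
import torch.nn.functional as F

vocab_size = 30522                            # BERT-base WordPiece vocabulary size
student_logits = torch.randn(1, vocab_size)   # student MLM logits for one masked slot
teacher_logits = torch.randn(1, vocab_size)   # placeholder for the syntactic LM's marginals

# KL(teacher || student) over the masked position drives the distillation.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),    # student log-probabilities
    F.softmax(teacher_logits, dim=-1),        # teacher probabilities
    reduction="batchmean",
)
print(loss)
```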