Constructing Code-mixed Universal Dependency Forest for Unbiased
Cross-lingual Relation Extraction
- URL: http://arxiv.org/abs/2305.12258v3
- Date: Sun, 4 Jun 2023 13:29:51 GMT
- Title: Constructing Code-mixed Universal Dependency Forest for Unbiased
Cross-lingual Relation Extraction
- Authors: Hao Fei, Meishan Zhang, Min Zhang, Tat-Seng Chua
- Abstract summary: Cross-lingual relation extraction (XRE) aggressively leverages the language-consistent structural features of the universal dependency (UD) resource.
We investigate an unbiased UD-based XRE transfer by constructing a type of code-mixed UD forest.
With such forest features, the gaps of UD-based XRE between the training and predicting phases can be effectively closed.
- Score: 92.84968716013783
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Latest efforts on cross-lingual relation extraction (XRE) aggressively
leverage the language-consistent structural features from the universal
dependency (UD) resource, while they may largely suffer from biased transfer
(e.g., either target-biased or source-biased) due to the inevitable linguistic
disparity between languages. In this work, we investigate an unbiased UD-based
XRE transfer by constructing a type of code-mixed UD forest. We first translate
the source-language sentence into the parallel target-side language and parse a
UD tree for each. We then merge the source- and target-side UD structures into a
unified code-mixed UD forest. With such
forest features, the gaps of UD-based XRE between the training and predicting
phases can be effectively closed. We conduct experiments on the ACE XRE
benchmark datasets, where the results demonstrate that the proposed code-mixed
UD forests help unbiased UD-based XRE transfer, with which we achieve
significant XRE performance gains.
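As a rough illustration of the pipeline above (translate, parse both sides, merge), the sketch below fuses a source-side and a target-side UD tree into a single code-mixed forest, using a word alignment to decide which target tokens replace their aligned source tokens. The data structures, the `build_code_mixed_forest` helper, and the merging rule are hypothetical simplifications, not the paper's actual construction.

```python
# Minimal sketch of building a code-mixed UD forest from two parallel UD trees.
# Trees are head/deprel arrays (0 = root); `align` maps source token indices to
# target token indices. All names and merging rules here are illustrative only.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class UDTree:
    tokens: list[str]
    heads: list[int]      # 1-based head index per token, 0 for the root
    deprels: list[str]

@dataclass
class CodeMixedForest:
    nodes: list[str] = field(default_factory=list)
    edges: list[tuple[int, int, str, str]] = field(default_factory=list)  # (head, dep, rel, side)

def build_code_mixed_forest(src: UDTree, tgt: UDTree, align: dict[int, int]) -> CodeMixedForest:
    forest = CodeMixedForest()
    # Source tokens become forest nodes; aligned ones are replaced by their
    # target-side translations, which is what makes the forest "code-mixed".
    for i, tok in enumerate(src.tokens, start=1):
        forest.nodes.append(tgt.tokens[align[i] - 1] if i in align else tok)
    # Keep every source-side dependency edge.
    for i, (h, rel) in enumerate(zip(src.heads, src.deprels), start=1):
        forest.edges.append((h, i, rel, "src"))
    # Project target-side edges whose endpoints are both aligned, so the forest
    # also carries target-language structure over the same node set.
    inv = {t: s for s, t in align.items()}
    for j, (h, rel) in enumerate(zip(tgt.heads, tgt.deprels), start=1):
        if j in inv and h in inv:
            forest.edges.append((inv[h], inv[j], rel, "tgt"))
    return forest

if __name__ == "__main__":
    src = UDTree(["She", "likes", "apples"], [2, 0, 2], ["nsubj", "root", "obj"])
    tgt = UDTree(["Elle", "aime", "les", "pommes"], [2, 0, 4, 2], ["nsubj", "root", "det", "obj"])
    forest = build_code_mixed_forest(src, tgt, {1: 1, 2: 2, 3: 4})
    print(forest.nodes)   # ['Elle', 'aime', 'pommes']
    print(forest.edges)
```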
Related papers
- Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping [60.458273797431836]
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models.
We find that this approach does not work well on non-English tasks.
Inspired by previous interpretability work on language transition during the model's forward pass, we propose an improved contrastive decoding algorithm.
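For intuition, the snippet below sketches the layer-contrasting step that DoLa builds on: next-token scores are the log-probability difference between a mature (final) layer and a premature (early) layer, with a simple plausibility mask. The function names, the mask threshold, and the omission of any language-aware layer skipping are assumptions for illustration; this is not the algorithm proposed in the paper.

```python
# Sketch of DoLa-style layer contrast for one decoding step (illustrative only).
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def contrastive_next_token_scores(final_logits, early_logits, alpha=0.1):
    """Reward tokens whose probability grows between a premature (early) layer
    and the mature (final) layer; mask tokens that the final layer itself finds
    implausible. The threshold `alpha` is an arbitrary illustrative value."""
    p_final = softmax(final_logits)
    p_early = softmax(early_logits)
    scores = np.log(p_final) - np.log(p_early)
    scores[p_final < alpha * p_final.max()] = -np.inf
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    final_logits, early_logits = rng.normal(size=10), rng.normal(size=10)
    print(int(np.argmax(contrastive_next_token_scores(final_logits, early_logits))))
```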
arXiv Detail & Related papers (2024-07-15T15:14:01Z)
- Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure [15.564927804136852]
SPUD (Semantically Perturbed Universal Dependencies) is a framework for creating nonce treebanks for the Universal Dependencies (UD) corpora.
We create nonce data in Arabic, English, French, German, and Russian, and demonstrate two use cases of SPUD treebanks.
arXiv Detail & Related papers (2023-11-13T17:36:58Z)
- Data Augmentation for Machine Translation via Dependency Subtree Swapping [0.0]
We present a generic framework for data augmentation via dependency subtree swapping.
We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples.
We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus.
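As a toy illustration of subtree swapping, the sketch below extracts the token span covered by a dependency subtree and swaps it between two sentences; the paper swaps corresponding subtrees across the source and target sides of bisentences, so this monolingual version and its helper names are simplifications.

```python
# Toy dependency-subtree swap between two parsed sentences (illustrative only).
def subtree_indices(heads, root):
    """Return the 1-based indices of `root` and all of its descendants."""
    out, frontier = set(), [root]
    while frontier:
        node = frontier.pop()
        out.add(node)
        frontier.extend(i for i, h in enumerate(heads, start=1) if h == node and i not in out)
    return sorted(out)

def swap_subtrees(tokens_a, heads_a, root_a, tokens_b, heads_b, root_b):
    """Swap the subtree rooted at root_a in sentence A with the subtree rooted
    at root_b in sentence B, assuming both subtrees cover contiguous spans
    (true for projective trees)."""
    span_a = subtree_indices(heads_a, root_a)
    span_b = subtree_indices(heads_b, root_b)
    piece_a = [tokens_a[i - 1] for i in span_a]
    piece_b = [tokens_b[i - 1] for i in span_b]
    new_a = tokens_a[:span_a[0] - 1] + piece_b + tokens_a[span_a[-1]:]
    new_b = tokens_b[:span_b[0] - 1] + piece_a + tokens_b[span_b[-1]:]
    return new_a, new_b

if __name__ == "__main__":
    # "She ate a red apple" / "He bought fresh bread": swap the object subtrees.
    a_tokens, a_heads = ["She", "ate", "a", "red", "apple"], [2, 0, 5, 5, 2]
    b_tokens, b_heads = ["He", "bought", "fresh", "bread"], [2, 0, 4, 2]
    print(swap_subtrees(a_tokens, a_heads, 5, b_tokens, b_heads, 4))
```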
arXiv Detail & Related papers (2023-07-13T19:00:26Z)
- GAUSS: Guided Encoder-Decoder Architecture for Hyperspectral Unmixing with Spatial Smoothness [0.0]
In recent hyperspectral unmixing (HU) literature, the application of deep learning (DL) has become more prominent.
We propose a split architecture and use a pseudo-ground truth for abundances to guide the unmixing network's (UN) optimization.
arXiv Detail & Related papers (2022-04-16T04:23:47Z)
- Separate What You Describe: Language-Queried Audio Source Separation [53.65665794338574]
We introduce the task of language-queried audio source separation (LASS).
LASS aims to separate a target source from an audio mixture based on a natural language query of the target source.
We propose LASS-Net, an end-to-end neural network that is trained to jointly process acoustic and linguistic information.
arXiv Detail & Related papers (2022-03-28T23:47:57Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo parallel data with translated source sentences, but it is fed natural source sentences at inference.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses pseudo parallel data {natural source, translated target} to mimic the inference scenario.
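The schematic below contrasts the two kinds of pseudo parallel data discussed here, with a trivial stand-in for the translation model; the helper names and the stub `translate` function are purely illustrative, not the authors' implementation.

```python
# Schematic contrast of the two kinds of pseudo parallel data (illustrative only).
def translate(tokens, direction):
    # Stand-in for the current UNMT model; a real system would decode here.
    return [f"{direction}:{t}" for t in tokens]

def back_translation_pair(natural_tgt):
    # Conventional UNMT training pair: {translated source, natural target}.
    return translate(natural_tgt, "tgt->src"), natural_tgt

def self_training_pair(natural_src):
    # Online self-training pair: {natural source, translated target},
    # matching what the model actually sees at inference time.
    return natural_src, translate(natural_src, "src->tgt")

if __name__ == "__main__":
    print(back_translation_pair(["guten", "Morgen"]))
    print(self_training_pair(["good", "morning"]))
```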
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- On the Relation between Syntactic Divergence and Zero-Shot Performance [22.195133438732633]
We take the transfer of Universal Dependencies (UD) parsing from English to a diverse set of languages and conduct two sets of experiments.
We analyze zero-shot performance based on the extent to which English source edges are preserved in translation.
In both sets of experiments, our results suggest a strong relation between cross-lingual stability and zero-shot parsing performance.
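The sketch below shows one plausible way to quantify how many source dependency edges are preserved in translation, given a word alignment and the target-side tree; the exact preservation criterion used in the paper may differ, so treat the function and its inputs as assumptions.

```python
# Toy measure of how many source dependency edges survive translation.
def edge_preservation_rate(src_edges, tgt_edges, align):
    """src_edges / tgt_edges: sets of (head, dependent) token-index pairs.
    align: dict mapping source token index -> target token index.
    An edge counts as preserved if its aligned endpoints are also linked
    (in either direction) on the target side."""
    preserved = 0
    for h, d in src_edges:
        if h in align and d in align:
            th, td = align[h], align[d]
            if (th, td) in tgt_edges or (td, th) in tgt_edges:
                preserved += 1
    return preserved / len(src_edges) if src_edges else 0.0

if __name__ == "__main__":
    src_edges = {(2, 1), (2, 3)}          # e.g. likes->She, likes->apples
    tgt_edges = {(2, 1), (2, 4), (4, 3)}  # aime->Elle, aime->pommes, pommes->les
    print(edge_preservation_rate(src_edges, tgt_edges, {1: 1, 2: 2, 3: 4}))  # 1.0
```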
arXiv Detail & Related papers (2021-10-09T21:09:21Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
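To make the idea concrete, the sketch below biases single-head self-attention scores by pairwise distances in the dependency tree, so syntactically distant words are down-weighted. The additive distance penalty is an assumption for illustration and is not GATE's exact formulation.

```python
# Self-attention biased by syntactic (dependency-tree) distance, illustrative only.
import numpy as np

def tree_distances(heads):
    """All-pairs shortest-path distances in the undirected dependency tree.
    heads: 1-based head per token, 0 for the root."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(heads, start=1):
        if h:
            adj[i - 1].append(h - 1)
            adj[h - 1].append(i - 1)
    dist = np.full((n, n), np.inf)
    for s in range(n):
        dist[s, s] = 0
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if dist[s, v] == np.inf:
                        dist[s, v] = dist[s, u] + 1
                        nxt.append(v)
            frontier = nxt
    return dist

def syntactic_attention(x, heads, scale=1.0):
    """Single-head self-attention whose scores are penalized by tree distance."""
    scores = x @ x.T / np.sqrt(x.shape[1]) - scale * tree_distances(heads)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(3, 4))      # 3 tokens, dim 4
    print(syntactic_attention(x, heads=[2, 0, 2]).shape)   # (3, 4)
```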
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
- Reference Language based Unsupervised Neural Machine Translation [108.64894168968067]
Unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse.
We propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source.
Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language.
arXiv Detail & Related papers (2020-04-05T08:28:08Z)
- Cross-Lingual Adaptation Using Universal Dependencies [1.027974860479791]
We show that models trained using UD parse trees for complex NLP tasks can characterize very different languages.
Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages.
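As a toy stand-in for the tree-kernel models mentioned above, the sketch below scores two UD trees by counting shared dependency-relation paths toward the root; real tree kernels (e.g., partial tree kernels) are richer, and the feature choice here is an assumption.

```python
# Toy dependency "tree kernel": count shared deprel paths toward the root.
from collections import Counter

def relation_paths(deprels, heads, max_len=2):
    """Collect dependency-relation chains walking from each token toward the root."""
    paths = Counter()
    for i in range(1, len(heads) + 1):
        chain, node = [], i
        while node and len(chain) < max_len:
            chain.append(deprels[node - 1])
            paths[tuple(chain)] += 1
            node = heads[node - 1]
    return paths

def tree_kernel(tree_a, tree_b):
    """Similarity = number of matching relation paths between the two trees."""
    pa, pb = relation_paths(*tree_a), relation_paths(*tree_b)
    return sum(min(pa[k], pb[k]) for k in pa.keys() & pb.keys())

if __name__ == "__main__":
    en = (["nsubj", "root", "obj"], [2, 0, 2])
    fr = (["nsubj", "root", "det", "obj"], [2, 0, 4, 2])
    print(tree_kernel(en, fr))   # 5 shared paths in this toy example
```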
arXiv Detail & Related papers (2020-03-24T13:04:06Z)