Assessing Neural Referential Form Selectors on a Realistic Multilingual
Dataset
- URL: http://arxiv.org/abs/2210.04828v2
- Date: Tue, 11 Oct 2022 18:44:37 GMT
- Title: Assessing Neural Referential Form Selectors on a Realistic Multilingual
Dataset
- Authors: Guanyi Chen, Fahime Same, Kees van Deemter
- Abstract summary: We build a dataset based on the OntoNotes corpus that contains a broader range of referring expression (RE) use in both English and Chinese.
We build neural Referential Form Selection (RFS) models accordingly, assess them on the dataset and conduct probing experiments.
- Score: 6.651864489482537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Previous work on Neural Referring Expression Generation (REG) has all
used WebNLG, an English dataset that has been shown to reflect a very limited
range of referring expression (RE) use. To tackle this issue, we build a dataset
based on the OntoNotes corpus that contains a broader range of RE use in both
English and Chinese (a language that uses zero pronouns). We build neural
Referential Form Selection (RFS) models accordingly, assess them on the dataset,
and conduct probing experiments. The experiments suggest that, compared to
WebNLG, OntoNotes is better suited for assessing REG/RFS models. We compare English
and Chinese RFS and confirm that, in line with linguistic theories, Chinese RFS
depends more on discourse context than English RFS does.
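The abstract frames RFS as choosing the form of a referring expression (e.g. proper name, pronoun, or zero pronoun) given the discourse context. The paper's actual models are neural; the sketch below is only a toy rule-based illustration of the task framing, and the function name, label set, and features (`mentioned_in_prev_sentence`, `is_subject`) are hypothetical assumptions, not taken from the paper.

```python
# Toy sketch of Referential Form Selection (RFS) as a classification task.
# Real neural RFS models encode the discourse context and predict the form
# of the next referring expression; this rule-based stand-in only shows the
# shape of the input/output mapping.

def select_referential_form(language: str,
                            mentioned_in_prev_sentence: bool,
                            is_subject: bool) -> str:
    """Pick a referential form from a small illustrative label set."""
    if not mentioned_in_prev_sentence:
        return "proper_name"    # (re)introduce an unfamiliar referent by name
    if language == "zh" and is_subject:
        return "zero_pronoun"   # Chinese can omit highly salient subjects
    return "pronoun"            # a salient referent is otherwise pronominalized

# A salient subject referent: Chinese permits a zero pronoun, English does not.
print(select_referential_form("zh", True, True))   # zero_pronoun
print(select_referential_form("en", True, True))   # pronoun
```

The zero-pronoun branch reflects the abstract's point that Chinese RE choice leans more heavily on discourse context than English.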
Related papers
- Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z)
- HistRED: A Historical Document-Level Relation Extraction Dataset [32.96963890713529]
HistRED is constructed from Yeonhaengnok, a collection of records originally written in Hanja, a writing system based on classical Chinese characters.
HistRED provides bilingual annotations such that RE can be performed on Korean and Hanja texts.
We propose a bilingual RE model that leverages both Korean and Hanja contexts to predict relations between entities.
arXiv Detail & Related papers (2023-07-10T00:24:27Z)
- XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z)
- Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions [15.845071122977158]
We build on the idea of Retrieval Augmented Translation (RAT) where top-k in-domain fuzzy matches are found for the source sentence.
We propose a novel architecture to control interactions between a source sentence and the top-k fuzzy target-language matches.
arXiv Detail & Related papers (2022-10-10T23:33:15Z)
- DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction [15.649929244635269]
We propose a new dataset, DiS-ReX, which alleviates these issues.
Our dataset has more than 1.5 million sentences spanning 4 languages, with 36 relation classes plus 1 no-relation (NA) class.
We also modify the widely used bag attention models by encoding sentences using mBERT and provide the first benchmark results on multilingual DS-RE.
arXiv Detail & Related papers (2021-04-17T22:44:38Z)
- GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or words that are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
- Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
- NABU - Multilingual Graph-based Neural RDF Verbalizer [3.419992814908564]
NABU is a graph-based neural model that verbalizes RDF data to German, Russian, and English.
Our results show that NABU outperforms state-of-the-art approaches on English with 66.21 BLEU.
arXiv Detail & Related papers (2020-09-16T14:59:06Z)
- TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [113.29476656550342]
We present TaBERT, a pretrained LM that jointly learns representations for NL sentences and tables.
TaBERT is trained on a large corpus of 26 million tables and their English contexts.
Implementation of the model will be available at http://fburl.com/TaBERT.
arXiv Detail & Related papers (2020-05-17T17:26:40Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.