Event Guided Denoising for Multilingual Relation Learning
- URL: http://arxiv.org/abs/2012.02721v1
- Date: Fri, 4 Dec 2020 17:11:04 GMT
- Title: Event Guided Denoising for Multilingual Relation Learning
- Authors: Amith Ananthram, Emily Allaway, Kathleen McKeown
- Abstract summary: We present a methodology for collecting high-quality training data for relation extraction from unlabeled text.
Our approach exploits the predictable distributional structure of date-marked news articles to build a denoised corpus.
We show that a smaller multilingual encoder trained on this corpus performs comparably to the current state-of-the-art.
- Score: 2.4192504570921627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: General-purpose relation extraction has recently seen considerable gains, in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks.
In this work, we present a methodology for collecting high-quality training data for relation extraction from unlabeled text that achieves a near-recreation of their zero-shot and few-shot results at a fraction of the training cost.
Our approach exploits the predictable distributional structure of date-marked news articles to build a denoised corpus: the extraction process filters out low-quality examples.
We show that a smaller multilingual encoder trained on this corpus performs comparably to the current state-of-the-art (when both receive little to no fine-tuning) on few-shot and standard relation benchmarks in English and Spanish, despite using many fewer examples (50k vs. 300mil+).
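The abstract describes the collection procedure only at a high level. As a minimal sketch of one plausible reading, the event-guided filter below keeps entity pairs that co-occur in several independently written, same-dated news articles and drops pairs seen in only one source as noise. The `Article` record, the `MIN_SOURCES` threshold, and all function names are hypothetical illustrations, not details from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Article:
    """Hypothetical record for one date-marked news article."""
    date: str        # publication date, e.g. "2020-12-04"
    outlet: str      # news source the article came from
    entities: list   # named entities detected in the text

MIN_SOURCES = 2  # assumed threshold: a pair must appear in >= 2 outlets

def build_denoised_pairs(articles):
    """Keep entity pairs that recur across independently written,
    same-dated articles; singleton pairs are treated as noise."""
    sources_by_pair = defaultdict(lambda: defaultdict(set))
    for art in articles:
        for e1, e2 in combinations(sorted(set(art.entities)), 2):
            sources_by_pair[art.date][(e1, e2)].add(art.outlet)

    denoised = []
    for date, pair_sources in sources_by_pair.items():
        for pair, outlets in pair_sources.items():
            if len(outlets) >= MIN_SOURCES:
                denoised.append((date, pair))
    return denoised
```

The intuition tracks the abstract's "predictable distributional structure": two entities that surface together in multiple outlets on the same day are probably linked by a real-world event, so sentences mentioning them make plausible relation examples, and the recurrence requirement is what filters out low-quality ones.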
Related papers
- Evaluating and explaining training strategies for zero-shot cross-lingual news sentiment analysis [8.770572911942635]
We introduce novel evaluation datasets in several less-resourced languages.
We experiment with a range of approaches including the use of machine translation.
We show that language similarity is not in itself sufficient for predicting the success of cross-lingual transfer.
arXiv Detail & Related papers (2024-09-30T07:59:41Z)
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles [53.92189950211852]
Large language models have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning.
In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus.
arXiv Detail & Related papers (2023-11-04T03:18:45Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
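The anchor-text bootstrap admits a short sketch: Wikipedia anchors in different sentences that link to the same page can be treated as coreferent mentions, yielding cheap silver training pairs. This is a minimal reading of the one-line summary above; the input format is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def silver_coreference_pairs(links):
    """Bootstrap coreference data from Wikipedia anchor texts: any two
    anchors pointing at the same target page are assumed coreferent.

    `links` holds (anchor_text, target_page, sentence) triples, e.g.
    ("the former president", "Barack_Obama", "...sentence text...")."""
    mentions_by_target = defaultdict(list)
    for anchor, target, sentence in links:
        mentions_by_target[target].append((anchor, sentence))

    pairs = []
    for target, mentions in mentions_by_target.items():
        for m1, m2 in combinations(mentions, 2):
            if m1[0].lower() != m2[0].lower():
                # keep pairs with differing surface forms; identical
                # strings teach the model little about coreference
                pairs.append((m1, m2, target))
    return pairs
```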
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations [97.41375480696972]
We introduce Z-ICL, a new zero-shot method that closes the gap by constructing pseudo-demonstrations for a given test input.
Evaluation on nine classification datasets shows that Z-ICL outperforms previous zero-shot methods by a significant margin.
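The pseudo-demonstration idea can be sketched as retrieval plus random labels: pull corpus sentences similar to the test input and pair each with a randomly chosen label name, so the model sees the task format without any labeled data. The encoder, prompt template, and neighbor count below are placeholder assumptions, not the paper's exact configuration.

```python
import random

def make_pseudo_demonstrations(test_input, corpus, labels, encode, k=4):
    """Build a zero-shot prompt from pseudo-demonstrations: the k corpus
    sentences nearest the test input, each tagged with a random label."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    query = encode(test_input)  # `encode` is any sentence-embedding fn
    neighbors = sorted(corpus, key=lambda s: cosine(encode(s), query),
                       reverse=True)[:k]

    prompt = ""
    for sent in neighbors:
        prompt += f"Input: {sent}\nLabel: {random.choice(labels)}\n\n"
    return prompt + f"Input: {test_input}\nLabel:"
```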
arXiv Detail & Related papers (2022-12-19T21:34:26Z)
- Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning [25.230786853723203]
We propose a noise-robust cross-lingual cross-modal retrieval method for low-resource languages.
We use Machine Translation to construct pseudo-parallel sentence pairs for low-resource languages.
We introduce a multi-view self-distillation method to learn noise-robust target-language representations.
arXiv Detail & Related papers (2022-08-26T09:32:24Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Token-wise Curriculum Learning for Neural Machine Translation [94.93133801641707]
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data at the early training stage.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
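One plausible reading of a token-wise curriculum that "creates sufficient amounts of easy samples" is to train on short target prefixes first and lengthen them as training proceeds; the linear schedule below is a guessed illustration, not the paper's actual method.

```python
def token_curriculum(pairs, step, total_steps, min_frac=0.25):
    """Token-wise curriculum sketch: early in training each target is
    truncated to a short prefix (an "easy" sample); the visible fraction
    grows linearly until full sentences are used."""
    frac = min(1.0, min_frac + (1.0 - min_frac) * step / total_steps)
    batch = []
    for src_tokens, tgt_tokens in pairs:
        keep = max(1, int(len(tgt_tokens) * frac))
        batch.append((src_tokens, tgt_tokens[:keep]))
    return batch

# At step 0 only ~25% of each target survives; by the end, all of it.
data = [(["ich", "liebe", "katzen"], ["i", "love", "cats", "a", "lot"])]
print(token_curriculum(data, step=0, total_steps=100))
```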
arXiv Detail & Related papers (2021-03-20T03:57:59Z)
- Cross-lingual Approach to Abstractive Summarization [0.0]
Cross-lingual model transfer has been successfully applied to low-resource languages.
We used a pretrained English summarization model based on deep neural networks and a sequence-to-sequence architecture.
We developed several models with different proportions of target-language data for fine-tuning.
arXiv Detail & Related papers (2020-12-08T09:30:38Z)
- WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z)
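The image-based alignment in the WikiLingua entry above reduces to a simple join: every how-to step carries an illustrative image, and the same image file recurs across language editions, so steps and their summaries can be matched by image identifier. The data layout below is illustrative, not the released dataset's schema.

```python
def align_by_images(steps_lang_a, steps_lang_b):
    """Match how-to steps across two language editions by the image that
    illustrates each step; shared images yield gold alignments.

    Each argument maps an image identifier (e.g. a filename) to that
    step's (paragraph, summary) pair in one language."""
    shared = steps_lang_a.keys() & steps_lang_b.keys()
    return [(steps_lang_a[img], steps_lang_b[img]) for img in sorted(shared)]

# Illustrative usage with made-up step records:
en = {"step1.jpg": ("Boil the water first.", "Boil water."),
      "step2.jpg": ("Add the rice and stir.", "Add rice.")}
es = {"step1.jpg": ("Hierve el agua primero.", "Hierve el agua."),
      "step3.jpg": ("Sirve el plato.", "Sirve.")}
print(align_by_images(en, es))  # only step1.jpg appears in both
```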