Bootstrapping Relation Extractors using Syntactic Search by Examples
- URL: http://arxiv.org/abs/2102.05007v1
- Date: Tue, 9 Feb 2021 18:17:59 GMT
- Title: Bootstrapping Relation Extractors using Syntactic Search by Examples
- Authors: Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg
- Abstract summary: We propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts.
We take advantage of search engines over syntactic graphs, which expose a friendly by-example syntax.
We show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision.
- Score: 47.11932446745022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work, we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user-input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
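The paper's pipeline is built on a syntactic search engine in the style of Shlain et al. (2020) (SPIKE), queried over a large pre-parsed corpus. The sketch below is not that system; it is a minimal, hedged illustration of the by-example idea using spaCy's DependencyMatcher. The seed sentence, the hand-written pattern, the toy corpus, and the `ORG_FOUNDED_BY` label are all assumptions made for illustration, not details from the paper.

```python
# A minimal sketch (NOT the paper's SPIKE-based pipeline, which queries a
# web-scale pre-parsed corpus): turn one example sentence into a dependency
# pattern, then collect sentences whose parses match it as bootstrapped
# positive examples for relation extraction.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

# Pattern written by hand, for illustration, from the seed example
# "Steve Jobs founded Apple": a "found" verb with a subject and direct object.
pattern = [
    {"RIGHT_ID": "anchor", "RIGHT_ATTRS": {"LEMMA": "found", "POS": "VERB"}},
    {"LEFT_ID": "anchor", "REL_OP": ">", "RIGHT_ID": "subj",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "anchor", "REL_OP": ">", "RIGHT_ID": "obj",
     "RIGHT_ATTRS": {"DEP": "dobj"}},
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("ORG_FOUNDED_BY", [pattern])  # hypothetical relation label

# Toy stand-in for a large corpus; in the paper this role is played by a
# syntactic search engine over millions of pre-parsed sentences.
corpus = [
    "Steve Jobs founded Apple in 1976.",
    "Larry Page and Sergey Brin founded Google.",
    "The company reported strong earnings.",
]

positives = []
for doc in nlp.pipe(corpus):
    for _, token_ids in matcher(doc):
        # token_ids follow the pattern's definition order: anchor, subj, obj.
        subj, obj = doc[token_ids[1]], doc[token_ids[2]]
        positives.append((subj.text, obj.text, doc.text))

print(positives)
# e.g. [('Jobs', 'Apple', 'Steve Jobs founded Apple in 1976.'), ...]
```

The matched sentences would then serve as the bootstrapped positive training set; the paper reports that extractors trained on data gathered this way are competitive with those trained on manually annotated or distantly supervised data.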
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data-efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering [17.7709632238066]
Data artifacts incentivize machine learning models to learn non-transferable generalizations.
We investigate whether popular datasets such as MultiWOZ contain such data artifacts.
We propose a contrastive-learning-based framework to encourage the model to ignore these cues and focus on learning generalizable patterns.
arXiv Detail & Related papers (2022-05-20T03:13:02Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis [1.148539813252112]
We explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods.
We show that tuning a state-of-the-art model trained on newspaper data can boost performance on student reflection data.
We propose a template-based model to synthesize new data, which when incorporated into training further increased ROUGE scores.
arXiv Detail & Related papers (2020-02-09T17:49:08Z)