Bootstrapping Relation Extractors using Syntactic Search by Examples
- URL: http://arxiv.org/abs/2102.05007v1
- Date: Tue, 9 Feb 2021 18:17:59 GMT
- Title: Bootstrapping Relation Extractors using Syntactic Search by Examples
- Authors: Matan Eyal, Asaf Amrami, Hillel Taub-Tabib, Yoav Goldberg
- Abstract summary: We propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts.
We take advantage of search engines over syntactic graphs, which expose a friendly by-example syntax.
We show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision.
- Score: 47.11932446745022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of neural networks in NLP brought with it substantial improvements in supervised relation extraction. However, obtaining a sufficient quantity of training data remains a key challenge. In this work, we propose a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts. We take advantage of search engines over syntactic graphs (such as Shlain et al. (2020)) which expose a friendly by-example syntax. We use these to obtain positive examples by searching for sentences that are syntactically similar to user-input examples. We apply this technique to relations from TACRED and DocRED and show that the resulting models are competitive with models trained on manually annotated data and on data obtained from distant supervision. The models also outperform models trained using NLG data augmentation techniques. Extending the search-based approach with the NLG method further improves the results.
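The paper's pipeline is built on a syntactic search engine in the style of Shlain et al. (2020) (SPIKE), queried over a large pre-parsed corpus. The sketch below is not that system; it is a minimal, hedged illustration of the by-example idea using spaCy's DependencyMatcher. The seed sentence, the hand-written pattern, the toy corpus, and the `ORG_FOUNDED_BY` label are all assumptions made for illustration, not details from the paper.

```python
# A minimal sketch (NOT the paper's SPIKE-based pipeline, which queries a
# web-scale pre-parsed corpus): turn one example sentence into a dependency
# pattern, then collect sentences whose parses match it as bootstrapped
# positive examples for relation extraction.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

# Pattern written by hand, for illustration, from the seed example
# "Steve Jobs founded Apple": a "found" verb with a subject and direct object.
pattern = [
    {"RIGHT_ID": "anchor", "RIGHT_ATTRS": {"LEMMA": "found", "POS": "VERB"}},
    {"LEFT_ID": "anchor", "REL_OP": ">", "RIGHT_ID": "subj",
     "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "anchor", "REL_OP": ">", "RIGHT_ID": "obj",
     "RIGHT_ATTRS": {"DEP": "dobj"}},
]
matcher = DependencyMatcher(nlp.vocab)
matcher.add("ORG_FOUNDED_BY", [pattern])  # hypothetical relation label

# Toy stand-in for a large corpus; in the paper this role is played by a
# syntactic search engine over millions of pre-parsed sentences.
corpus = [
    "Steve Jobs founded Apple in 1976.",
    "Larry Page and Sergey Brin founded Google.",
    "The company reported strong earnings.",
]

positives = []
for doc in nlp.pipe(corpus):
    for _, token_ids in matcher(doc):
        # token_ids follow the pattern's definition order: anchor, subj, obj.
        subj, obj = doc[token_ids[1]], doc[token_ids[2]]
        positives.append((subj.text, obj.text, doc.text))

print(positives)
# e.g. [('Jobs', 'Apple', 'Steve Jobs founded Apple in 1976.'), ...]
```

The matched sentences would then serve as the bootstrapped positive training set; the paper reports that extractors trained on data gathered this way are competitive with those trained on manually annotated or distantly supervised data.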
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data-efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering [17.7709632238066]
Data artifacts incentivize machine learning models to learn non-transferable generalizations.
We investigate whether popular datasets such as MultiWOZ contain such data artifacts.
We propose a contrastive-learning-based framework to encourage the model to ignore these cues and focus on learning generalizable patterns.
arXiv Detail & Related papers (2022-05-20T03:13:02Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Abstractive Summarization for Low Resource Data using Domain Transfer and Data Synthesis [1.148539813252112]
We explore using domain transfer and data synthesis to improve the performance of recent abstractive summarization methods.
We show that tuning a state-of-the-art model trained on newspaper data can boost performance on student reflection data.
We propose a template-based model to synthesize new data, which when incorporated into training further increased ROUGE scores.
arXiv Detail & Related papers (2020-02-09T17:49:08Z)