Comparison of pipeline, sequence-to-sequence, and GPT models for
end-to-end relation extraction: experiments with the rare disease use-case
- URL: http://arxiv.org/abs/2311.13729v2
- Date: Sun, 10 Mar 2024 01:11:20 GMT
- Title: Comparison of pipeline, sequence-to-sequence, and GPT models for
end-to-end relation extraction: experiments with the rare disease use-case
- Authors: Shashank Gupta, Xuguang Ai, Ramakanth Kavuluru
- Abstract summary: End-to-end relation extraction (E2ERE) is an important and realistic application of natural language processing (NLP) in biomedicine.
We compare three prevailing paradigms for E2ERE using a complex dataset focused on rare diseases.
We find that pipeline models are still the best, while sequence-to-sequence models are not far behind.
- Score: 2.9013777655907056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end relation extraction (E2ERE) is an important and realistic
application of natural language processing (NLP) in biomedicine. In this paper,
we aim to compare three prevailing paradigms for E2ERE using a complex dataset
focused on rare diseases involving discontinuous and nested entities. We use
the RareDis information extraction dataset to evaluate three competing
approaches (for E2ERE): NER $\rightarrow$ RE pipelines, joint sequence to
sequence models, and generative pre-trained transformer (GPT) models. We use
comparable state-of-the-art models and best practices for each of these
approaches and conduct error analyses to assess their failure modes. Our
findings reveal that pipeline models are still the best, while
sequence-to-sequence models are not far behind; GPT models with eight times as
many parameters are worse than even sequence-to-sequence models and lose to
pipeline models by over 10 F1 points. Partial matches and discontinuous
entities caused many NER errors, contributing to lower overall E2E performance.
We also verify these findings on a second E2ERE dataset for chemical-protein
interactions. Although generative LM-based methods are more suitable for
zero-shot settings, when training data is available, our results show that it
is better to work with more conventional models trained and tailored for E2ERE.
More innovative methods are needed to marry the best of both worlds from
smaller encoder-decoder pipeline models and the larger GPT models to improve
E2ERE. As of now, we see that well-designed pipeline models offer substantial
performance gains at a lower cost and carbon footprint for E2ERE. Our
contribution is also the first to conduct E2ERE for the RareDis dataset.
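To make the three compared paradigms concrete, here is a minimal, self-contained Python sketch (not the authors' code): a toy NER -> RE pipeline, a seq2seq-style linearization of the same relation triples, and a GPT-style prompt template. The rule-based "models", the RareDis-like labels, and the <head>/<rel>/<tail> target format are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    text: str
    label: str  # e.g. RAREDISEASE, SIGN (labels loosely follow the RareDis schema)
    span: Tuple[int, int]


def toy_ner(sentence: str) -> List[Entity]:
    """Placeholder NER step; a real pipeline would use a span-based tagger able to
    handle the nested and discontinuous mentions that RareDis contains."""
    entities = []
    for phrase, label in [("cystic fibrosis", "RAREDISEASE"), ("chronic cough", "SIGN")]:
        start = sentence.find(phrase)
        if start != -1:
            entities.append(Entity(phrase, label, (start, start + len(phrase))))
    return entities


def toy_re(head: Entity, tail: Entity) -> str:
    """Placeholder RE step; a real pipeline classifies every candidate entity pair."""
    if head.label == "RAREDISEASE" and tail.label == "SIGN":
        return "produces"
    return "no_relation"


def pipeline_e2ere(sentence: str) -> List[Tuple[str, str, str]]:
    """Paradigm (a): NER -> RE pipeline; NER errors propagate into the RE step."""
    entities = toy_ner(sentence)
    return [(h.text, toy_re(h, t), t.text)
            for h in entities for t in entities
            if h is not t and toy_re(h, t) != "no_relation"]


def linearize_for_seq2seq(triples: List[Tuple[str, str, str]]) -> str:
    """Paradigm (b): a joint seq2seq model is trained to emit relations directly as a
    flat target string; the <head>/<rel>/<tail> markers are one plausible format."""
    return " | ".join(f"<head> {h} <rel> {r} <tail> {t}" for h, r, t in triples)


# Paradigm (c): a GPT model is prompted (zero-/few-shot or after instruction tuning)
# to produce the same triples; this template is illustrative, not the paper's prompt.
GPT_PROMPT = (
    "Extract (entity1, relation, entity2) triples for rare diseases, signs, and "
    "symptoms from the sentence below.\nSentence: {sentence}\nTriples:"
)

if __name__ == "__main__":
    sentence = "cystic fibrosis patients often present with chronic cough."
    triples = pipeline_e2ere(sentence)
    print("pipeline output :", triples)
    print("seq2seq target  :", linearize_for_seq2seq(triples))
    print("GPT prompt      :", GPT_PROMPT.format(sentence=sentence))
```

The sketch only illustrates how the same annotation is consumed by each paradigm; in the paper, each approach is instantiated with comparable state-of-the-art models and evaluated on RareDis and a chemical-protein interaction dataset.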
Related papers
- Phikon-v2, A large and public feature extractor for biomarker prediction [42.52549987351643]
We train a vision transformer using DINOv2 and publicly release one iteration of this model for further experimentation, coined Phikon-v2.
While trained on publicly available histology slides, Phikon-v2 surpasses our previously released model (Phikon) and performs on par with other histopathology foundation models (FM) trained on proprietary data.
arXiv Detail & Related papers (2024-09-13T20:12:29Z) - Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences [20.629333587044012]
We study the impact of data curation on iterated retraining of generative models.
We prove that, if the data is curated according to a reward model, the expected reward of the iterative retraining procedure is maximized.
arXiv Detail & Related papers (2024-06-12T21:28:28Z) - Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models [115.501751261878]
Fine-tuning language models (LMs) on human-generated data remains a prevalent practice.
We investigate whether we can go beyond human data on tasks where we have access to scalar feedback.
We find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data.
arXiv Detail & Related papers (2023-12-11T18:17:43Z) - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the combinatorial entity pair distribution.
We employ a DETR-based encoder-decoder design with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - DORE: Document Ordered Relation Extraction based on Generative Framework [56.537386636819626]
This paper investigates the root cause of the underwhelming performance of the existing generative DocRE models.
We propose to generate a symbolic and ordered sequence from the relation matrix which is deterministic and easier for model to learn.
Experimental results on four datasets show that our proposed method can improve the performance of the generative DocRE models.
arXiv Detail & Related papers (2022-10-28T11:18:10Z) - A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch.
In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models?
Our experimental results show that bidirectional LSTM models can achieve significantly better results than a BERT model on a small dataset, and these simple models train in much less time than fine-tuning the pre-trained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z) - Missing Features Reconstruction Using a Wasserstein Generative
Adversarial Imputation Network [0.0]
We experimentally research the use of generative and non-generative models for feature reconstruction.
Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were researched as representatives of generative models.
We introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to 30%.
arXiv Detail & Related papers (2020-06-21T11:53:55Z) - Recent Developments Combining Ensemble Smoother and Deep Generative
Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoder networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.