A Conditional Cascade Model for Relational Triple Extraction
- URL: http://arxiv.org/abs/2108.13303v1
- Date: Fri, 20 Aug 2021 03:00:59 GMT
- Title: A Conditional Cascade Model for Relational Triple Extraction
- Authors: Feiliang Ren, Longhui Zhang, Shujuan Yin, Xiaofeng Zhao, Shilei Liu,
Bochao Li
- Abstract summary: Tagging based methods are one of the mainstream approaches to relational triple extraction.
Most of them suffer greatly from the class imbalance issue.
We propose a novel tagging based model that addresses this issue.
- Score: 0.9926500244448218
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Tagging based methods are one of the mainstream methods in relational triple
extraction. However, most of them suffer greatly from the class imbalance issue.
Here we propose a novel tagging based model that addresses this issue
from the following two aspects. First, at the model level, we propose a three-step
extraction framework that greatly reduces the total number of samples, which
implicitly decreases the severity of the mentioned issue. Second, at the
intra-model level, we propose a confidence threshold based cross entropy loss
that can directly neglect some samples in the major classes. We evaluate the
proposed model on NYT and WebNLG. Extensive experiments show that it can
address the mentioned issue effectively and achieves state-of-the-art results
on both datasets. The source code of our model is available at:
https://github.com/neukg/ConCasRTE.
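The confidence threshold based cross entropy loss mentioned above can be illustrated with a minimal PyTorch-style sketch. It is based only on the abstract, not on the released implementation in the repository above; the function name, the default threshold value, and the exact rule for dropping samples are assumptions, and in the full model such a loss would presumably be applied at each step of the three-step extraction framework.

```python
import torch
import torch.nn.functional as F

def confidence_threshold_ce(logits, targets, negative_class=0, threshold=0.9):
    """Cross entropy that ignores easy samples from the majority (negative) class.

    Hypothetical sketch: a sample is dropped from the loss when it belongs to the
    negative class and the model already predicts that class with confidence above
    `threshold`, so trivial negatives stop dominating the loss.
    """
    probs = F.softmax(logits, dim=-1)                                     # (N, C)
    conf_on_target = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (N,)

    # Keep all minority-class samples; keep majority-class samples only while
    # the model is still uncertain about them.
    is_majority = targets.eq(negative_class)
    keep = ~(is_majority & (conf_on_target > threshold))

    losses = F.cross_entropy(logits, targets, reduction="none")
    kept = losses[keep]
    return kept.mean() if kept.numel() > 0 else losses.sum() * 0.0

# Example: 5 tag predictions over 3 classes, where class 0 is the dominant "non-entity" tag.
logits = torch.randn(5, 3)
targets = torch.tensor([0, 0, 0, 1, 2])
loss = confidence_threshold_ce(logits, targets)
```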
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z) - Score Mismatching for Generative Modeling [4.413162309652114]
We propose a new score-based model with one-step sampling.
We train a standalone generator to compress all the time steps with the gradient backpropagated from the score network.
In order to produce meaningful gradients for the generator, the score network is trained to simultaneously match the real data distribution and mismatch the fake data distribution.
arXiv Detail & Related papers (2023-09-20T03:47:12Z) - Distilling Reasoning Capabilities into Smaller Language Models [83.66051257039763]
Step-by-step reasoning approaches like chain of thought (CoT) have proved to be very effective in inducing reasoning capabilities in large language models.
However, the success of the CoT approach is fundamentally tied to the model size, and billion parameter-scale models are often needed to get CoT to work.
We propose a knowledge distillation approach that leverages the step-by-step CoT reasoning capabilities of larger models and distills these abilities into smaller models.
arXiv Detail & Related papers (2022-12-01T00:39:56Z) - OneRel:Joint Entity and Relation Extraction with One Module in One Step [42.576188878294886]
Joint entity and relation extraction is an essential task in natural language processing and knowledge graph construction.
We propose a novel joint entity and relation extraction model, named OneRel, which casts joint extraction as a fine-grained triple classification problem (a rough sketch of this formulation appears after this list).
arXiv Detail & Related papers (2022-03-10T15:09:59Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z) - Contextual Dropout: An Efficient Sample-Dependent Dropout Module [60.63525456640462]
Dropout has been demonstrated as a simple and effective module to regularize the training process of deep neural networks.
We propose contextual dropout with an efficient structural design as a simple and scalable sample-dependent dropout module.
Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
arXiv Detail & Related papers (2021-03-06T19:30:32Z) - G2MF-WA: Geometric Multi-Model Fitting with Weakly Annotated Data [15.499276649167975]
In weak annotation, most of the manual annotations are assumed to be correct, yet they are inevitably mixed with incorrect ones.
We propose a novel method to make full use of the WA data to boost the multi-model fitting performance.
arXiv Detail & Related papers (2020-01-20T04:22:01Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z) - Listwise Learning to Rank by Exploring Unique Ratings [32.857847595096025]
Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations.
We propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling over the documents from previous steps.
Experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models.
arXiv Detail & Related papers (2020-01-07T00:50:37Z)