WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation
- URL: http://arxiv.org/abs/2211.06862v1
- Date: Sun, 13 Nov 2022 09:56:24 GMT
- Title: WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation
- Authors: Binbin Xie, Xiangpeng Wei, Baosong Yang, Huan Lin, Jun Xie, Xiaoli
Wang, Min Zhang and Jinsong Su
- Abstract summary: Keyphrase generation aims to automatically generate short phrases summarizing an input document.
The recently emerged ONE2SET paradigm generates keyphrases as a set and has achieved competitive performance.
We propose WR-ONE2SET which extends ONE2SET with an adaptive instance-level cost Weighting strategy and a target Re-assignment mechanism.
- Score: 57.11538133231843
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Keyphrase generation aims to automatically generate short phrases summarizing
an input document. The recently emerged ONE2SET paradigm (Ye et al., 2021)
generates keyphrases as a set and has achieved competitive performance.
Nevertheless, we observe serious calibration errors in the outputs of ONE2SET,
especially the over-estimation of the $\varnothing$ token (meaning "no
corresponding keyphrase"). In this paper, we analyze this limitation in depth
and identify two main reasons behind it: 1) parallel generation has to
introduce excessive $\varnothing$ padding tokens into training instances; and
2) the training mechanism that assigns a target to each slot is unstable and
further aggravates the over-estimation of the $\varnothing$ token. To make the
model
well-calibrated, we propose WR-ONE2SET, which extends ONE2SET with an adaptive
instance-level cost Weighting strategy and a target Re-assignment mechanism.
The former dynamically penalizes the over-estimated slots of different
instances, thus smoothing the uneven training distribution. The latter refines
the original inappropriate assignment and reduces the supervisory signals of
over-estimated slots. Experimental results on commonly-used datasets
demonstrate the effectiveness and generality of our proposed paradigm.
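To make the weighting idea concrete, here is a minimal PyTorch sketch of adaptive instance-level cost weighting for a ONE2SET-style slot decoder. The specific weighting formula (one minus the per-instance fraction of $\varnothing$ targets), the function name, and the tensor layout are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def weighted_set_loss(logits, targets, null_id, eps=1e-8):
    """Sketch: down-weight the loss of slots assigned the null (padding)
    token, more strongly for instances dominated by padding, to smooth
    the uneven training distribution described in the abstract.

    logits:  (batch, slots, vocab) slot-wise token scores
    targets: (batch, slots) target token per slot; null_id marks padding
    """
    # Per-slot cross-entropy; cross_entropy expects (batch, vocab, slots).
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    is_null = targets.eq(null_id)
    # Fraction of real (non-null) targets per instance: instances padded
    # almost entirely with null tokens get their null-slot loss damped most.
    null_weight = 1.0 - is_null.float().mean(dim=1, keepdim=True) + eps
    weights = torch.where(is_null, null_weight.expand_as(ce),
                          torch.ones_like(ce))
    return (weights * ce).mean()
```

Down-weighting rather than masking keeps some supervision on $\varnothing$ slots, so the model can still learn to emit "no keyphrase" where appropriate.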
Related papers
- One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
In multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z)
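A minimal sketch of the one-bit supervision protocol summarized above, with hypothetical names: for each sample the learner guesses its most likely class, the annotator answers only yes or no, and a "no" is retained as a suppressed negative label for the downstream semi-supervised learner:

```python
import numpy as np

def one_bit_round(model_probs, true_labels):
    """One round of one-bit supervision: guess, receive a yes/no answer,
    and record positives (confirmed labels) and negatives (suppressed
    classes) for subsequent semi-supervised training.

    model_probs: (n, k) predicted class probabilities
    true_labels: (n,) ground-truth labels held by the annotator
    """
    guesses = model_probs.argmax(axis=1)
    correct = guesses == true_labels              # the one bit per sample
    positives = {i: int(guesses[i]) for i in np.flatnonzero(correct)}
    negatives = {i: {int(guesses[i])} for i in np.flatnonzero(~correct)}
    return positives, negatives
```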
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly robust instance-reweighted adversarial training framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
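For intuition, a KL-divergence regularized reweighting objective has a well-known closed form: maximizing $\sum_i w_i \ell_i - \tau \cdot KL(w \| uniform)$ over the probability simplex yields $w_i \propto \exp(\ell_i/\tau)$, i.e. a softmax over per-example losses. A hedged sketch (the temperature and names are assumptions, and this shows the generic closed form rather than the paper's full procedure):

```python
import torch

def kl_regularized_weights(per_example_loss, temperature=1.0):
    """Closed-form maximizer of sum_i w_i * loss_i
    - temperature * KL(w || uniform) over the simplex:
    w_i is proportional to exp(loss_i / temperature), so harder
    examples receive larger weights."""
    return torch.softmax(per_example_loss.detach() / temperature, dim=0)

# Usage sketch: scale the adversarial training loss per example, e.g.
# loss = (kl_regularized_weights(per_example_loss) * per_example_loss).sum()
```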
- Revisiting Discriminative vs. Generative Classifiers: Theory and
Implications [37.98169487351508]
This paper is inspired by the statistical efficiency of naive Bayes.
We present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss.
Experiments on various pre-trained deep vision models show that naive Bayes consistently converges faster as the amount of data increases.
arXiv Detail & Related papers (2023-02-05T08:30:42Z)
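The convergence claim can be illustrated with a toy learning-curve comparison. The sketch below uses synthetic scikit-learn data as a stand-in for the paper's pre-trained-feature setup, so it demonstrates the general generative-vs-discriminative sample-efficiency effect rather than reproducing the paper's experiments:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Compare test accuracy of naive Bayes vs. logistic regression as the
# training-set size grows; naive Bayes typically approaches its asymptote
# with fewer samples.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_test, y_test = X[4000:], y[4000:]
for n in (50, 200, 1000, 4000):
    nb = GaussianNB().fit(X[:n], y[:n])
    lr = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])
    print(f"n={n:4d}  NB={nb.score(X_test, y_test):.3f}  "
          f"LR={lr.score(X_test, y_test):.3f}")
```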
- Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose the second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten.
We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly.
SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
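A hedged sketch of computing SSFT from per-epoch correctness recorded while continuing training on a second data split; the matrix layout and function name are illustrative assumptions:

```python
import numpy as np

def second_split_forgetting_time(correct_history):
    """correct_history: (epochs, n) boolean matrix; entry [t, i] says
    whether held-over example i is classified correctly at epoch t of
    the second-split run. Returns, per example, the first epoch after
    which it is never correct again (NaN if never forgotten)."""
    epochs, n = correct_history.shape
    ssft = np.full(n, np.nan)
    for i in range(n):
        correct_at = np.flatnonzero(correct_history[:, i])
        if correct_at.size == 0:
            ssft[i] = 0                    # wrong from the very start
        elif correct_at[-1] < epochs - 1:
            ssft[i] = correct_at[-1] + 1   # forgotten for good after this
    return ssft  # small values flag likely-mislabeled examples
```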
- Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully
Exploiting Self-Attention [36.90363317158731]
We propose an adaptive sparse token pruning framework with minimal cost.
Our method improves the throughput of DeiT-S by 50% while incurring only a 0.2% drop in top-1 accuracy.
arXiv Detail & Related papers (2022-09-28T03:07:32Z)
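A rough sketch of attention-based token pruning inside a ViT block; the paper's learnable adaptive threshold is simplified here to a fixed keep ratio, and all names and shapes are illustrative assumptions:

```python
import torch

def prune_tokens(tokens, cls_attention, keep_ratio=0.7):
    """Score each patch token by the attention the [CLS] token pays to
    it (averaged over heads) and keep only the top fraction.

    tokens:        (batch, 1 + n, dim)  [CLS] followed by n patch tokens
    cls_attention: (batch, heads, n)    attention from [CLS] to patches
    """
    scores = cls_attention.mean(dim=1)                    # (batch, n)
    n_keep = max(1, int(scores.size(1) * keep_ratio))
    keep = scores.topk(n_keep, dim=1).indices             # (batch, n_keep)
    idx = keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    patches = tokens[:, 1:].gather(1, idx)                # kept patches
    return torch.cat([tokens[:, :1], patches], dim=1)
```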
- Detection and Mitigation of Byzantine Attacks in Distributed Training [24.951227624475443]
Abnormal Byzantine behavior of worker nodes can derail training and compromise the quality of inference.
Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones, $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration, to weak ones, $q$ randomly chosen adversaries with limited collusion abilities.
arXiv Detail & Related papers (2022-08-17T05:49:52Z)
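For context, one standard Byzantine-robust aggregation primitive that defenses in this setting build on is the coordinate-wise trimmed mean; the sketch below shows this generic primitive, not the detection and mitigation scheme proposed in the paper:

```python
import torch

def trimmed_mean(gradients, q):
    """Coordinate-wise trimmed mean: drop the q largest and q smallest
    values per coordinate before averaging, tolerating up to q
    arbitrarily corrupted workers out of n > 2q.

    gradients: (n_workers, dim) stacked worker gradients
    """
    n = gradients.size(0)
    assert n > 2 * q, "need strictly more honest than trimmed workers"
    sorted_grads, _ = gradients.sort(dim=0)
    return sorted_grads[q:n - q].mean(dim=0)
```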
- Sparsity Winning Twice: Better Robust Generalization from More Efficient
Training [94.92954973680914]
We introduce two alternatives for sparse adversarial training: (i) static sparsity and (ii) dynamic sparsity.
We find that both methods yield a win-win: they substantially shrink the robust generalization gap and alleviate robust overfitting.
Our approaches can be combined with existing regularizers, establishing new state-of-the-art results in adversarial training.
arXiv Detail & Related papers (2022-02-20T15:52:08Z)
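A minimal sketch of the masking ingredient shared by both variants: static sparsity fixes a magnitude-based binary mask for the whole run, while dynamic sparsity periodically prunes and regrows connections by recomputing it; the pruning criterion and names are illustrative assumptions:

```python
import torch

def magnitude_mask(weights, sparsity=0.9):
    """Binary mask keeping the (1 - sparsity) fraction of weights with
    the largest magnitude; multiply it into the weights each step."""
    k = max(1, int(weights.numel() * (1 - sparsity)))
    threshold = weights.abs().flatten().topk(k).values.min()
    return (weights.abs() >= threshold).float()

# Static sparsity: compute the mask once and reuse it for all of training.
# Dynamic sparsity: recompute the mask every few hundred steps so pruned
# connections can regrow while others are pruned.
```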
- On the robustness of randomized classifiers to adversarial examples [11.359085303200981]
We introduce a new notion of robustness for randomized classifiers, enforcing local Lipschitzness using probability metrics.
We show that our results are applicable to a wide range of machine learning models under mild hypotheses.
All the robust models we trained can simultaneously achieve state-of-the-art accuracy.
arXiv Detail & Related papers (2021-02-22T10:16:58Z)
- Almost Tight L0-norm Certified Robustness of Top-k Predictions against
Adversarial Perturbations [78.23408201652984]
Top-k predictions are used in many real-world applications such as machine learning as a service, recommender systems, and web searches.
Our work is based on randomized smoothing, which builds a provably robust classifier by randomizing the input.
For instance, our method can build a classifier that achieves a certified top-3 accuracy of 69.2% on ImageNet when an attacker can arbitrarily perturb 5 pixels of a testing image.
arXiv Detail & Related papers (2020-11-15T21:34:44Z)
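A hedged sketch of the Monte Carlo voting step that underlies randomized-smoothing-style certification with pixel ablation; the certification bound itself is omitted, and the ablation scheme, names, and sample counts are illustrative assumptions:

```python
import torch

def smoothed_topk_counts(model, image, n_samples=1000, n_ablate=5, k=3):
    """Repeatedly zero out a random subset of pixels and count how often
    each class enters the base model's top-k; these vote statistics feed
    the (omitted) probabilistic certification bound.

    image: (channels, height, width) input tensor
    """
    c, h, w = image.shape
    counts = None
    with torch.no_grad():
        for _ in range(n_samples):
            noisy = image.clone()
            idx = torch.randperm(h * w)[:n_ablate]
            noisy.view(c, -1)[:, idx] = 0.0       # ablate random pixels
            logits = model(noisy.unsqueeze(0)).squeeze(0)
            if counts is None:
                counts = torch.zeros(logits.numel())
            counts[logits.topk(k).indices] += 1
    return counts  # per-class top-k vote counts across the ensemble
```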