Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints
- URL: http://arxiv.org/abs/2112.08098v2
- Date: Fri, 17 Dec 2021 09:43:41 GMT
- Title: Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints
- Authors: Christoph Minixhofer, Ondřej Klejch, Peter Bell
- Abstract summary: We unify several existing decoding strategies for punctuation prediction in one framework.
We show that significant improvements can be achieved by optimising these strategies after training a model.
We use our decoding strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting.
- Score: 10.75980867987981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we unify several existing decoding strategies for punctuation
prediction in one framework and introduce a novel strategy which utilises
multiple predictions at each word across different windows. We show that
significant improvements can be achieved by optimising these strategies after
training a model, only leading to a potential increase in inference time, with
no requirement for retraining. We further use our decoding strategy framework
for the first comparison of tagging and classification approaches for
punctuation prediction in a real-time setting. Our results show that a
classification approach for punctuation prediction can be beneficial when
little or no right-side context is available.
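The abstract describes a decoding strategy that combines several predictions made for the same word by different, overlapping windows. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation. It assumes a hypothetical model interface (`WindowModel`) that maps a window of words to per-word punctuation-class probabilities. The function name `mask_combine_decode`, the window size, the stride, the boolean position mask and the use of averaging as the combination step are all illustrative choices.

```python
from typing import Callable, List, Optional, Sequence

import numpy as np

# Hypothetical model interface (an assumption, not the paper's code): given a
# window of words it returns an array of shape (len(window), n_classes) holding
# a punctuation-class probability distribution for every word in the window.
WindowModel = Callable[[Sequence[str]], np.ndarray]


def mask_combine_decode(
    words: Sequence[str],
    model: WindowModel,
    n_classes: int,
    window: int = 8,
    stride: int = 2,
    mask: Optional[Sequence[bool]] = None,
) -> List[int]:
    """Return one punctuation class index per word (illustrative sketch).

    The text is scanned with overlapping windows, so every word receives one
    prediction from each window that contains it. mask[j] says whether the
    prediction made at position j of a window is kept; kept predictions for a
    word are averaged and the arg-max class is returned.
    """
    if not words:
        return []
    if mask is None:
        mask = [True] * window
    if len(mask) != window:
        raise ValueError("mask must have exactly `window` entries")

    kept = np.zeros((len(words), n_classes))      # sum of unmasked predictions
    counts = np.zeros(len(words))                 # number of unmasked predictions
    fallback = np.zeros((len(words), n_classes))  # sum of all predictions

    # Window start offsets; append a final window so the last word is covered.
    last_start = max(len(words) - window, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)

    for start in starts:
        chunk = words[start:start + window]
        probs = model(chunk)
        for j in range(len(chunk)):
            fallback[start + j] += probs[j]
            if mask[j]:
                kept[start + j] += probs[j]
                counts[start + j] += 1

    # Average the kept predictions; words that only ever appeared at masked
    # positions fall back to the sum of all their predictions.
    combined = np.where(
        counts[:, None] > 0,
        kept / np.maximum(counts, 1)[:, None],
        fallback,
    )
    return [int(k) for k in combined.argmax(axis=1)]
```

For example, `mask = [j < window - 2 for j in range(window)]` discards predictions from the two right-most positions of each window, where a word has fewer than two words of right-side context inside that window. Words that only ever appear at masked positions (such as the final word of the text) then fall back to their unmasked predictions. Averaging is only one possible combination function; taking the per-class maximum would be another.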
Related papers
- Weighted Aggregation of Conformity Scores for Classification [9.559062601251464]
Conformal prediction is a powerful framework for constructing prediction sets with valid coverage guarantees.
We propose a novel approach that combines multiple score functions to improve the performance of conformal predictors.
arXiv Detail & Related papers (2024-07-14T14:58:03Z)
- Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step.
We propose a practical Primal-Dual algorithm to tackle it and aim to demonstrate that it exhibits competitive average performance on time series benchmarks while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z)
- When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples.
A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction.
Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Learning Predictions for Algorithms with Predictions [49.341241064279714]
We introduce a general design approach for algorithms that learn predictors.
We apply techniques from online learning to learn against adversarial instances, tune robustness-consistency trade-offs, and obtain new statistical guarantees.
We demonstrate the effectiveness of our approach at deriving learning algorithms by analyzing methods for bipartite matching, page migration, ski-rental, and job scheduling.
arXiv Detail & Related papers (2022-02-18T17:25:43Z)
- Few-shot Conformal Prediction with Auxiliary Tasks [29.034390810078172]
We develop a novel approach to conformal prediction when the target task has limited data available for training.
We obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm.
We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.
arXiv Detail & Related papers (2021-02-17T17:46:57Z)
- k-Neighbor Based Curriculum Sampling for Sequence Prediction [22.631763991832862]
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes.
We propose Nearest-Neighbor Replacement Sampling, a curriculum learning-based method that gradually changes an initially deterministic teacher policy.
We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
arXiv Detail & Related papers (2021-01-22T20:07:29Z)
- Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.
Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels.
This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
- Model selection in reconciling hierarchical time series [2.705025060422369]
We propose an approach for dynamically selecting the most appropriate hierarchical forecasting method.
The approach is based on Machine Learning classification methods and uses time series features as leading indicators.
Our results suggest that conditional hierarchical forecasting leads to significantly more accurate forecasts than standard approaches.
arXiv Detail & Related papers (2020-10-21T03:40:35Z)
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)