Related papers: Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

URL: http://arxiv.org/abs/2112.08098v2
Date: Fri, 17 Dec 2021 09:43:41 GMT
Title: Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints
Authors: Christoph Minixhofer, Ond\v{r}ej Klejch, Peter Bell
Abstract summary: We unify several existing decoding strategies for punctuation prediction in one framework. We show that significant improvements can be achieved by optimising these strategies after training a model. We use our decoding strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting.
Score: 10.75980867987981
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we unify several existing decoding strategies for punctuation prediction in one framework and introduce a novel strategy which utilises multiple predictions at each word across different windows. We show that significant improvements can be achieved by optimising these strategies after training a model, only leading to a potential increase in inference time, with no requirement for retraining. We further use our decoding strategy framework for the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting. Our results show that a classification approach for punctuation prediction can be beneficial when little or no right-side context is available.

Related papers

Incremental Sequence Classification with Temporal Consistency [9.65650774513798]
We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed.<n>We leverage a temporal-consistency condition that successive predictions should satisfy to develop a novel loss function for training incremental sequence classifiers.<n>Our results show that models trained with our method are better able to distinguish promising generations from unpromising ones after observing only a few tokens.
arXiv Detail & Related papers (2025-05-22T11:37:53Z)
Bayesian Test-Time Adaptation for Vision-Language Models [51.93247610195295]
Test-time adaptation with pre-trained vision-language models, such as CLIP, aims to adapt the model to new, potentially out-of-distribution test data. We propose a novel approach, textbfBayesian textbfClass textbfAdaptation (BCA), which in addition to continuously updating class embeddings to adapt likelihood, also uses the posterior of incoming samples to continuously update the prior for each class embedding.
arXiv Detail & Related papers (2025-03-12T10:42:11Z)
Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores [52.92618442300405]
It is impossible to achieve exact, distribution-free conditional coverage in finite samples. We propose an alternative conformal prediction algorithm that targets coverage where it matters most.
arXiv Detail & Related papers (2025-01-17T12:01:56Z)
Conformal Structured Prediction [32.23920437534215]
We propose a general framework for conformal prediction in the structured prediction setting. We show how our algorithm can be used to construct prediction sets that satisfy a desired coverage guarantee in several domains.
arXiv Detail & Related papers (2024-10-08T18:56:15Z)
Weighted Aggregation of Conformity Scores for Classification [9.559062601251464]
Conformal prediction is a powerful framework for constructing prediction sets with valid coverage guarantees. We propose a novel approach that combines multiple score functions to improve the performance of conformal predictors.
arXiv Detail & Related papers (2024-07-14T14:58:03Z)
Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step. We propose a practical Primal-Dual algorithm to tackle it, and aims to demonstrate that it exhibits competitive average performance in time series benchmarks, while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z)
When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters. We show that it achieves approximate valid population coverage and near-optimal efficiency within class. Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
Learning Predictions for Algorithms with Predictions [49.341241064279714]
We introduce a general design approach for algorithms that learn predictors. We apply techniques from online learning to learn against adversarial instances, tune robustness-consistency trade-offs, and obtain new statistical guarantees. We demonstrate the effectiveness of our approach at deriving learning algorithms by analyzing methods for bipartite matching, page migration, ski-rental, and job scheduling.
arXiv Detail & Related papers (2022-02-18T17:25:43Z)
Few-shot Conformal Prediction with Auxiliary Tasks [29.034390810078172]
We develop a novel approach to conformal prediction when the target task has limited data available for training. We obtain substantially tighter prediction sets while maintaining desirable marginal guarantees by casting conformal prediction as a meta-learning paradigm. We demonstrate the effectiveness of this approach across a number of few-shot classification and regression tasks in natural language processing, computer vision, and computational chemistry for drug discovery.
arXiv Detail & Related papers (2021-02-17T17:46:57Z)
$k$-Neighbor Based Curriculum Sampling for Sequence Prediction [22.631763991832862]
Multi-step ahead prediction in language models is challenging due to discrepancy between training and test time processes. We propose textitNearest-Neighbor Replacement Sampling -- a curriculum learning-based method that gradually changes an initially deterministic teacher policy. We report our findings on two language modelling benchmarks and find that the proposed method further improves performance when used in conjunction with scheduled sampling.
arXiv Detail & Related papers (2021-01-22T20:07:29Z)
Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses [84.2964408497058]
Point-level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame labels. This paper attempts to explore the proposal-based prediction paradigm for point-level annotations.
arXiv Detail & Related papers (2020-12-15T12:11:48Z)
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence. Traditional learning process of seq2seq models suffers from two problems. We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.