Ensemble Distillation Approaches for Grammatical Error Correction
- URL: http://arxiv.org/abs/2012.07535v2
- Date: Tue, 15 Dec 2020 09:45:47 GMT
- Title: Ensemble Distillation Approaches for Grammatical Error Correction
- Authors: Yassir Fathullah, Mark Gales, Andrey Malinin
- Abstract summary: Ensemble distillation (EnD) and ensemble distribution distillation (EnDD) have been proposed that compress the ensemble into a single model.
This paper examines the application of both these distillation approaches to a sequence prediction task, grammatical error correction (GEC).
It is, however, more challenging than the standard tasks investigated for distillation as the prediction of any grammatical correction to a word will be highly dependent on both the input sequence and the generated output history for the word.
- Score: 18.81579562876076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensemble approaches are commonly used techniques for improving a system by
combining multiple model predictions. Additionally these schemes allow the
uncertainty, as well as the source of the uncertainty, to be derived for the
prediction. Unfortunately these benefits come at a computational and memory
cost. To address this problem ensemble distillation (EnD) and more recently
ensemble distribution distillation (EnDD) have been proposed that compress the
ensemble into a single model, representing either the ensemble average
prediction or prediction distribution respectively. This paper examines the
application of both these distillation approaches to a sequence prediction
task, grammatical error correction (GEC). This is an important application area
for language learning tasks as it can yield highly useful feedback to the
learner. It is, however, more challenging than the standard tasks investigated
for distillation as the prediction of any grammatical correction to a word will
be highly dependent on both the input sequence and the generated output history
for the word. The performance of both EnD and EnDD is evaluated on
publicly available GEC tasks as well as a spoken language task.
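
The ideas in the abstract can be illustrated for a single output token. The sketch below is not the paper's implementation: it assumes each ensemble member emits a probability vector over a small vocabulary, uses the common entropy-based split of total uncertainty into data and knowledge uncertainty, and treats the EnD objective as the KL divergence from the ensemble average to the student prediction. All function names and numbers are hypothetical, and the student is assumed to assign non-zero probability to every word. EnDD, which targets the distribution of member predictions rather than their average, is not shown.

```python
import math

def ensemble_mean(member_probs):
    """Average the per-token distributions of all ensemble members."""
    n = len(member_probs)
    vocab = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n for k in range(vocab)]

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def uncertainty_decomposition(member_probs):
    """Return (total, data, knowledge) uncertainty for one output token."""
    total = entropy(ensemble_mean(member_probs))  # entropy of the averaged prediction
    data = sum(entropy(p) for p in member_probs) / len(member_probs)  # mean member entropy
    return total, data, total - data  # the gap measures member disagreement

def end_token_loss(member_probs, student_probs):
    """KL(ensemble mean || student) for one token; assumes student_probs > 0."""
    mean = ensemble_mean(member_probs)
    return sum(m * math.log(m / s) for m, s in zip(mean, student_probs) if m > 0.0)

# Toy example: three members over a 3-word vocabulary (made-up numbers).
members = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.4, 0.4, 0.2]]
total, data, knowledge = uncertainty_decomposition(members)
loss = end_token_loss(members, [0.4, 0.45, 0.15])
print(f"total={total:.3f} data={data:.3f} knowledge={knowledge:.3f} EnD loss={loss:.4f}")
```

In a sequence task such as GEC, each of these distributions would additionally be conditioned on the input sentence and the output history, which is the dependence the abstract identifies as the source of difficulty.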
Related papers
- Ensemble Predicate Decoding for Unbiased Scene Graph Generation [40.01591739856469]
Scene Graph Generation (SGG) aims to generate a comprehensive graphical representation that captures semantic information of a given scenario.
The model's performance in predicting more fine-grained predicates is hindered by a significant predicate bias.
This paper proposes Ensemble Predicate Decoding (EPD), which employs multiple decoders to attain unbiased scene graph generation.
arXiv Detail & Related papers (2024-08-26T11:24:13Z)
- Grammatical Error Correction via Mixed-Grained Weighted Training [68.94921674855621]
Grammatical Error Correction (GEC) aims to automatically correct grammatical errors in natural texts.
MainGEC designs token-level and sentence-level training weights based on inherent discrepancies in accuracy and potential diversity of data annotation.
arXiv Detail & Related papers (2023-11-23T08:34:37Z)
- Functional Ensemble Distillation [18.34081591772928]
We investigate how to best distill an ensemble's predictions using an efficient model.
We find that learning the distilled model via a simple augmentation scheme in the form of mixup augmentation significantly boosts the performance.
arXiv Detail & Related papers (2022-06-05T14:07:17Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- A Syntax-Guided Grammatical Error Correction Model with Dependency Tree Correction [83.14159143179269]
Grammatical Error Correction (GEC) is a task of detecting and correcting grammatical errors in sentences.
We propose a syntax-guided GEC model (SG-GEC) which adopts the graph attention mechanism to utilize the syntactic knowledge of dependency trees.
We evaluate our model on public GEC benchmarks, where it achieves competitive results.
arXiv Detail & Related papers (2021-11-05T07:07:48Z)
- Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z)
- The Extraordinary Failure of Complement Coercion Crowdsourcing [50.599433903377374]
Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years.
We aim to collect annotated data for this phenomenon by reducing it to either of two known tasks: Explicit Completion and Natural Language Inference.
In both cases, crowdsourcing resulted in low agreement scores, even though we followed the same methodologies as in previous work.
arXiv Detail & Related papers (2020-10-12T19:04:04Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
- Efficient Conformal Prediction via Cascaded Inference with Expanded Admission [43.596058175459746]
We present a novel approach for conformal prediction (CP).
We aim to identify a set of promising prediction candidates -- in place of a single prediction.
This set is guaranteed to contain a correct answer with high probability.
arXiv Detail & Related papers (2020-07-06T23:13:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.