$C^3$: Confidence Calibration Model Cascade for Inference-Efficient
Cross-Lingual Natural Language Understanding
- URL: http://arxiv.org/abs/2402.15991v1
- Date: Sun, 25 Feb 2024 05:07:56 GMT
- Title: $C^3$: Confidence Calibration Model Cascade for Inference-Efficient
Cross-Lingual Natural Language Understanding
- Authors: Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao
- Abstract summary: Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP).
Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks.
Existing model cascade methods seek to enhance inference efficiency by greedily selecting, from a variety of models, the lightest one capable of processing the current input.
- Score: 28.853593305486832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual natural language understanding (NLU) is a critical task in
natural language processing (NLP). Recent advancements have seen multilingual
pre-trained language models (mPLMs) significantly enhance the performance of
these tasks. However, mPLMs necessitate substantial resources and incur high
computational costs during inference, posing challenges for deployment in
real-world and real-time systems. Existing model cascade methods seek to
enhance inference efficiency by greedily selecting, from a variety of models,
the lightest one capable of processing the current input, based on model
confidence scores.
confidence distributions vary across languages. This leads to the emission of
confident but incorrect predictions by smaller models, hindering their ability
to generalize effectively across test languages. In this study, we introduce a
confidence calibration model cascade ($C^3$) method. This approach, simple yet
effective, involves calibration prior to cascade inference, thereby enhancing
cascade accuracy through more reliable predictions. Extensive experiments
conducted on three cross-lingual benchmarks demonstrate that $C^3$
significantly outperforms all state-of-the-art baselines.
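To make the cascade idea concrete, the following is a minimal sketch of a two-model cascade in which the smaller model's confidence is calibrated before routing. It assumes temperature scaling as the calibration step, a single fixed confidence threshold, and generic classifiers returning logits; the helper names (`fit_temperature`, `cascade_predict`), the 0.9 threshold, and the choice of temperature scaling are illustrative assumptions, not the exact $C^3$ recipe described in the paper.

```python
import torch
import torch.nn.functional as F

def fit_temperature(dev_logits, dev_labels):
    """Fit a scalar temperature on held-out development logits by
    minimizing negative log-likelihood (standard temperature scaling,
    assumed here as the calibration step)."""
    temperature = torch.nn.Parameter(torch.ones(1))
    optimizer = torch.optim.LBFGS([temperature], lr=0.01, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(dev_logits / temperature, dev_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return temperature.detach()

def cascade_predict(x, small_model, large_model, temperature, threshold=0.9):
    """Answer with the small model when its calibrated confidence clears
    the threshold; otherwise fall back to the large model. `x` is a
    single example (a batch of one)."""
    with torch.no_grad():
        probs = F.softmax(small_model(x) / temperature, dim=-1)
        confidence, prediction = probs.max(dim=-1)
        if confidence.item() >= threshold:
            return prediction                       # cheap path
        return large_model(x).argmax(dim=-1)        # expensive fallback
```

The calibration step runs on held-out development data before any routing decision is made, which is what the abstract refers to as calibration prior to cascade inference: the intent is that the small model's confidence becomes comparable across languages rather than systematically overconfident.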
Related papers
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Calibrating the Confidence of Large Language Models by Eliciting Fidelity [52.47397325111864]
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless.
After alignment, these language models often exhibit overconfidence: their expressed confidence is not well calibrated to their actual correctness rate.
We propose a plug-and-play method to estimate the confidence of language models.
arXiv Detail & Related papers (2024-04-03T11:36:12Z)
- Headless Language Models: Learning without Predicting with Contrastive Weight Tying [0.11510009152620666]
Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies.
We propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT).
We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement compared to classical LMs within similar compute budgets.
arXiv Detail & Related papers (2023-09-15T12:20:00Z)
- Cabrita: closing the gap for foreign languages [0.0]
The strategy of training the model from scratch in a specific language or domain serves two essential purposes.
The main solution for overcoming the cost challenge is to rely on available pre-trained models.
We present a methodology named Cabrita, which successfully addresses both the performance and efficient-tokenization problems.
arXiv Detail & Related papers (2023-08-23T02:49:35Z)
- Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models [23.881825575095945]
Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning.
However, fine-tuned models still suffer from overconfident predictions, especially in out-of-domain settings.
We demonstrate that the PLMs are well-calibrated on the masked language modeling task with robust predictive confidence under domain shift.
We show that preserving pre-trained features can improve the calibration of fine-tuned language models.
arXiv Detail & Related papers (2023-05-30T17:35:31Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to $\times 3$, while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- BabyBear: Cheap inference triage for expensive language models [9.023847175654602]
We introduce BabyBear, a framework for cascading models for natural language processing (NLP) tasks.
We find that for common NLP tasks a high proportion of the inference load can be accomplished with cheap, fast models that have learned by observing a deep learning model.
This allows us to reduce the compute cost of large-scale classification jobs by more than 50% while retaining overall accuracy.
arXiv Detail & Related papers (2022-05-24T03:21:07Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization (a schematic form of the objective is sketched below this list).
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
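As referenced in the last entry above, a distributionally robust training objective over $K$ language pairs can be written schematically as follows; the notation and the exact choice of uncertainty set are assumptions for illustration, not taken verbatim from the paper:

$$\min_{\theta} \; \max_{q \in \mathcal{Q}} \; \sum_{i=1}^{K} q_i \, \mathbb{E}_{(x,y)\sim D_i}\big[\ell(x, y; \theta)\big]$$

Here $D_i$ is the parallel corpus for language pair $i$, $\ell$ is the per-example translation loss, $\theta$ are the model parameters, and $\mathcal{Q}$ is an uncertainty set over language weights (for example, the probability simplex or a ball around the empirical language mix). The iterated best response scheme mentioned in the summary alternates between updating the adversarial weights $q$ and taking gradient steps on $\theta$.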