Green CWS: Extreme Distillation and Efficient Decode Method Towards
Industrial Application
- URL: http://arxiv.org/abs/2111.09078v1
- Date: Wed, 17 Nov 2021 12:45:02 GMT
- Title: Green CWS: Extreme Distillation and Efficient Decode Method Towards
Industrial Application
- Authors: Yulan Hu, Yong Liu
- Abstract summary: This work proposes a fast and accurate CWS framework that incorporates a light-weighted model and an upgraded decode method (PCRF)
Experiments show that our work obtains relatively high performance on multiple datasets with as low as 14% of time consumption.
- Score: 7.33244617309908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benefiting from the strong ability of the pre-trained model, the research on
Chinese Word Segmentation (CWS) has made great progress in recent years.
However, due to massive computation, large and complex models are incapable of
empowering their ability for industrial use. On the other hand, for
low-resource scenarios, the prevalent decode method, such as Conditional Random
Field (CRF), fails to exploit the full information of the training data. This
work proposes a fast and accurate CWS framework that incorporates a
light-weighted model and an upgraded decode method (PCRF) towards industrially
low-resource CWS scenarios. First, we distill a Transformer-based student model
as an encoder, which not only accelerates the inference speed but also combines
open knowledge and domain-specific knowledge. Second, the perplexity score to
evaluate the language model is fused into the CRF module to better identify the
word boundaries. Experiments show that our work obtains relatively high
performance on multiple datasets with as low as 14\% of time consumption
compared with the original BERT-based model. Moreover, under the low-resource
setting, we get superior results in comparison with the traditional decoding
methods.
Related papers
- Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling [19.60087366873302]
Multi-fidelity surrogate modeling aims to learn an accurate surrogate at the highest fidelity level.
Deep learning approaches utilize neural network based encoders and decoders to improve scalability.
We propose Multi-fidelity Residual Neural Processes (MFRNP), a novel multi-fidelity surrogate modeling framework.
arXiv Detail & Related papers (2024-02-29T04:40:25Z) - Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable
Transfer from Black-Box to Lightweight Segmentation Model [25.3403116022412]
CoRTe is a pseudo-labelling function that extracts reliable knowledge from a black-box source model.
We benchmark CoRTe on two synthetic-to-real settings, demonstrating remarkable results when using black-box models to transfer knowledge on lightweight models for a target data distribution.
arXiv Detail & Related papers (2024-02-20T16:35:14Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme
Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - The Languini Kitchen: Enabling Language Modelling Research at Different
Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z) - Knowledge Transfer-Driven Few-Shot Class-Incremental Learning [23.163459923345556]
Few-shot class-incremental learning (FSCIL) aims to continually learn new classes using a few samples while not forgetting the old classes.
Despite the advance of existing FSCIL methods, the proposed knowledge transfer learning schemes are sub-optimal due to the insufficient optimization for the model's plasticity.
We propose a Random Episode Sampling and Augmentation (RESA) strategy that relies on diverse pseudo incremental tasks as agents to achieve the knowledge transfer.
arXiv Detail & Related papers (2023-06-19T14:02:45Z) - Unifying Synergies between Self-supervised Learning and Dynamic
Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - Exploring the Value of Pre-trained Language Models for Clinical Named
Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU)
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyper parameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - Fine-tuning BERT for Low-Resource Natural Language Understanding via
Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods of BERT -- a pre-trained Transformer based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z) - RethinkCWS: Is Chinese Word Segmentation a Solved Task? [81.11161697133095]
The performance of the Chinese Word (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks.
In this paper, we take stock of what we have achieved and rethink what's left in the CWS task.
arXiv Detail & Related papers (2020-11-13T11:07:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.