Improving NER's Performance with Massive financial corpus
- URL: http://arxiv.org/abs/2007.15871v1
- Date: Fri, 31 Jul 2020 07:00:34 GMT
- Title: Improving NER's Performance with Massive financial corpus
- Authors: Han Zhang
- Abstract summary: Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are prohibitive for small businesses.
We start from a company-name recognition task with small-scale, low-quality training data, then apply a set of techniques to improve model training speed and prediction performance with minimal labor cost.
- Score: 6.935911489364734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large deep neural networks requires massive amounts of high-quality
annotated data, but the time and labor costs are prohibitive for small businesses. We
start from a company-name recognition task with small-scale, low-quality training data,
then apply a set of techniques to improve model training speed and prediction
performance with minimal labor cost. The methods we use involve pre-training a lite
language model such as ALBERT-small or ELECTRA-small on a financial corpus, knowledge
distillation, and multi-stage learning. As a result, we raise the recall rate by nearly
20 points and run about 4 times as fast as a BERT-CRF model.
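The abstract names the ingredients (domain pre-training of a small model, knowledge distillation, multi-stage learning) but gives no implementation details, so the snippet below is only a minimal sketch of token-level knowledge distillation for a company-name tagger. The teacher/student pairing, the temperature, and the loss weighting are illustrative assumptions, not the authors' published recipe.

```python
# Minimal sketch (assumption, not the paper's code): token-level knowledge
# distillation for a company-name tagger. A large teacher (e.g. the emission
# logits of a fine-tuned BERT-CRF tagger) supplies soft label distributions
# that a small student (e.g. ELECTRA-small) is trained to match alongside the
# gold labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5, ignore_index=-100):
    """Shapes: logits are (batch, seq_len, num_labels); gold_labels is
    (batch, seq_len) with ignore_index on padding/sub-word positions."""
    # Soft targets: KL divergence against the teacher's temperature-smoothed
    # per-token label distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the (possibly noisy) annotations.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        gold_labels.view(-1),
        ignore_index=ignore_index,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a multi-stage setup, the weighting `alpha` could be shifted between stages so that later stages lean more on the teacher's soft labels than on the noisy annotations; this is one plausible reading of the abstract, not a detail it states.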
Related papers
- AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies [36.645912291368546]
We present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model with 8 experts of 16 billion parameters each.
This approach optimizes performance while minimizing data requirements through a two-stage process.
We successfully trained a 16B model and subsequently the 8*16B AquilaMoE model, demonstrating significant improvements in performance and training efficiency.
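The summary names the architecture but not its internals; purely as background on what an 8-expert MoE layer looks like, here is a generic top-2-gated feed-forward block. The dimensions, gating scheme, and loop-based dispatch are illustrative assumptions, not AquilaMoE's actual configuration.

```python
# Generic Mixture-of-Experts feed-forward layer with top-2 gating.
# Illustration only; not AquilaMoE's routing or expert sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)            # routing probabilities
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```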
arXiv Detail & Related papers (2024-08-13T02:07:00Z) - Fast-ELECTRA for Efficient Pre-training [83.29484808667532]
ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model.
We propose Fast-ELECTRA, which leverages an existing language model as the auxiliary model.
Our approach rivals the performance of state-of-the-art ELECTRA-style pre-training methods while largely eliminating the computation and memory cost incurred by jointly training the auxiliary model.
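As a rough illustration of the replaced-token-detection objective described above, the sketch below corrupts a batch with samples from a frozen, pre-existing masked LM and trains a discriminator to flag the replaced positions. The masking rate, the HuggingFace-style `.logits` interface, and the per-token binary discriminator head are assumptions, not Fast-ELECTRA's exact setup.

```python
# Sketch of ELECTRA-style replaced-token detection with a frozen auxiliary MLM.
# Assumes HuggingFace-style models whose outputs expose .logits; special tokens
# are ignored for brevity.
import torch
import torch.nn.functional as F

@torch.no_grad()
def corrupt(input_ids, aux_mlm, mask_token_id, mask_prob=0.15):
    """Replace a random subset of tokens with samples from a frozen auxiliary MLM."""
    mask = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    masked = input_ids.masked_fill(mask, mask_token_id)
    logits = aux_mlm(masked).logits                          # (batch, seq, vocab)
    samples = torch.distributions.Categorical(logits=logits).sample()
    corrupted = torch.where(mask, samples, input_ids)
    is_replaced = (corrupted != input_ids).float()           # discriminator labels
    return corrupted, is_replaced

def rtd_loss(discriminator, corrupted, is_replaced, attention_mask):
    """Per-token binary classification: original (0) vs. replaced (1).
    Assumes the discriminator returns one logit per token, shape (batch, seq, 1)."""
    per_token_logits = discriminator(corrupted).logits.squeeze(-1)   # (batch, seq)
    loss = F.binary_cross_entropy_with_logits(
        per_token_logits, is_replaced, reduction="none")
    return (loss * attention_mask).sum() / attention_mask.sum()
```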
arXiv Detail & Related papers (2023-10-11T09:55:46Z) - DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and
Training Efficiency via Efficient Data Sampling and Routing [57.86954315102865]
DeepSpeed Data Efficiency is a framework that makes better use of data, increases training efficiency, and improves model quality.
For GPT-3 1.3B language model pretraining, our work achieves 12.5x less data/time/cost while still maintaining 95% of the model quality of the full-data, full-cost baseline.
For GPT-3 1.3B and BERT-large pretraining, our work can also achieve the same model quality with up to 2x less data/time/cost, or achieve better model quality under the same data/time/cost.
arXiv Detail & Related papers (2022-12-07T12:27:28Z) - Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) reveals the ability to understand a given text passage and answer questions based on it.
Existing research works in MRC rely heavily on large models and corpora to improve the performance evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - Efficient and Private Federated Learning with Partially Trainable
Networks [8.813191488656527]
We propose to leverage partially trainable neural networks, which freeze a portion of the model parameters during the entire training process.
We empirically show that Federated learning of Partially Trainable neural networks (FedPT) can result in superior communication-accuracy trade-offs.
Our approach also enables faster training, with a smaller memory footprint, and better utility for strong differential privacy guarantees.
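A minimal sketch of the parameter-freezing idea follows: most weights are frozen and only a small trainable subset is updated. The checkpoint, the choice of trainable layers, and the label count are placeholders, not FedPT's published recipe.

```python
# Minimal sketch of a "partially trainable" network: freeze most of the model
# and train only a small subset of parameters. All concrete choices below are
# illustrative placeholders.
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=5)  # hypothetical checkpoint/labels

for name, param in model.named_parameters():
    # Keep only the classification head and the last encoder layer trainable.
    param.requires_grad = ("classifier" in name) or ("encoder.layer.11" in name)

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"{sum(p.numel() for p in trainable):,} trainable parameters")
```

In a federated round, only the parameters left with `requires_grad=True` would need to be optimized and exchanged with the server, which is where the communication savings come from.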
arXiv Detail & Related papers (2021-10-06T04:28:33Z) - EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets [106.79387235014379]
EarlyBERT is a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models.
We are the first to identify structured winning tickets in the early stage of BERT training, and use them for efficient training.
EarlyBERT easily achieves comparable performance to standard BERT with 35~45% less training time.
arXiv Detail & Related papers (2020-12-31T20:38:20Z) - A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via
Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
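The summary only names the schedule, so the following is a hedged sketch of one plausible "slow start, fast decay" shape: a short linear warm-up followed by rapid step-wise exponential decay. The warm-up length and decay constants are assumptions.

```python
# Hedged sketch of a "slow start, fast decay" learning-rate schedule:
# linear warm-up, then rapid step-wise exponential decay.
import torch

def slow_start_fast_decay(optimizer, warmup_steps=500, decay_rate=0.95, decay_every=100):
    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps                        # slow start
        return decay_rate ** ((step - warmup_steps) // decay_every)  # fast decay
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage: create the scheduler once, then call scheduler.step() after each
# optimizer step during fine-tuning.
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# scheduler = slow_start_fast_decay(optimizer)
```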
arXiv Detail & Related papers (2020-12-25T20:50:15Z) - Sparsifying Transformer Models with Trainable Representation Pooling [5.575448433529451]
We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations during the training process.
A robust trainable top-$k$ operator reduces the quadratic time and memory complexity to sublinear.
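As a simplified stand-in for the trainable top-$k$ operator (not the paper's exact construction), the sketch below scores token representations with a learned linear scorer, keeps the top-$k$, and scales them by their sigmoid scores so the scorer receives gradients.

```python
# Simplified trainable top-k representation pooling: a learned scorer ranks
# tokens, the top-k representations are kept, and multiplying by the scores
# keeps the scorer trainable end-to-end.
import torch
import torch.nn as nn

class TopKPooling(nn.Module):
    def __init__(self, d_model, k):
        super().__init__()
        self.k = k
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, hidden):                      # hidden: (batch, seq_len, d_model)
        scores = self.scorer(hidden).squeeze(-1)    # (batch, seq_len)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        selected = hidden.gather(1, idx)            # (batch, k, d_model)
        # Scale by the scores so gradients flow back into the scorer.
        return selected * torch.sigmoid(topk_scores).unsqueeze(-1)
```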
arXiv Detail & Related papers (2020-09-10T22:49:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.