Legal-Tech Open Diaries: Lesson learned on how to develop and deploy
light-weight models in the era of humongous Language Models
- URL: http://arxiv.org/abs/2210.13086v1
- Date: Mon, 24 Oct 2022 10:08:59 GMT
- Title: Legal-Tech Open Diaries: Lesson learned on how to develop and deploy
light-weight models in the era of humongous Language Models
- Authors: Stelios Maroudas, Sotiris Legkas, Prodromos Malakasiotis, Ilias
Chalkidis
- Abstract summary: We follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment.
We start from ground zero by pre-training multiple domain-specific multi-lingual LMs which are a better fit to contractual and regulatory text.
We present benchmark results of such models in a half-public half-private legal benchmark comprising 5 downstream tasks showing the impact of larger model size.
- Score: 10.086015702323971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of billion-parameter-sized Language Models (LMs), start-ups have
to follow trends and adapt their technology accordingly. Nonetheless, there are
open challenges, since the development and deployment of large models come with
a need for high computational resources and have economic consequences. In
this work, we follow the steps of the R&D group of a modern legal-tech start-up
and present important insights on model development and deployment. We start
from ground zero by pre-training multiple domain-specific multi-lingual LMs
which are a better fit to contractual and regulatory text compared to the
available alternatives (XLM-R). We present benchmark results of such models in
a half-public half-private legal benchmark comprising 5 downstream tasks,
showing the impact of larger model size. Lastly, we examine the impact of a
full-scale pipeline for model compression that includes: a) Parameter Pruning,
b) Knowledge Distillation, and c) Quantization. The resulting models are much
more efficient without sacrificing performance overall.
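For readers who want a concrete picture of such a compression pipeline, the snippet below is a minimal, hedged sketch in PyTorch / Hugging Face Transformers. The checkpoints, the 30% pruning ratio, and the distillation hyper-parameters are illustrative placeholders, and the ordering shown is one common arrangement rather than the authors' exact setup.

```python
# Sketch of a pruning -> distillation -> quantization pipeline (assumptions, not the paper's config).
import torch
import torch.nn.functional as F
from torch.nn.utils import prune
from transformers import AutoModelForSequenceClassification

teacher = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base")    # stand-in large model
student = AutoModelForSequenceClassification.from_pretrained("distilroberta-base")  # stand-in small model

# a) Parameter Pruning: magnitude-prune 30% of the weights in every linear layer
for module in teacher.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

# b) Knowledge Distillation: match softened teacher logits plus the hard labels
def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# (The training loop over a labelled legal corpus is omitted: it would compute teacher
#  and student logits per batch, call distillation_loss, and update only the student.)

# c) Quantization: post-training dynamic quantization of the student's linear layers to int8
quantized_student = torch.quantization.quantize_dynamic(
    student.cpu(), {torch.nn.Linear}, dtype=torch.qint8
)
```

In practice, each stage would be followed by evaluation on the downstream legal tasks to verify that the efficiency gains do not come at the cost of accuracy.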
Related papers
- Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z)
- Super Tiny Language Models [3.8353434814956517]
This paper introduces a series of research efforts focused on Super Tiny Language Models (STLMs).
We explore innovative techniques such as byte-level tokenization with a pooling mechanism, weight tying, and efficient training strategies.
Our ultimate goal is to make high-performance language models more accessible and practical for a wide range of applications.
arXiv Detail & Related papers (2024-05-23T04:12:49Z)
- MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications [46.337078949637345]
We present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch.
A thorough account of experiences accrued during large model development is given, covering every step of the process.
MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks.
arXiv Detail & Related papers (2023-10-24T12:22:34Z)
- Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
arXiv Detail & Related papers (2023-10-24T07:58:20Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning. A hedged sketch of this freeze-and-project recipe appears after this list.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO).
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z)
- Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill this gap and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach enables combining a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
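As referenced in the eP-ALM entry above, the following is a rough, hypothetical sketch of that freeze-and-project style of parameter-efficient adaptation: freeze (nearly) all LM parameters, train a single linear projection that maps perceptual features into the LM's embedding space, and prepend one trainable token. The model choice, feature dimensionality, and injection details are assumptions for illustration and differ from the actual eP-ALM implementation.

```python
# Hypothetical freeze-and-project adaptation sketch (not the eP-ALM reference code).
import torch
from transformers import AutoModelForCausalLM

lm = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # stand-in frozen LM
for p in lm.parameters():
    p.requires_grad = False  # freeze >99% of the total parameters

hidden = lm.config.hidden_size
vision_dim = 768  # assumed dimensionality of a visual encoder's output features

# The only trainable pieces: one linear projection and one soft prompt token.
projection = torch.nn.Linear(vision_dim, hidden)
soft_token = torch.nn.Parameter(torch.randn(1, 1, hidden) * 0.02)

def forward(visual_feats, input_ids):
    """visual_feats: (batch, vision_dim); input_ids: (batch, seq_len)."""
    text_embeds = lm.get_input_embeddings()(input_ids)            # (B, L, H)
    vis_embeds = projection(visual_feats).unsqueeze(1)            # (B, 1, H)
    prompt = soft_token.expand(input_ids.size(0), -1, -1)         # (B, 1, H)
    inputs = torch.cat([prompt, vis_embeds, text_embeds], dim=1)  # prepend prompt + projected features
    return lm(inputs_embeds=inputs).logits
```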
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the information presented and is not responsible for any consequences arising from its use.