Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task
- URL: http://arxiv.org/abs/2205.15172v1
- Date: Mon, 30 May 2022 15:21:26 GMT
- Title: Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task
- Authors: Guilherme Moraes Rosa and Luiz Bonifacio and Vitor Jeronymo and Hugo Abonizio and Roberto Lotufo and Rodrigo Nogueira
- Abstract summary: We show that scaling the number of parameters in a language model improves the F1 score of our previous zero-shot result by more than 6 points.
Despite the challenges posed by large language models, we provide a demonstration of our zero-shot monoT5-3b model being used in production as a search engine.
- Score: 4.186775801993103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that language models scaled to billions of parameters,
such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios. In
this work, we experiment with zero-shot models in the legal case entailment
task of the COLIEE 2022 competition. Our experiments show that scaling the
number of parameters in a language model improves the F1 score of our previous
zero-shot result by more than 6 points, suggesting that stronger zero-shot
capability may be a characteristic of larger models, at least for this task.
Our 3B-parameter zero-shot model outperforms all models, including ensembles,
on the COLIEE 2021 test set and also achieves the best performance among single
models in the COLIEE 2022 competition, second only to an ensemble composed of
the 3B model itself and a smaller version of the same model. Despite the
challenges posed by large language models, mainly due to latency constraints in
real-time applications, we provide a demonstration of our zero-shot monoT5-3b
model being used in production as a search engine, including for legal
documents. The code for our submission and the demo of our system are available
at https://github.com/neuralmind-ai/coliee and
https://neuralsearchx.neuralmind.ai, respectively.
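As background on the approach, monoT5 casts ranking as a sequence-to-sequence relevance judgment: the model reads a query-passage pair and the softmax over its "true"/"false" token logits serves as the relevance score, so candidate paragraphs can be scored zero-shot. Below is a minimal sketch assuming the public castorini monoT5 checkpoint and the standard monoT5 prompt; the production system described in the paper may differ (checkpoint, truncation, batching).

```python
# Minimal sketch of zero-shot monoT5 relevance scoring; checkpoint name and
# usage are assumptions, not the paper's exact production setup.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL = "castorini/monot5-3b-msmarco-10k"  # assumed public checkpoint
tokenizer = T5Tokenizer.from_pretrained(MODEL)
model = T5ForConditionalGeneration.from_pretrained(MODEL).eval()

def relevance_score(query: str, passage: str) -> float:
    """Probability that `passage` is relevant to `query` under monoT5."""
    prompt = f"Query: {query} Document: {passage} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # One decoding step from the decoder start token; compare the logits
    # of the "true" and "false" target tokens.
    start = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=start).logits[0, 0]
    true_id = tokenizer.encode("true")[0]
    false_id = tokenizer.encode("false")[0]
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()

# Entailment cast as ranking: score every candidate paragraph against the
# query fragment of the base case and keep the highest-scoring candidates.
candidates = ["paragraph 1 ...", "paragraph 2 ..."]
scores = [relevance_score("fragment of the base case ...", p) for p in candidates]
print(max(zip(scores, candidates)))
```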
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Specializing Smaller Language Models towards Multi-Step Reasoning [56.78474185485288]
We show that abilities can be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤ 11B).
We propose model specialization to focus the model's ability on a target task.
arXiv Detail & Related papers (2023-01-30T08:51:19Z)
- Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models [23.818751895205132]
Go-tuning is a geometry-guided self-supervised learning method.
Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with large language models such as T5-XL (3B).
arXiv Detail & Related papers (2022-12-20T17:36:49Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Zero-Shot Learners for Natural Language Understanding via a Unified Multiple Choice Perspective [26.41585967095811]
Zero-shot learning aims to train a model on a given task such that it can address new learning tasks without any additional training.
Our approach converts zero-shot learning into multiple-choice tasks, avoiding problems in commonly used large-scale generative models such as FLAN.
Our approach shows state-of-the-art performance on several benchmarks and produces satisfactory results on tasks such as natural language inference and text classification.
arXiv Detail & Related papers (2022-10-16T17:24:06Z)
- Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks [77.90900650816046]
We introduce Zemi, a zero-shot semi-parametric language model.
We train Zemi with a novel semi-parametric multitask prompted training paradigm.
Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus.
arXiv Detail & Related papers (2022-10-01T04:08:50Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
- To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment [4.9069311006119865]
We show that pretrained language models fine-tuned on diverse datasets can transfer well to a variety of out-of-domain tasks.
We participated in the legal case entailment task of COLIEE 2021, in which we use such models with no adaptations to the target domain.
Our experiments confirm a counter-intuitive result in the new paradigm of pretrained language models: zero-shot models can outperform models fine-tuned on in-domain data.
arXiv Detail & Related papers (2022-02-07T13:02:48Z)
- Efficient Large Scale Language Modeling with Mixtures of Experts [61.45159383372181]
Mixture-of-Experts (MoE) layers enable efficient scaling of language models through conditional computation (a generic sketch follows this list).
This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings.
arXiv Detail & Related papers (2021-12-20T17:05:11Z)
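To illustrate the conditional computation mentioned in the Mixture-of-Experts entry above, the sketch below routes each token to a single expert feed-forward network, so per-token compute stays roughly constant as experts are added. This is an assumed, minimal top-1 router, not the specific autoregressive MoE architecture evaluated in that paper.

```python
# Generic top-1 Mixture-of-Experts layer; a minimal illustrative sketch.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # learned gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token runs through exactly one expert,
        # so per-token compute stays constant as the expert count grows.
        gates = torch.softmax(self.router(x), dim=-1)  # (tokens, experts)
        top_gate, top_idx = gates.max(dim=-1)          # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer(d_model=64, d_ff=256, num_experts=4)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```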
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.