To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment
- URL: http://arxiv.org/abs/2202.03120v1
- Date: Mon, 7 Feb 2022 13:02:48 GMT
- Title: To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment
- Authors: Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto de Alencar
Lotufo, Rodrigo Nogueira
- Abstract summary: We show that pretrained language models fine-tuned on diverse datasets can transfer well to a variety of out-of-domain tasks.
We participated in the legal case entailment task of COLIEE 2021, in which we used such models with no adaptation to the target domain.
Our experiments confirm a counter-intuitive result in the new paradigm of pretrained language models: given limited labeled data, models with little or no adaptation to the target task can be more robust to distribution shift than models fine-tuned on it.
- Score: 4.9069311006119865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been mounting evidence that pretrained language models fine-tuned
on large and diverse supervised datasets can transfer well to a variety of
out-of-domain tasks. In this work, we investigate this transfer ability to the
legal domain. For that, we participated in the legal case entailment task of
COLIEE 2021, in which we used such models with no adaptation to the target
domain. Our submissions achieved the highest scores, surpassing the second-best
team by more than six percentage points. Our experiments confirm a
counter-intuitive result in the new paradigm of pretrained language models:
given limited labeled data, models with little or no adaptation to the target
task can be more robust to changes in the data distribution than models
fine-tuned on it. Code is available at https://github.com/neuralmind-ai/coliee.
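For concreteness, here is a minimal sketch of the zero-shot setup the abstract describes, assuming a monoT5-style reranker (the castorini/monot5-base-msmarco checkpoint and the standard monoT5 "Query/Document/Relevant" prompt are drawn from the public monoT5 recipe, not necessarily the exact submission pipeline in the linked repository): each fragment of the base case is paired with candidate paragraphs and scored without any legal-domain fine-tuning.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Public MS MARCO-tuned monoT5 checkpoint; no legal-domain fine-tuning is applied.
MODEL_NAME = "castorini/monot5-base-msmarco"

tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()

def relevance_score(fragment: str, paragraph: str) -> float:
    """Log-probability of 'true' under the standard monoT5 relevance prompt."""
    prompt = f"Query: {fragment} Document: {paragraph} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    # One decoding step; compare the logits of the "true" and "false" tokens.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
    true_id = tokenizer.encode("true")[0]
    false_id = tokenizer.encode("false")[0]
    return torch.log_softmax(logits[[true_id, false_id]], dim=0)[0].item()

# Rank candidate paragraphs for one base-case fragment, fully zero-shot.
fragment = "The duty of care extends to all reasonably foreseeable plaintiffs."
candidates = ["Candidate paragraph A ...", "Candidate paragraph B ..."]
ranked = sorted(candidates, key=lambda p: relevance_score(fragment, p), reverse=True)
print(ranked[0])
```

The zero-shot claim is precisely that this MS MARCO relevance prompt transfers to legal entailment as-is: no COLIEE labels are used for training.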
Related papers
- Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection [37.65064631532493]
Finetuning a pretrained model to perform unsupervised prediction on data from a target domain presents two challenges: forgetting the pretraining distribution and overfitting the target data.
We measure the efficiency of injecting pretraining data into the finetuning data mixture to avoid forgetting and mitigate overfitting.
A key practical takeaway from our study is that injecting as little as 1% of pretraining data into the finetuning data mixture prevents the model from forgetting the pretraining set (a minimal mixing sketch appears after this list).
arXiv Detail & Related papers (2025-02-09T21:44:27Z)
- TransformLLM: Adapting Large Language Models via LLM-Transformed Reading Comprehension Text [5.523385345486362]
We have developed language models specifically designed for legal applications.
Our innovative approach significantly improves capabilities in legal tasks by using Large Language Models (LLMs) to convert raw training data into reading comprehension text.
arXiv Detail & Related papers (2024-10-28T19:32:18Z)
- A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets [0.0]
This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data.
We use records of demands submitted to a Brazilian Public Prosecutor's Office, aiming to assign each description to one of the subjects.
The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning.
arXiv Detail & Related papers (2024-09-09T18:10:05Z)
- Pre-Trained Model Recommendation for Downstream Fine-tuning [22.343011779348682]
Model selection aims to rank off-the-shelf pre-trained models and select the most suitable one for the new target task.
Existing model selection techniques are often constrained in their scope and tend to overlook the nuanced relationships between models and tasks.
We present a pragmatic framework, Fennec, that delves into a diverse, large-scale model repository.
arXiv Detail & Related papers (2024-03-11T02:24:32Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch? [0.0]
We train Longformer models with the efficient replaced token detection (RTD) task on legal data to showcase that pretraining efficient LMs is possible using much less compute.
We find that both the small and base models outperform their baselines on the in-domain BillSum task and on out-of-domain tasks.
arXiv Detail & Related papers (2022-11-30T16:09:20Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Billions of Parameters Are Worth More Than In-domain Training Data: A case study in the Legal Case Entailment Task [4.186775801993103]
We show that scaling the number of parameters in a language model improves the F1 score of our previous zero-shot result by more than 6 points.
Despite the challenges posed by large language models, we provide a demonstration of our zero-shot monoT5-3b model being used in production as a search engine.
arXiv Detail & Related papers (2022-05-30T15:21:26Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD 2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- $n$-Reference Transfer Learning for Saliency Prediction [73.17061116358036]
We propose a few-shot transfer learning paradigm for saliency prediction.
The proposed framework is gradient-based and model-agnostic.
The results show that the proposed framework achieves a significant performance improvement.
arXiv Detail & Related papers (2020-07-09T23:20:44Z)
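As referenced in the data-injection entry above ("Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection"), the 1% recipe amounts to a small change to the training loop. A minimal sketch, assuming simple batch iterators; the function name and the Bernoulli sampling scheme are illustrative assumptions, and only the ~1% rate comes from that paper's takeaway:

```python
import random
from itertools import cycle
from typing import Iterable, Iterator

def inject_pretraining(finetune_batches: Iterable,
                       pretrain_batches: Iterable,
                       inject_rate: float = 0.01,
                       seed: int = 0) -> Iterator:
    """Yield finetuning batches with pretraining batches mixed in.

    Only the ~1% rate comes from the paper's takeaway; the Bernoulli
    sampling and cycling used here are illustrative choices.
    """
    rng = random.Random(seed)
    pretrain = cycle(pretrain_batches)  # reuse pretraining data as needed
    for batch in finetune_batches:
        if rng.random() < inject_rate:
            yield next(pretrain)  # occasional pretraining batch counters forgetting
        yield batch
```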
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.