LexGPT 0.1: pre-trained GPT-J models with Pile of Law
- URL: http://arxiv.org/abs/2306.05431v1
- Date: Mon, 5 Jun 2023 08:42:59 GMT
- Title: LexGPT 0.1: pre-trained GPT-J models with Pile of Law
- Authors: Jieh-Sheng Lee
- Abstract summary: This research aims to build generative language models specialized for the legal domain.
The manuscript presents the development of LexGPT models based on GPT-J models and pre-trained with Pile of Law.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research aims to build generative language models specialized for the
legal domain. The manuscript presents the development of LexGPT models based on
GPT-J models and pre-trained with Pile of Law. The foundation model built in
this manuscript is the initial step for the development of future applications
in the legal domain, such as further training with reinforcement learning from
human feedback. Another objective of this manuscript is to assist legal
professionals in utilizing language models through the "No Code" approach. By
fine-tuning models with specialized data and without modifying any source code,
legal professionals can create custom language models for downstream tasks with
minimum effort and technical knowledge. The downstream task in this manuscript
is to turn a LexGPT model into a classifier, although the performance is
notably lower than the state-of-the-art result. How to enhance downstream task
performance without modifying the model or its source code is a research topic
for future exploration.
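The downstream classification setup above can be illustrated with a minimal sketch. This is not the manuscript's exact method: `lm_log_prob` is a toy stand-in for a real generative model (such as a LexGPT/GPT-J checkpoint returning summed token log-probabilities), and the keyword affinities are invented for illustration. The sketch shows one common way a generative LM becomes a classifier without modifying its source code: append each candidate label to the input and pick the label the model scores highest.

```python
# Hedged sketch: using a generative LM as a classifier by scoring
# label verbalizations. lm_log_prob is a TOY stand-in for a real model;
# a real implementation would return the sum of token log-probabilities
# under a LexGPT/GPT-J checkpoint.

# Toy keyword-to-label affinities (invented for this sketch).
AFFINITY = {"contract": "contract law", "patent": "patent law"}

def lm_log_prob(text: str) -> float:
    # Split the prompt back into document body and appended label,
    # then reward label choices that match keywords in the body.
    body, _, label = text.partition("\nLabel: ")
    return sum(2.0 for kw, lab in AFFINITY.items() if kw in body and lab == label)

def classify(document: str, labels: list[str]) -> str:
    # Append each candidate label to the document and pick the one the
    # LM finds most probable -- no model code is modified, in the
    # spirit of the manuscript's "No Code" approach.
    prompts = {lab: f"{document}\nLabel: {lab}" for lab in labels}
    return max(labels, key=lambda lab: lm_log_prob(prompts[lab]))

print(classify("This contract is void.", ["contract law", "patent law"]))
# -> contract law
```

Swapping the toy scorer for a real model's log-likelihood leaves the classification loop unchanged, which is why no source-code modification is needed for this downstream task.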
Related papers
- FOCUS: Forging Originality through Contrastive Use in Self-Plagiarism for Language Models [38.76912842622624]
Pre-trained Language Models (PLMs) have shown impressive results in various Natural Language Generation (NLG) tasks.
This study introduces a unique "self-plagiarism" contrastive decoding strategy, aimed at boosting the originality of text produced by PLMs.
arXiv Detail & Related papers (2024-06-02T19:17:00Z) - JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding
over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLMs' APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z) - Revisiting Topic-Guided Language Models [20.21486464604549]
We study four topic-guided language models and two baselines, evaluating the held-out predictive performance of each model on four corpora.
We find that none of these methods outperform a standard LSTM language model baseline, and most fail to learn good topics.
arXiv Detail & Related papers (2023-12-04T20:33:24Z) - Who's Harry Potter? Approximate Unlearning in LLMs [4.821438899378393]
Large language models (LLMs) are trained on massive internet corpora that often contain copyrighted content.
This poses legal and ethical challenges for the developers and users of these models, as well as the original authors and publishers.
We propose a novel technique for unlearning a subset of the training data from a LLM, without having to retrain it from scratch.
arXiv Detail & Related papers (2023-10-03T17:48:14Z) - Matching Pairs: Attributing Fine-Tuned Models to their Pre-Trained Large
Language Models [11.57282859281814]
We consider different knowledge levels and attribution strategies, and find that we can correctly trace back 8 out of the 10 fine-tuned models with our best method.
arXiv Detail & Related papers (2023-06-15T17:42:48Z) - Fine-grained Text Style Transfer with Diffusion-Based Language Models [50.02698074338317]
We trained a diffusion-based model on the StylePTB dataset, the standard benchmark for fine-grained text style transfer.
Our model was able to achieve state-of-the-art performance on both individual and compositional transfers.
arXiv Detail & Related papers (2023-05-31T02:51:26Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Stochastic Code Generation [1.7205106391379026]
Large language models pre-trained for code generation can generate high-quality short code but often struggle with generating coherent long code.
This issue is also observed in language modeling for long text generation.
In this study, we investigate whether techniques from long text generation can be applied to code generation to improve coherence.
arXiv Detail & Related papers (2023-04-14T00:01:05Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Masked Autoencoders As The Unified Learners For Pre-Trained Sentence
Representation [77.47617360812023]
We extend the recently proposed MAE style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained based on RetroMAE and contrastive learning.
arXiv Detail & Related papers (2022-07-30T14:34:55Z) - Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods
in Natural Language Processing [78.8500633981247]
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning".
Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly.
arXiv Detail & Related papers (2021-07-28T18:09:46Z)
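The prompt-based learning paradigm described in the last entry can be sketched in a few lines. This is a hedged illustration, not the survey's own code: `lm_fill` is a toy stand-in for a language model completing a cloze template, and the template and verbalizer are invented for this example. The point is the shape of the pipeline: wrap the input in a prompt, let the LM produce text, then map the generated word back to a label, rather than training a model to predict P(y|x) directly.

```python
# Hedged sketch of prompt-based learning: classification recast as
# text completion. lm_fill is a TOY stand-in for a real LM; a real
# system would pick the most probable completion under the model.

TEMPLATE = "{x} Overall, the movie was ___."
VERBALIZER = {"great": "positive", "terrible": "negative"}  # answer word -> label

def lm_fill(prompt: str) -> str:
    # Toy completion rule standing in for LM generation.
    return "terrible" if "awful" in prompt else "great"

def predict(x: str) -> str:
    prompt = TEMPLATE.format(x=x)   # 1. wrap the input in a prompt template
    answer = lm_fill(prompt)        # 2. the LM fills the blank
    return VERBALIZER[answer]       # 3. map the answer word to a task label

print(predict("The plot was awful."))
# -> negative
```

The template and verbalizer are the task-specific pieces; the language model itself stays fixed, which is what distinguishes this paradigm from traditional supervised training of P(y|x).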
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.