Dependency Learning for Legal Judgment Prediction with a Unified
Text-to-Text Transformer
- URL: http://arxiv.org/abs/2112.06370v1
- Date: Mon, 13 Dec 2021 01:38:37 GMT
- Title: Dependency Learning for Legal Judgment Prediction with a Unified
Text-to-Text Transformer
- Authors: Yunyun Huang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Bin Luo
- Abstract summary: Legal Judgment Prediction involves a series of sub-tasks such as predicting violated law articles, charges and term of penalty.
We propose leveraging a unified text-to-text Transformer for LJP.
We show that this unified transformer, albeit pretrained on general-domain text, outperforms pretrained models tailored specifically for the legal domain.
- Score: 13.896506220470748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given the fact of a case, Legal Judgment Prediction (LJP) involves a series
of sub-tasks such as predicting violated law articles, charges and term of
penalty. We propose leveraging a unified text-to-text Transformer for LJP,
where the dependencies among sub-tasks can be naturally established within the
auto-regressive decoder. Compared with previous works, it has three advantages:
(1) it fits in the pretraining pattern of masked language models, and thereby
can benefit from the semantic prompts of each sub-task rather than treating
them as atomic labels, (2) it utilizes a single unified architecture, enabling
full parameter sharing across all sub-tasks, and (3) it can incorporate both
classification and generative sub-tasks. We show that this unified transformer,
albeit pretrained on general-domain text, outperforms pretrained models
tailored specifically for the legal domain. Through an extensive set of
experiments, we find that the best order to capture dependencies is different
from human intuitions, and the most reasonable logical order for humans can be
sub-optimal for the model. We further include two more auxiliary tasks: court
view generation and article content prediction, showing they can not only
improve the prediction accuracy, but also provide interpretable explanations
for model outputs even when an error is made. With the best configuration, our
model outperforms both the previous SOTA and a single-task version of the unified
transformer by a large margin.
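The abstract describes the recipe only at a high level. A minimal sketch of how one such example could be serialized for a Hugging Face T5-style model is shown below; the checkpoint name, prompt wording, separators, and the article → charge → penalty order are illustrative assumptions, not the paper's exact configuration (the paper searches for the best order empirically).

```python
# Hypothetical sketch: one text-to-text example per case, so that the
# auto-regressive decoder conditions each sub-task on the ones emitted before it.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")   # assumed checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def build_example(fact, article, charge, penalty):
    # All sub-task labels live in one target string; the order of the fields
    # decides which predictions condition on which.
    source = "predict judgment: " + fact
    target = f"article: {article} ; charge: {charge} ; penalty: {penalty}"
    return source, target

def train_step(fact, article, charge, penalty):
    source, target = build_example(fact, article, charge, penalty)
    enc = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
    labels = tokenizer(target, return_tensors="pt").input_ids
    return model(**enc, labels=labels).loss   # ordinary seq2seq cross-entropy

def predict(fact):
    enc = tokenizer("predict judgment: " + fact, return_tensors="pt",
                    truncation=True, max_length=512)
    out = model.generate(**enc, max_length=64)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # Parse "article: ... ; charge: ... ; penalty: ..." back into fields.
    parts = (p.split(":", 1) for p in text.split(";") if ":" in p)
    return {k.strip(): v.strip() for k, v in parts}
```

Permuting the fields of `target` corresponds to the dependency-order experiments mentioned in the abstract, and auxiliary outputs such as a court view could be appended as further fields.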
Related papers
- Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers [0.21756081703276003]
This thesis provides methods and analyses that make progress toward the goal of making the most of pretrained transformers.
We introduce two new finetuning methods that add new capabilities to the models they are applied to.
We provide theoretical and empirical insights into the divergence between model likelihood and output quality.
arXiv Detail & Related papers (2024-08-29T03:50:24Z)
- UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation [0.0]
This paper addresses SemEval-2022 Task 3, PreTENS: Presupposed Taxonomies Evaluating Neural Network Semantics.
The goal of the task is to identify whether a sentence is acceptable depending on the taxonomic relationship that holds between a noun pair contained in the sentence.
We propose a data-augmentation approach that enhances the robustness and generalizability of language models for this classification task.
arXiv Detail & Related papers (2022-10-07T07:41:28Z)
- Analyzing Transformers in Embedding Space [59.434807802802105]
We present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space.
We show that parameters of both pretrained and fine-tuned models can be interpreted in embedding space.
Our findings open the door to interpretation methods that, at least in part, abstract away from model specifics and operate in the embedding space only.
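As an illustration of what interpreting parameters in embedding space can look like in practice, here is a minimal sketch under assumptions not taken from the paper: a GPT-2 small checkpoint with tied embeddings, and a single row of a feed-forward output projection as the parameter vector being inspected.

```python
# Hypothetical sketch: project a single parameter vector onto the vocabulary
# through the token embedding matrix and read off its nearest tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

E = model.transformer.wte.weight                         # (vocab, d_model) embeddings
ff_value = model.transformer.h[5].mlp.c_proj.weight[0]   # one FF "value" vector, (d_model,)

with torch.no_grad():
    scores = E @ ff_value                                # similarity to every token embedding
    top = torch.topk(scores, k=10).indices
print([tokenizer.decode([int(i)]) for i in top])         # tokens this vector "writes" toward
```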
arXiv Detail & Related papers (2022-09-06T14:36:57Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when fine-tuned on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
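A minimal sketch of this reformulation with a causal (unidirectional-attention) GPT-2 checkpoint follows; the prompt template and field names are illustrative assumptions rather than the paper's actual templates.

```python
# Hypothetical sketch: serialize the ABSA tuple after the sentence so a causal
# LM can be fine-tuned to generate it, and extract at inference by continuing
# the prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def serialize(sentence, aspect, category, polarity):
    # Training-time text: everything after "->" is what the model learns to generate.
    return f"sentence: {sentence} -> aspect: {aspect} | category: {category} | polarity: {polarity}"

def extract(sentence):
    prompt = f"sentence: {sentence} -> aspect:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32,
                         pad_token_id=tokenizer.eos_token_id)  # greedy decoding by default
    return tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True)
```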
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Composable Sparse Fine-Tuning for Cross-Lingual Transfer [56.86192078426372]
Fine-tuning all parameters of a pre-trained model has become the mainstream approach for transfer learning.
We introduce a new fine-tuning method, composable sparse fine-tuning, which combines these desirable properties by learning sparse, composable parameter updates.
It outperforms adapters in zero-shot cross-lingual transfer by a large margin.
arXiv Detail & Related papers (2021-10-14T17:27:29Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow reinforcement learning strategies to optimize the parameters of a controller that selects the concatenation, computing the reward from the accuracy of a task model.
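A generic sketch of such a search loop is given below; the candidate embedding names, the toy reward, and the plain REINFORCE update are illustrative stand-ins, not ACE's actual controller or reward design.

```python
# Hypothetical sketch: a controller samples which embeddings to concatenate,
# the concatenation is scored by a task model, and the score is used as a
# REINFORCE reward to update the controller.
import torch

CANDIDATES = ["fasttext", "bert", "flair", "xlm-r"]        # illustrative choices

logits = torch.zeros(len(CANDIDATES), requires_grad=True)  # controller parameters
optimizer = torch.optim.Adam([logits], lr=0.1)

def train_and_evaluate(selected):
    # Placeholder for "train the structured-prediction model on this
    # concatenation and return its dev accuracy"; here a toy score.
    return 0.5 + 0.1 * len(selected)

baseline = 0.0
for step in range(20):
    probs = torch.sigmoid(logits)
    mask = torch.bernoulli(probs)                          # sample a concatenation
    selected = [c for c, m in zip(CANDIDATES, mask) if m > 0]
    reward = train_and_evaluate(selected)
    # REINFORCE: raise the log-probability of selections that beat the baseline.
    log_prob = (mask * torch.log(probs + 1e-8)
                + (1 - mask) * torch.log(1 - probs + 1e-8)).sum()
    loss = -(reward - baseline) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    baseline = 0.9 * baseline + 0.1 * reward
```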
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- How Can We Accelerate Progress Towards Human-like Linguistic Generalization? [22.810889064523167]
The paper describes and critiques the Pretraining-Agnostic Identically Distributed (PAID) evaluation paradigm.
This paradigm consists of three stages: (1) pre-training of a word prediction model on a corpus of arbitrary size; (2) fine-tuning (transfer learning) on a training set representing a classification task; (3) evaluation on a test set drawn from the same distribution as that training set.
arXiv Detail & Related papers (2020-05-03T00:31:15Z)