A Survey on Legal Judgment Prediction: Datasets, Metrics, Models and
Challenges
- URL: http://arxiv.org/abs/2204.04859v1
- Date: Mon, 11 Apr 2022 04:06:28 GMT
- Title: A Survey on Legal Judgment Prediction: Datasets, Metrics, Models and
Challenges
- Authors: Junyun Cui, Xiaoyu Shen, Feiping Nie, Zheng Wang, Jinglong Wang and
Yulong Chen
- Abstract summary: Legal judgment prediction (LJP) applies Natural Language Processing (NLP) techniques to predict judgment results based on fact descriptions automatically.
We analyze 31 LJP datasets in 6 languages, present their construction process and define a classification method of LJP.
We show the state-of-the-art results for 8 representative datasets from different court cases and discuss the open challenges.
- Score: 73.34944216896837
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal judgment prediction (LJP) applies Natural Language Processing (NLP)
techniques to predict judgment results based on fact descriptions
automatically. Recently, large-scale public datasets and advances in NLP
research have led to increasing interest in LJP. Despite a clear gap between
machine and human performance, impressive results have been achieved on various
benchmark datasets. In this paper, to address the current lack of a comprehensive
survey of existing LJP tasks, datasets, models, and evaluations, (1) we analyze
31 LJP datasets in 6 languages, present their construction process, and define a
classification method of LJP with 3 different attributes; (2) we summarize 14
evaluation metrics under four categories for different outputs of LJP tasks;
(3) we review 12 legal-domain pretrained models in 3 languages and highlight 3
major research directions for LJP; (4) we show the state-of-the-art results for 8
representative datasets from different court cases and discuss the open
challenges. This paper provides an up-to-date and comprehensive review to help
readers understand the status of LJP. We hope to facilitate further joint
efforts between NLP researchers and legal professionals on this problem.
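Among the evaluation metrics the survey categorizes, macro-averaged F1 is a common choice for classification-style LJP outputs such as charge or law-article prediction, because it weights rare charges the same as frequent ones. A minimal sketch in plain Python (the charge labels are purely illustrative):

```python
from collections import defaultdict

def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true class was t
            fn[t] += 1  # missed the true class t
    f1s = []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical charge predictions for five cases
gold = ["theft", "fraud", "theft", "assault", "fraud"]
pred = ["theft", "theft", "theft", "assault", "fraud"]
print(round(macro_f1(gold, pred), 3))  # → 0.822
```

Micro-F1, by contrast, pools the counts across classes before computing F1, so frequent charges dominate; the survey's four metric categories cover such variants for the different LJP output types.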
Related papers
- Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and Interpretation [5.646219481667151]
This paper introduces a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings.
We introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements.
We also present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system.
arXiv Detail & Related papers (2024-05-17T11:22:27Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems [0.0]
This study compares prominent Large Language Models, including GPT-3.5, GPT-4, and Llama-2-7b, in zero-shot and one-shot settings.
Our findings show GPT-4's superior performance, particularly in the one-shot scenario.
arXiv Detail & Related papers (2024-03-02T23:32:33Z)
- Prediction of Arabic Legal Rulings using Large Language Models [1.3499500088995464]
This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 real commercial court cases.
We evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT-3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning.
We show that GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%.
arXiv Detail & Related papers (2023-10-16T10:37:35Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses from a wide range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models [91.05820785008527]
We propose L-Eval to institute a more standardized evaluation for long context language models (LCLMs).
We build a new evaluation suite containing 20 sub-tasks, 508 long documents, and over 2,000 human-labeled query-response pairs.
Results show that popular n-gram matching metrics generally do not correlate well with human judgment.
arXiv Detail & Related papers (2023-07-20T17:59:41Z)
- SemEval-2023 Task 11: Learning With Disagreements (LeWiDi) [75.85548747729466]
We report on the second edition of the LeWiDi series of shared tasks.
This second edition attracted a wide array of participants resulting in 13 shared task submission papers.
arXiv Detail & Related papers (2023-04-28T12:20:35Z)
- ClassActionPrediction: A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US [0.0]
We release for the first time a challenging LJP dataset focused on class action cases in the US.
It is the first dataset in the common law system that focuses on the harder and more realistic task of using the complaints as input instead of the commonly used facts summary written by the court.
Our Longformer model clearly outperforms the human baseline (63%), despite only considering the first 2,048 tokens. Furthermore, we perform a detailed error analysis and find that the Longformer model is significantly better calibrated than the human experts.
arXiv Detail & Related papers (2022-11-01T16:57:59Z)