THUIR@COLIEE 2023: Incorporating Structural Knowledge into Pre-trained Language Models for Legal Case Retrieval
- URL: http://arxiv.org/abs/2305.06812v1
- Date: Thu, 11 May 2023 14:08:53 GMT
- Title: THUIR@COLIEE 2023: Incorporating Structural Knowledge into Pre-trained Language Models for Legal Case Retrieval
- Authors: Haitao Li, Weihang Su, Changyue Wang, Yueyue Wu, Qingyao Ai, Yiqun Liu
- Abstract summary: This paper summarizes the approach of the championship team THUIR in COLIEE 2023.
Specifically, we design structure-aware pre-trained language models to enhance the understanding of legal cases.
Finally, learning-to-rank methods are employed to merge features of different dimensions.
- Score: 16.191450092389722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal case retrieval techniques play an essential role in modern
intelligent legal systems. As a well-known annual international competition,
COLIEE aims to advance the state of the art in retrieval models for legal
texts. This paper summarizes the approach of the championship team THUIR in
COLIEE 2023. Specifically, we design structure-aware pre-trained language
models to enhance the understanding of legal cases. Furthermore, we propose
heuristic pre-processing and post-processing approaches to reduce the
influence of irrelevant information. Finally, learning-to-rank methods are
employed to merge features of different dimensions. Experimental results
demonstrate the effectiveness of our proposal. Official results show that our
run achieved the best performance among all submissions. The implementation
of our method can be found at https://github.com/CSHaitao/THUIR-COLIEE2023.
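As a concrete illustration of the final learning-to-rank step, the sketch below merges heterogeneous relevance features with LightGBM's LambdaRank objective. This is a minimal sketch rather than the authors' released pipeline: the feature columns, toy data, and hyperparameters are illustrative assumptions (see the repository above for the actual implementation).

```python
# Hedged sketch: merging heterogeneous ranking features with LightGBM's
# LambdaRank. Feature values, labels, and hyperparameters are toy examples.
import numpy as np
import lightgbm as lgb

# One row per (query case, candidate case) pair; columns could be, e.g.,
# a BM25 score, a dense-retriever score, and a structure-aware PLM score.
X_train = np.array([
    [12.3, 0.81, 0.77],
    [ 4.1, 0.35, 0.42],
    [ 9.8, 0.66, 0.71],
    [ 2.2, 0.21, 0.18],
])
y_train = np.array([2, 0, 1, 0])  # graded relevance labels
group_train = [2, 2]              # candidates per query, in row order

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100,
                        learning_rate=0.05)
ranker.fit(X_train, y_train, group=group_train)

# Rank unseen candidates for a new query by predicted score.
X_test = np.array([[8.7, 0.59, 0.63], [3.0, 0.28, 0.25]])
ranking = np.argsort(-ranker.predict(X_test))
print(ranking)  # candidate indices from most to least relevant
```

A real run needs far more training pairs than this; with only a handful of rows LightGBM cannot grow informative trees.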
Related papers
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance representation ability.
Our approach outperforms existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by a large language model (LLM).
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level (a toy illustration of the mixture estimate follows this entry).
arXiv Detail & Related papers (2024-03-11T21:51:39Z)
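The corpus-level estimate above can be pictured as a one-parameter mixture, P(doc) = alpha * P(doc | AI) + (1 - alpha) * P(doc | human), with alpha fit by maximum likelihood. Below is a minimal sketch under that reading; the per-document likelihoods are fabricated for illustration, and this is not the paper's exact estimator.

```python
# Hedged sketch: maximum likelihood estimation of the fraction alpha of
# documents drawn from an AI-generated distribution. Likelihoods are made up.
import numpy as np
from scipy.optimize import minimize_scalar

p_human = np.array([0.90, 0.80, 0.70, 0.20, 0.10, 0.85])  # P(doc | human)
p_ai    = np.array([0.10, 0.20, 0.30, 0.90, 0.80, 0.15])  # P(doc | AI)

def neg_log_likelihood(alpha):
    # Mixture likelihood of the whole corpus under fraction alpha.
    mix = alpha * p_ai + (1.0 - alpha) * p_human
    return -np.sum(np.log(mix))

result = minimize_scalar(neg_log_likelihood, bounds=(0.0, 1.0), method="bounded")
print(f"estimated AI-modified fraction: {result.x:.3f}")
```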
- CAPTAIN at COLIEE 2023: Efficient Methods for Legal Information Retrieval and Entailment Tasks [7.0271825812050555]
This paper outlines our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition.
Our approach involved applying appropriate state-of-the-art deep learning methods, designing techniques informed by observed domain characteristics, and following meticulous engineering practices throughout the competition.
arXiv Detail & Related papers (2024-01-07T17:23:27Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in many national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models [4.329463429688995]
This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023.
For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models (one common form of such a pre-processing step is sketched after this entry).
We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classic statistical models and pre-trained language models.
arXiv Detail & Related papers (2023-09-16T18:32:15Z)
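One way to work around a transformer's input limit is to split long documents into overlapping token windows and score each window separately (for example, keeping the maximum window score per document). The sketch below assumes a generic BERT tokenizer and illustrative window sizes; it is not the NOWJ1 team's exact pipeline.

```python
# Hedged sketch: sliding-window chunking to fit long legal documents into a
# transformer's input limit. Tokenizer checkpoint, window size, and stride
# are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def chunk_document(text, max_len=512, stride=128):
    """Yield overlapping text windows of at most max_len tokens each."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    step = max_len - stride  # consecutive windows overlap by `stride` tokens
    for start in range(0, max(len(ids), 1), step):
        yield tokenizer.decode(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break  # the final window already covers the document tail

for window in chunk_document("The appellant filed a motion ... " * 300):
    pass  # score each window with a retrieval model, then aggregate per document
```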
- NOWJ at COLIEE 2023 -- Multi-Task and Ensemble Approaches in Legal Information Processing [1.5593460008414899]
We present the NOWJ team's approach to the COLIEE 2023 Competition, which focuses on advancing legal information processing techniques.
We employ state-of-the-art machine learning models and innovative approaches such as BERT, Longformer, the BM25 ranking algorithm, and multi-task learning models (a minimal BM25 example follows this entry).
arXiv Detail & Related papers (2023-06-08T03:10:49Z)
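For the lexical side of such an ensemble, BM25 ranking takes only a few lines with the rank_bm25 package; the toy corpus and whitespace tokenization below are illustrative assumptions, not the team's setup.

```python
# Hedged sketch: BM25 scoring over a toy legal corpus with rank_bm25.
from rank_bm25 import BM25Okapi

corpus = [
    "the defendant appealed the judgment of the lower court",
    "the contract was void for lack of consideration",
    "the appellant sought damages for breach of contract",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])  # naive tokenization

query = "damages for breach of contract".split()
scores = bm25.get_scores(query)
best = max(range(len(corpus)), key=lambda i: scores[i])
print(corpus[best], scores[best])  # highest-scoring document for the query
```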
- THUIR@COLIEE 2023: More Parameters and Legal Knowledge for Legal Case Entailment [16.191450092389722]
This paper describes the approach of the THUIR team at the COLIEE 2023 Legal Case Entailment task.
We experiment with traditional lexical matching methods and pre-trained language models of different sizes.
We achieve third place in COLIEE 2023.
arXiv Detail & Related papers (2023-05-11T14:11:48Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in intelligent legal systems.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Understand Legal Documents with Contextualized Large Language Models [16.416510744265086]
We present our systems for SemEval-2023 Task 6: understanding legal texts.
We first develop the Legal-BERT-HSLN model, which considers comprehensive context information at both the intra- and inter-sentence levels.
We then train a Legal-LUKE model, which is legal-contextualized and entity-aware, to recognize legal entities.
arXiv Detail & Related papers (2023-03-21T18:48:11Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
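In practice, a Longformer-style checkpoint such as Lawformer can be loaded through the Hugging Face transformers library. The sketch below assumes the checkpoint is published on the hub under the id thunlp/Lawformer; verify the path against the authors' release before use.

```python
# Hedged sketch: encoding a long Chinese legal document with Lawformer.
# The hub id "thunlp/Lawformer" is an assumption about the release location.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thunlp/Lawformer")
model = AutoModel.from_pretrained("thunlp/Lawformer")

text = "原告与被告因合同纠纷诉至法院……"  # (truncated) Chinese legal case text
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token embeddings into a single document vector for retrieval.
doc_vector = outputs.last_hidden_state.mean(dim=1)
print(doc_vector.shape)
```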