SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval
- URL: http://arxiv.org/abs/2304.11370v1
- Date: Sat, 22 Apr 2023 10:47:01 GMT
- Title: SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval
- Authors: Haitao Li, Qingyao Ai, Jia Chen, Qian Dong, Yueyue Wu, Yiqun Liu,
Chong Chen, Qi Tian
- Abstract summary: Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Legal case retrieval, which aims to find relevant cases for a query case,
plays a core role in the intelligent legal system. Despite the success that
pre-training has achieved in ad-hoc retrieval tasks, effective pre-training
strategies for legal case retrieval remain to be explored. Compared with
general documents, legal case documents are typically long text sequences with
intrinsic logical structures. However, most existing language models have
difficulty understanding the long-distance dependencies between different
structures. Moreover, in contrast to general retrieval, relevance in the legal
domain is sensitive to key legal elements: even subtle differences in these
elements can significantly affect relevance judgments.
However, existing pre-trained language models designed for general purposes
have not been equipped to handle legal elements.
To address these issues, in this paper, we propose SAILER, a new
Structure-Aware pre-traIned language model for LEgal case Retrieval. It is
highlighted in the following three aspects: (1) SAILER fully utilizes the
structural information contained in legal case documents and pays more
attention to key legal elements, similar to how legal experts browse legal case
documents. (2) SAILER employs an asymmetric encoder-decoder architecture to
integrate several different pre-training objectives. In this way, rich semantic
information across tasks is encoded into dense vectors. (3) SAILER has powerful
discriminative ability, even without any legal annotation data. It can
distinguish legal cases with different charges accurately. Extensive
experiments over publicly available legal benchmarks demonstrate that our
approach can significantly outperform previous state-of-the-art methods in
legal case retrieval.
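The abstract describes encoding cases into dense vectors and retrieving relevant cases by vector similarity. A minimal sketch of that retrieval stage, using toy NumPy embeddings in place of a learned encoder (the vectors and dimensions here are illustrative, not SAILER's actual representations):

```python
import numpy as np

def rank_cases(query_vec, case_vecs):
    """Rank candidate cases by cosine similarity of their dense vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    c = case_vecs / np.linalg.norm(case_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores), scores

# Toy 4-dim embeddings (hypothetical; a real system would use the trained encoder).
query = np.array([1.0, 0.0, 1.0, 0.0])
cases = np.array([
    [1.0, 0.1, 0.9, 0.0],   # similar legal elements -> high score
    [0.0, 1.0, 0.0, 1.0],   # different legal elements -> low score
])
order, scores = rank_cases(query, cases)
print(order)  # [0 1]
```

In practice the candidate vectors would be pre-computed offline and searched with an approximate-nearest-neighbor index rather than a dense matrix product.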
Related papers
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
- Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation
This paper introduces KELLER, a legal knowledge-guided case reformulation approach based on large language models (LLMs).
By incorporating professional legal knowledge about crimes and law articles, we enable large language models to accurately reformulate the original legal case into concise sub-facts of crimes.
arXiv Detail & Related papers (2024-06-28T08:59:45Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in many national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- Legal Element-oriented Modeling with Multi-view Contrastive Learning for Legal Case Retrieval [3.909749182759558]
We propose an interaction-focused network for legal case retrieval with a multi-view contrastive learning objective.
Case-view contrastive learning minimizes the hidden space distance between relevant legal case representations.
We employ a legal element knowledge-aware indicator to detect legal elements of cases.
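The case-view contrastive objective described above pulls representations of relevant cases together while pushing unrelated ones apart. A toy InfoNCE-style sketch in NumPy (the cosine similarity and temperature choices here are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss over cosine similarities.

    Lower loss means the anchor is closer to the positive
    case than to any of the negatives.
    """
    norm = lambda v: v / np.linalg.norm(v)
    a = norm(anchor)
    candidates = [norm(positive)] + [norm(n) for n in negatives]
    logits = np.array([a @ c for c in candidates]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

anchor = np.array([1.0, 0.0])
relevant = np.array([0.9, 0.1])      # similar case: should give low loss
irrelevant = np.array([0.0, 1.0])    # unrelated case: should give high loss
low = info_nce(anchor, relevant, [irrelevant])
high = info_nce(anchor, irrelevant, [relevant])
print(low < high)  # True
```

Minimizing this loss over many (anchor, relevant, irrelevant) triples is what "minimizes the hidden space distance between relevant legal case representations" in practice.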
arXiv Detail & Related papers (2022-10-11T06:47:23Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.