Methods for Computing Legal Document Similarity: A Comparative Study
- URL: http://arxiv.org/abs/2004.12307v1
- Date: Sun, 26 Apr 2020 08:26:04 GMT
- Title: Methods for Computing Legal Document Similarity: A Comparative Study
- Authors: Paheli Bhattacharya, Kripabandhu Ghosh, Arindam Pal, Saptarshi Ghosh
- Abstract summary: Finding similar legal documents is an important and challenging task in the domain of Legal Information Retrieval.
Prior works have proposed two broad ways of measuring similarity between legal documents - analyzing the precedent citation network, and measuring similarity based on textual content.
We explore two promising new similarity computation methods - one text-based and the other based on network embeddings, which have not been considered till now.
- Score: 9.007583099505954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing similarity between two legal documents is an important and
challenging task in the domain of Legal Information Retrieval. Finding similar
legal documents has many applications in downstream tasks, including prior-case
retrieval, recommendation of legal articles, and so on. Prior works have
proposed two broad ways of measuring similarity between legal documents -
analyzing the precedent citation network, and measuring similarity based on
textual content similarity measures. But there has not been a comprehensive
comparison of these existing methods on a common platform. In this paper, we
perform the first systematic analysis of the existing methods. In addition, we
explore two promising new similarity computation methods - one text-based and
the other based on network embeddings, which have not been considered till now.
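
To make the two families named in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the abstract does not say which specific measures are compared, so the sketch assumes a Jaccard overlap of cited precedents as a stand-in for a citation-network measure and a cosine over raw term counts as a stand-in for a text-based measure. The function names `citation_jaccard` and `tf_cosine` are hypothetical.

```python
import math
from collections import Counter


def citation_jaccard(cited_a, cited_b):
    """Network-style score (assumed measure): Jaccard overlap of cited precedents."""
    a, b = set(cited_a), set(cited_b)
    if not a and not b:
        return 0.0  # no citations on either side: treat as no evidence of similarity
    return len(a & b) / len(a | b)


def tf_cosine(text_a, text_b):
    """Text-style score (assumed measure): cosine between raw term-count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction in term space
    return dot / (norm_a * norm_b)
```

The two functions take different inputs on purpose: the first needs only the list of precedents each case cites, while the second needs only the case text, which is why the two families can disagree on the same pair of documents.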
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- Co-Matching: Towards Human-Machine Collaborative Legal Case Matching [69.21196368715144]
Successful legal case matching requires tacit knowledge of legal practitioners, which is difficult to verbalize and encode into machines.
We propose a collaborative matching framework called Co-Matching, which encourages both the machine and the legal practitioner to participate in the matching process.
Our study represents a pioneering effort in human-machine collaboration for the matching task, marking a milestone for future collaborative matching studies.
arXiv Detail & Related papers (2024-05-16T16:50:31Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- Analysing the Resourcefulness of the Paragraph for Precedence Retrieval [0.1761604268733064]
We analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments for improving the performance of precedence retrieval.
We found that the paragraph-level methods could capture the similarity among the judgments with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method.
arXiv Detail & Related papers (2023-07-29T08:55:38Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Specialized Document Embeddings for Aspect-based Similarity of Research Papers [4.661692753666685]
We treat aspect-based similarity as a classical vector similarity problem in aspect-specific embedding spaces.
We represent a document not as a single generic embedding but as multiple specialized embeddings.
Our approach mitigates potential risks arising from implicit biases by making them explicit.
arXiv Detail & Related papers (2022-03-28T07:35:26Z)
- PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval [87.68667887072324]
We propose a novel approach that leverages query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval.
To implement our approach, we make three major technical contributions by introducing formal formulations of the two kinds of similarity relations.
Our approach significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions datasets.
arXiv Detail & Related papers (2021-08-13T02:07:43Z)
- Efficient Clustering from Distributions over Topics [0.0]
We present an approach that relies on the results of a topic modeling algorithm over documents in a collection as a means to identify smaller subsets of documents where the similarity function can be computed.
This approach has proved to obtain promising results when identifying similar documents in the domain of scientific publications.
arXiv Detail & Related papers (2020-12-15T10:52:19Z)
- Aspect-based Document Similarity for Research Papers [4.661692753666685]
We extend similarity with aspect information by performing a pairwise document classification task.
We evaluate our aspect-based document similarity for research papers.
Our results show SciBERT as the best performing system.
arXiv Detail & Related papers (2020-10-13T13:51:21Z)
- Hier-SPCNet: A Legal Statute Hierarchy-based Heterogeneous Network for Computing Legal Case Document Similarity [9.007583099505954]
All prior network-based similarity methods considered only a precedent citation network among case documents (PCNet).
We propose to augment the PCNet with the hierarchy of legal statutes, to form a heterogeneous network Hier-SPCNet.
Experiments over a set of Indian Supreme Court case documents show that our proposed heterogeneous network enables significantly better document similarity estimation.
arXiv Detail & Related papers (2020-07-07T06:30:46Z)
- Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning [62.34197797857823]
A central problem in automatic reconstruction of shredded documents is the pairwise compatibility evaluation of the shreds.
This work proposes a scalable deep learning approach for measuring pairwise compatibility in which the number of inferences scales linearly.
Our method has accuracy comparable to the state-of-the-art with a speed-up of about 22 times for a test instance with 505 shreds.
arXiv Detail & Related papers (2020-03-23T03:22:06Z)