Evaluating Document Representations for Content-based Legal Literature
Recommendations
- URL: http://arxiv.org/abs/2104.13841v1
- Date: Wed, 28 Apr 2021 15:48:19 GMT
- Title: Evaluating Document Representations for Content-based Legal Literature
Recommendations
- Authors: Malte Ostendorff, Elliott Ash, Terry Ruas, Bela Gipp, Julian
Moreno-Schneider, Georg Rehm
- Abstract summary: Legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets.
We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods.
Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results.
- Score: 6.4815284696225905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems assist legal professionals in finding relevant literature
for supporting their case. Despite their importance for the profession, legal
applications do not reflect the latest advances in recommender systems and
representation learning research. Simultaneously, legal recommender systems are
typically evaluated in small-scale user studies without any publicly available
benchmark datasets. Thus, these studies have limited reproducibility. To
address the gap between research and practice, we explore a set of
state-of-the-art document representation methods for the task of retrieving
semantically related US case law. We evaluate text-based (e.g., fastText,
Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods.
We compare in total 27 methods using two silver standards with annotations for
2,964 documents. The silver standards are newly created from Open Case Book and
Wikisource and can be reused under an open license, facilitating
reproducibility. Our experiments show that document representations from
averaged fastText word vectors (trained on legal corpora) yield the best
results, closely followed by Poincaré citation embeddings. Combining fastText
and Poincaré in a hybrid manner further improves the overall result. Besides
the overall performance, we analyze the methods depending on document length,
citation count, and the coverage of their recommendations. We make our source
code, models, and datasets publicly available at
https://github.com/malteos/legal-document-similarity/.
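The retrieval setup the abstract describes can be sketched in a few lines: average word vectors to get a text embedding per document, pair it with a citation embedding, and rank other documents by cosine similarity. This is a minimal illustration only; the random 16- and 8-dimensional vectors stand in for trained fastText word vectors and Poincaré citation embeddings, and simple concatenation is just one way to form a hybrid representation (the paper compares several combination methods).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for trained embeddings: in the paper, word vectors come from
# fastText trained on legal corpora and document nodes from Poincaré
# embeddings of the citation graph. Random vectors here, for illustration.
word_vectors = {w: rng.normal(size=16) for w in
                ["court", "appeal", "statute", "contract", "damages", "tort"]}
citation_vectors = {d: rng.normal(size=8) for d in ["doc_a", "doc_b", "doc_c"]}

def normalize(v):
    return v / np.linalg.norm(v)

def text_embedding(tokens):
    """fastText-style document representation: average the word vectors."""
    return np.mean([word_vectors[t] for t in tokens if t in word_vectors], axis=0)

def hybrid_embedding(tokens, doc_id):
    """Concatenate L2-normalized text and citation embeddings
    (one simple hybrid scheme)."""
    return np.concatenate([normalize(text_embedding(tokens)),
                           normalize(citation_vectors[doc_id])])

docs = {
    "doc_a": ["court", "appeal", "statute"],
    "doc_b": ["contract", "damages"],
    "doc_c": ["tort", "damages", "court"],
}

# Rank the other documents against doc_a by cosine similarity.
query = normalize(hybrid_embedding(docs["doc_a"], "doc_a"))
scores = {d: float(query @ normalize(hybrid_embedding(t, d)))
          for d, t in docs.items() if d != "doc_a"}
best = max(scores, key=scores.get)
```

With real embeddings, `text_embedding` would fall back to fastText subword vectors for out-of-vocabulary tokens, and the silver-standard annotations would supply the ground truth for evaluating the ranking.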
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how best to utilize a context-aware translation model in decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z)
- Attentive Deep Neural Networks for Legal Document Retrieval [2.4350217735794337]
We study the use of attentive neural network-based text representation for statute law document retrieval.
We develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer.
Experimental results show that attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages.
arXiv Detail & Related papers (2022-12-13T01:37:27Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Context-Aware Legal Citation Recommendation using Deep Learning [4.157772749568094]
Lawyers and judges spend a large amount of time researching the proper legal authority to cite while drafting decisions.
We develop a citation recommendation tool that can help improve efficiency in the process of opinion drafting.
arXiv Detail & Related papers (2021-06-20T23:23:11Z)
- Learning Fine-grained Fact-Article Correspondence in Legal Cases [19.606628325747938]
We create a corpus with manually annotated fact-article correspondences.
We parse articles in the form of premise-conclusion pairs with a random forest.
Our best system reaches an F1 score of 96.3%, making it of great potential for practical use.
arXiv Detail & Related papers (2021-04-21T19:06:58Z)
- Building Legal Case Retrieval Systems with Lexical Matching and Summarization using A Pre-Trained Phrase Scoring Model [1.9275428660922076]
We present our method for tackling the legal case retrieval task of the Competition on Legal Information Extraction/Entailment 2019.
Our approach is based on the idea that summarization is important for retrieval.
We have achieved the state-of-the-art result for the task on the benchmark of the competition.
arXiv Detail & Related papers (2020-09-29T15:10:59Z)
- Learning Neural Textual Representations for Citation Recommendation [7.227232362460348]
We propose a novel approach to citation recommendation using a deep sequential representation of the documents (Sentence-BERT) cascaded with Siamese and triplet networks in a submodular scoring function.
To the best of our knowledge, this is the first approach to combine deep representations and submodular selection for a task of citation recommendation.
arXiv Detail & Related papers (2020-07-08T12:38:50Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.