Sentence Embeddings and High-speed Similarity Search for Fast Computer
Assisted Annotation of Legal Documents
- URL: http://arxiv.org/abs/2112.11494v1
- Date: Tue, 21 Dec 2021 19:27:21 GMT
- Title: Sentence Embeddings and High-speed Similarity Search for Fast Computer
Assisted Annotation of Legal Documents
- Authors: Hannes Westermann, Jaromir Savelka, Vern R. Walker, Kevin D. Ashley,
Karim Benyekhlef
- Abstract summary: We introduce a proof-of-concept system for annotating sentences "laterally."
The approach is based on the observation that sentences that are similar in meaning often have the same label in terms of a particular type system.
- Score: 0.5249805590164901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-performed annotation of sentences in legal documents is an important
prerequisite to many machine learning based systems supporting legal tasks.
Typically, the annotation is done sequentially, sentence by sentence, which is
often time consuming and, hence, expensive. In this paper, we introduce a
proof-of-concept system for annotating sentences "laterally." The approach is
based on the observation that sentences that are similar in meaning often have
the same label in terms of a particular type system. We use this observation in
allowing annotators to quickly view and annotate sentences that are
semantically similar to a given sentence, across an entire corpus of documents.
Here, we present the interface of the system and empirically evaluate the
approach. The experiments show that lateral annotation has the potential to
make the annotation process quicker and more consistent.
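The core retrieval step behind lateral annotation can be illustrated with a minimal sketch: given vector embeddings for every sentence in a corpus, the system surfaces the nearest neighbours of a query sentence so the annotator can label them in one pass. The toy 3-d vectors and the `lateral_candidates` helper below are illustrative assumptions, not the paper's implementation; a real system would use a neural sentence encoder and a high-speed index rather than a brute-force scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def lateral_candidates(query_id, embeddings, top_k=2):
    """Rank all other sentences by similarity to the query sentence,
    so an annotator can review and label them together."""
    q = embeddings[query_id]
    scored = [(sid, cosine(q, vec))
              for sid, vec in embeddings.items() if sid != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-d "embeddings"; a real system would produce these with a sentence encoder.
embeddings = {
    "s1": [0.9, 0.1, 0.0],  # query sentence
    "s2": [0.8, 0.2, 0.1],  # semantically close to s1, likely shares its label
    "s3": [0.0, 0.1, 0.9],  # unrelated sentence
}

print(lateral_candidates("s1", embeddings, top_k=1)[0][0])  # → s2
```

The brute-force scan here is O(n) per query; the "high-speed similarity search" in the title suggests an approximate nearest-neighbour index would replace it at corpus scale.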
Related papers
- Are manual annotations necessary for statutory interpretations retrieval? [41.94295877935867]
We try to determine the optimal number of annotations per legal concept.
We also check whether we can draw the sentences for annotation randomly or whether there is a gain in the performance of the model.
arXiv Detail & Related papers (2025-06-16T20:15:57Z)
- Turn-taking annotation for quantitative and qualitative analyses of conversation [5.425050980601873]
Turn-taking was annotated on two layers: Inter-Pausal Units (IPU) and points of potential completion (PCOMP), similar to transition relevance places.
A detailed analysis of inter-rater agreement and common confusions shows that agreement for IPU annotation is near-perfect.
The system can be applied to a variety of conversational data for linguistic studies and technological applications.
arXiv Detail & Related papers (2025-04-14T08:45:04Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments is conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization [21.602394765472386]
We propose two novel strategies to improve sentence similarity estimation for unsupervised extractive summarization.
We use contrastive learning to optimize a document-level objective encouraging sentences from the same document to be more similar to each other than to sentences from different documents.
We also use mutual learning to enhance the relationship between sentence similarity estimation and sentence salience ranking.
arXiv Detail & Related papers (2023-02-24T07:10:33Z)
- Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis, a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared to the first task, we also provide two novel methods for solving the second and third.
arXiv Detail & Related papers (2023-01-24T08:48:12Z)
- Court Judgement Labeling on HKLII [17.937279252256594]
HKLII has served as the repository of legal documents in Hong Kong for a decade.
Our team aims to incorporate NLP techniques into the website to make it more intelligent.
arXiv Detail & Related papers (2022-08-03T06:32:16Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Improving Document-Level Sentiment Classification Using Importance of Sentences [3.007949058551534]
We propose a document-level sentence classification model based on deep neural networks.
We conduct experiments using sentiment datasets from four different domains: movie reviews, hotel reviews, restaurant reviews, and music reviews.
The experimental results show that the importance of sentences should be considered in a document-level sentiment classification task.
arXiv Detail & Related papers (2021-03-09T01:29:08Z)
- Dynamic Semantic Matching and Aggregation Network for Few-shot Intent Detection [69.2370349274216]
Few-shot Intent Detection is challenging due to the scarcity of available annotated utterances.
Semantic components are distilled from utterances via multi-head self-attention.
Our method provides a comprehensive matching measure to enhance representations of both labeled and unlabeled instances.
arXiv Detail & Related papers (2020-10-06T05:16:38Z)
- Paragraph-level Commonsense Transformers with Recurrent Memory [77.4133779538797]
We train a discourse-aware model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives.
Our results show that PARA-COMET outperforms the sentence-level baselines, particularly in generating inferences that are both coherent and novel.
arXiv Detail & Related papers (2020-10-04T05:24:12Z)
- Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization [27.55699431297619]
Artemis is a novel hierarchical annotation process that produces indicative summaries for documents from multiple domains.
It is more tractable because judges do not need to read every sentence in a document when making an importance judgment for one of its sentences.
We present analysis and experimental results over a sample set of 532 annotated documents.
arXiv Detail & Related papers (2020-05-05T13:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.