Indian Legal Text Summarization: A Text Normalisation-based Approach
- URL: http://arxiv.org/abs/2206.06238v1
- Date: Mon, 13 Jun 2022 15:16:50 GMT
- Title: Indian Legal Text Summarization: A Text Normalisation-based Approach
- Authors: Satyajit Ghosh, Mousumi Dutta, Tanaya Das
- Abstract summary: There are more than 4 crore cases outstanding in the Indian court system.
Many state-of-the-art models for text summarization have emerged as machine learning has progressed.
Domain-independent models don't do well with legal texts.
Authors have proposed a methodology for normalising legal texts in the Indian context.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the Indian court system, pending cases have long been a problem. There are
more than 4 crore cases outstanding. Manually summarising hundreds of documents
is a time-consuming and tedious task for legal stakeholders. Many
state-of-the-art models for text summarization have emerged as machine learning
has progressed. Domain-independent models don't do well with legal texts, and
fine-tuning those models for the Indian Legal System is problematic due to a
lack of publicly available datasets. To improve the performance of
domain-independent models, the authors have proposed a methodology for
normalising legal texts in the Indian context. The authors experimented with
two state-of-the-art domain-independent models for legal text summarization,
namely BART and PEGASUS. BART and PEGASUS are put through their paces in terms
of extractive and abstractive summarization to understand the effectiveness of
the text normalisation approach. Summarised texts are evaluated by domain
experts on multiple parameters and using ROUGE metrics. It shows the proposed
text normalisation approach is effective in legal texts with domain-independent
models.
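The abstract says the summaries are scored with ROUGE metrics. As a rough illustration only, the sketch below computes a simplified ROUGE-1 (unigram overlap) score. The function name `rouge_1`, the whitespace tokenisation, and the example sentences are assumptions made here for illustration; they are not taken from the paper, and the authors' actual evaluation setup (tokeniser, stemming, ROUGE variants) is not described on this page.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: unigram overlap between a candidate and a reference.

    Assumption: lowercase whitespace tokenisation, no stemming or stopword
    handling, so scores may differ from standard ROUGE toolkits.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences, not drawn from the paper's data.
scores = rouge_1(
    "the court dismissed the appeal",
    "the appeal was dismissed by the court",
)
print(scores)
```

Recall here measures how much of the reference summary the generated summary covers, while precision measures how much of the generated summary is supported by the reference.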
Related papers
- IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning [16.12863746776168]
Legal systems worldwide are inundated with exponential growth in cases and documents.
There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents.
This paper proposes IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning.
arXiv Detail & Related papers (2024-07-07T14:55:04Z)
- Improving Legal Judgement Prediction in Romanian with Long Text Encoders [0.8933959485129375]
We investigate specialized and general models for predicting the final ruling of a legal case, known as Legal Judgment Prediction (LJP).
In this work we focus on methods to extend the sequence length of Transformer-based models to better understand the long documents present in legal corpora.
arXiv Detail & Related papers (2024-02-29T13:52:33Z) - SLJP: Semantic Extraction based Legal Judgment Prediction [0.0]
Legal Judgment Prediction (LJP) is a judicial assistance system that recommends the legal components such as applicable statues, prison term and penalty term.
Most of the existing Indian models did not adequately concentrate on the semantics embedded in the fact description (FD) that impacts the decision.
The proposed semantic extraction based LJP (SLJP) model provides the advantages of pretrained transformers for complex unstructured legal case document understanding.
arXiv Detail & Related papers (2023-12-13T08:50:02Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - FlairNLP at SemEval-2023 Task 6b: Extraction of Legal Named Entities
from Legal Texts using Contextual String Embeddings [0.0]
We employ knowledge extraction techniques, specially the named entity extraction of legal entities within court case judgements.
We evaluate several state of the art architectures in the realm of sequence labeling using models trained on a curated dataset of legal texts.
A Bi-LSTM model trained on Flair Embeddings achieves the best results.
arXiv Detail & Related papers (2023-06-03T19:38:04Z) - Attentive Deep Neural Networks for Legal Document Retrieval [2.4350217735794337]
We study the use of attentive neural network-based text representation for statute law document retrieval.
We develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer.
Experimental results show that Attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages.
arXiv Detail & Related papers (2022-12-13T01:37:27Z) - SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z) - Fine-tuning GPT-3 for Russian Text Summarization [77.34726150561087]
This paper showcases ruGPT3 ability to summarize texts, fine-tuning it on the corpora of Russian news with their corresponding human-generated summaries.
We evaluate the resulting texts with a set of metrics, showing that our solution can surpass the state-of-the-art model's performance without additional changes in architecture or loss function.
arXiv Detail & Related papers (2021-08-07T19:01:40Z) - Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z) - ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene
Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z) - Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make a clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.