Legal Document Classification: An Application to Law Area Prediction of
Petitions to Public Prosecution Service
- URL: http://arxiv.org/abs/2010.12533v1
- Date: Tue, 13 Oct 2020 18:05:37 GMT
- Title: Legal Document Classification: An Application to Law Area Prediction of
Petitions to Public Prosecution Service
- Authors: Mariana Y. Noguti, Eduardo Vellasques, Luiz S. Oliveira
- Abstract summary: This paper proposes the use of NLP techniques for textual classification.
Our main goal is to automate the process of assigning petitions to their respective areas of law.
The best results were obtained with a combination of Word2Vec trained on a domain-specific corpus and a Recurrent Neural Network architecture.
- Score: 6.696983725360808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been an increased interest in the application of
Natural Language Processing (NLP) to legal documents. The use of convolutional
and recurrent neural networks along with word embedding techniques has
presented promising results when applied to textual classification problems,
such as sentiment analysis and topic segmentation of documents. This paper
proposes the use of NLP techniques for textual classification, with the purpose
of categorizing the descriptions of the services provided by the Public
Prosecutor's Office of the State of Paraná to the population in one of the
areas of law covered by the institution. Our main goal is to automate the
process of assigning petitions to their respective areas of law, reducing the
cost and time associated with this process while freeing human resources for
more complex tasks. In this paper, we compare different approaches to word
representation for this task, including document-term matrices and several
word embeddings. For the classification models, we evaluated three families:
linear models, boosted trees, and neural networks. The best
results were obtained with a combination of Word2Vec trained on a
domain-specific corpus and a Recurrent Neural Network (RNN) architecture (more
specifically, LSTM), leading to an accuracy of 90% and F1-Score of 85% in the
classification of eighteen categories (law areas).
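As a hedged illustration of the simplest end of the comparison above (a document-term matrix fed to a linear model, the baseline against which the Word2Vec + LSTM combination was evaluated), the following scikit-learn sketch shows the pipeline shape; the petition texts and law-area labels are invented placeholders, not the Prosecutor's Office data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for petition descriptions and their law areas
# (hypothetical examples, not the paper's corpus of 18 categories).
texts = [
    "child support payment dispute",
    "father refuses to pay child support",
    "consumer bought defective product refund",
    "defective appliance consumer complaint refund",
]
labels = ["family", "family", "consumer", "consumer"]

# Document-term matrix (TF-IDF weights) + linear classifier:
# the linear-model family from the paper's comparison.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["mother seeks child support"]))
```

The paper's best-performing setup replaces the TF-IDF step with Word2Vec embeddings trained on a domain-specific corpus and the linear classifier with an LSTM; this sketch only fixes the interface of the task (free text in, law area out).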
Related papers
- Text Classification using Graph Convolutional Networks: A Comprehensive Survey [11.1080224302799]
Graph convolution network (GCN)-based approaches have gained a lot of traction in this domain over the last decade.
This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision.
arXiv Detail & Related papers (2024-10-12T07:03:42Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The paper introduces Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
It also exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Tuning Traditional Language Processing Approaches for Pashto Text Classification [0.0]
The main aim of this study is to establish a Pashto automatic text classification system.
This study compares several models containing both statistical and neural network machine learning techniques.
This research obtained an average testing accuracy of 94% using a classification algorithm with TF-IDF feature extraction.
arXiv Detail & Related papers (2023-05-04T22:57:45Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
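The nearest-neighbor input modification can be sketched as follows; this is a hedged approximation of the idea (augment each input with its nearest labeled neighbor, adding no learnable parameters), not the authors' SetFit-based implementation, and the labeled pool is invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical labeled pool (invented examples, not the paper's data).
pool_texts = ["buy cheap pills now", "meeting moved to 3pm"]
pool_labels = ["spam", "ham"]

vec = TfidfVectorizer().fit(pool_texts)
nn = NearestNeighbors(n_neighbors=1).fit(vec.transform(pool_texts))

def augment(text):
    # Append the nearest labeled neighbor's text and label to the
    # input, mirroring LaGoNN's parameter-free input alteration.
    _, idx = nn.kneighbors(vec.transform([text]))
    i = int(idx[0, 0])
    return f"{text} [NN: {pool_texts[i]} | {pool_labels[i]}]"

print(augment("cheap pills discount"))
```

The augmented string would then be fed to the downstream classifier (SetFit in the paper) in place of the raw input.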
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
- Attentive Deep Neural Networks for Legal Document Retrieval [2.4350217735794337]
We study the use of attentive neural network-based text representation for statute law document retrieval.
We develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer.
Experimental results show that attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages.
arXiv Detail & Related papers (2022-12-13T01:37:27Z)
- Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that allows achieving high results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.
arXiv Detail & Related papers (2022-01-21T09:01:12Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) -- which can obtain more expressive power at lower computational cost for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Rank over Class: The Untapped Potential of Ranking in Natural Language Processing [8.637110868126546]
We argue that many tasks which are currently addressed using classification are in fact being shoehorned into a classification mould.
We propose a novel end-to-end ranking approach consisting of a Transformer network responsible for producing representations for a pair of text sequences.
In an experiment on a heavily-skewed sentiment analysis dataset, converting ranking results to classification labels yields an approximately 22% improvement over state-of-the-art text classification.
arXiv Detail & Related papers (2020-09-10T22:18:57Z)
- Evaluation of Neural Network Classification Systems on Document Stream [0.5068448669777386]
We analyse the efficiency of NN-based document classification systems in a sub-optimal training case.
The evaluation was divided into four parts: a reference case, to assess the performance of the system in the lab; two cases that each simulate a specific difficulty linked to document stream processing; and a realistic case that combined all of these difficulties.
arXiv Detail & Related papers (2020-07-15T08:52:39Z)
- Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.