Related papers: Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches

Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches

URL: http://arxiv.org/abs/2303.05387v1
Date: Wed, 8 Mar 2023 12:41:56 GMT
Title: Automatic Detection of Industry Sectors in Legal Articles Using Machine Learning Approaches
Authors: Hui Yang (1 and 2), Stella Hadjiantoni (1), Yunfei Long (3), Ruta Petraityte (2), Berthold Lausen (1 and 4) ((1) Department of Mathematical Sciences, University of Essex, Wivenhoe Park, Colchester, CO43SQ, UK, (2) Mondaq Ltd, Bristol, UK, (3) School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester, CO43SQ, UK, (4) Institute of Medical Informatics, Biometry and Epidemiology, School of Medicine, Friedrich-Alexander University Erlangen-Nuremberg, Waldstr. 6, Erlangen, 91054, Germany)
Abstract summary: A dataset consisting of over 1,700 annotated legal articles was created for the identification of six industry sectors. The system achieved promising results with area under the receiver operating characteristic curve scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The ability to automatically identify industry sector coverage in articles on legal developments, or any kind of news articles for that matter, can bring plentiful of benefits both to the readers and the content creators themselves. By having articles tagged based on industry coverage, readers from all around the world would be able to get to legal news that are specific to their region and professional industry. Simultaneously, writers would benefit from understanding which industries potentially lack coverage or which industries readers are currently mostly interested in and thus, they would focus their writing efforts towards more inclusive and relevant legal news coverage. In this paper, a Machine Learning-powered industry analysis approach which combined Natural Language Processing (NLP) with Statistical and Machine Learning (ML) techniques was investigated. A dataset consisting of over 1,700 annotated legal articles was created for the identification of six industry sectors. Text and legal based features were extracted from the text. Both traditional ML methods (e.g. gradient boosting machine algorithms, and decision-tree based algorithms) and deep neural network (e.g. transformer models) were applied for performance comparison of predictive models. The system achieved promising results with area under the receiver operating characteristic curve scores above 0.90 and F-scores above 0.81 with respect to the six industry sectors. The experimental results show that the suggested automated industry analysis which employs ML techniques allows the processing of large collections of text data in an easy, efficient, and scalable way. Traditional ML methods perform better than deep neural networks when only a small and domain-specific training data is available for the study.

Related papers

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.<n>Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.<n>Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models [0.0]
This paper addresses the gap in leveraging modern large language models (LLMs) for more complex reasoning tasks.<n>We introduce an exploratory framework that uses LLMs to move beyond classification and perform semantic analysis.<n>The results demonstrate that LLMs can function as powerful "reliability co-pilots," moving beyond labelling to synthesise textual information and actionable, expert-level hypotheses.
arXiv Detail & Related papers (2025-09-26T14:00:20Z)
Robust Detection of LLM-Generated Text: A Comparative Analysis [0.276240219662896]
Large language models can be widely integrated into many aspects of life, and their output can quickly fill all network resources. It becomes increasingly important to develop powerful detectors for the generated text. This detector is essential to prevent the potential misuse of these technologies and to protect areas such as social media from the negative effects.
arXiv Detail & Related papers (2024-11-09T18:27:15Z)
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document. Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative. Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
Uncovering Key Trends in Industry 5.0 through Advanced AI Techniques [0.0]
This article analyzes around 200 online articles to identify trends within Industry 5.0 using artificial intelligence techniques. The results reveal a convergence around a core set of themes while also highlighting that Industry 5.0 spans a wide range of topics.
arXiv Detail & Related papers (2024-10-22T07:06:00Z)
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated. We present LLM-DetectAIve, designed for fine-grained detection. It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z)
Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection [71.93411099797308]
Out-of-distribution (OOD) samples are crucial when deploying machine learning models in open-world scenarios. We propose to tackle this constraint by leveraging the expert knowledge and reasoning capability of large language models (LLM) to potential Outlier Exposure, termed EOE. EOE can be generalized to different tasks, including far, near, and fine-language OOD detection. EOE achieves state-of-the-art performance across different OOD tasks and can be effectively scaled to the ImageNet-1K dataset.
arXiv Detail & Related papers (2024-06-02T17:09:48Z)
Automatic explanation of the classification of Spanish legal judgments in jurisdiction-dependent law categories with tree estimators [6.354358255072839]
This work contributes with a system combining Natural Language Processing (NLP) with Machine Learning (ML) to classify legal texts in an explainable manner. We analyze the features involved in the decision and the threshold bifurcation values of the decision paths of tree structures. Legal experts have validated our solution, and this knowledge has also been incorporated into the explanation process as "expert-in-the-loop" dictionaries.
arXiv Detail & Related papers (2024-03-30T17:59:43Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review [77.34726150561087]
This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks.
arXiv Detail & Related papers (2023-04-05T22:19:42Z)
RaFoLa: A Rationale-Annotated Corpus for Detecting Indicators of Forced Labour [4.393754160527062]
This paper presents the first openly accessible English corpus annotated for multi-class and multi-label forced labour detection. The corpus consists of 989 news articles retrieved from specialised data sources and annotated according to risk indicators defined by the International Labour Organization (ILO)
arXiv Detail & Related papers (2022-05-05T14:43:31Z)
Rebuilding Trust in Active Learning with Actionable Metrics [77.99796068970569]
Active Learning (AL) is an active domain of research, but is seldom used in the industry despite the pressing needs. This is in part due to a misalignment of objectives, while research strives at getting the best results on selected datasets. We present various actionable metrics to help rebuild trust of industrial practitioners in Active Learning.
arXiv Detail & Related papers (2020-12-18T09:34:59Z)
Supervised Text Classification using Text Search [0.0]
Authors describe a class of industrial standard algorithms which can accurately predict classification of any text given prior labelled text data. These algorithms were used to automate routing of issue tickets to the appropriate team.
arXiv Detail & Related papers (2020-11-14T19:51:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.