Unsupervised Simplification of Legal Texts
- URL: http://arxiv.org/abs/2209.00557v1
- Date: Thu, 1 Sep 2022 15:58:12 GMT
- Title: Unsupervised Simplification of Legal Texts
- Authors: Mert Cemri, Tolga Çukur, Aykut Koç
- Abstract summary: We introduce an unsupervised simplification method for legal texts (USLT).
USLT performs domain-specific TS by replacing complex words and splitting long sentences.
We demonstrate that USLT outperforms state-of-the-art domain-general TS methods in text simplicity while keeping the semantics intact.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The processing of legal texts has been developing as an emerging field in
natural language processing (NLP). Legal texts contain unique jargon and
complex linguistic attributes in vocabulary, semantics, syntax, and morphology.
Therefore, the development of text simplification (TS) methods specific to the
legal domain is of paramount importance for facilitating comprehension of legal
text by ordinary people and providing inputs to high-level models for
mainstream legal NLP applications. While a recent study proposed a rule-based
TS method for legal text, learning-based TS in the legal domain has not been
considered previously. Here we introduce an unsupervised simplification method
for legal texts (USLT). USLT performs domain-specific TS by replacing complex
words and splitting long sentences. To this end, USLT detects complex words in
a sentence, generates candidates via a masked-transformer model, and selects a
candidate for substitution based on a rank score. Afterward, USLT recursively
decomposes long sentences into a hierarchy of shorter core and context
sentences while preserving semantic meaning. We demonstrate that USLT
outperforms state-of-the-art domain-general TS methods in text simplicity while
keeping the semantics intact.
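The lexical-substitution step described in the abstract (detect complex words, generate in-context candidates, pick one by rank score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word-frequency table and the candidate lists are hypothetical stand-ins, and the masked-transformer candidate generator is stubbed with a small dictionary.

```python
# Illustrative sketch of USLT's lexical-substitution loop (not the authors'
# code). A real system would mask each complex word and query a
# masked-transformer model for in-context candidates; here that generator is
# stubbed with a hypothetical dictionary.

# Hypothetical corpus-frequency table: higher frequency serves as a proxy
# for simplicity.
WORD_FREQ = {
    "end": 9000, "terminate": 300, "stop": 8000,
    "pay": 9500, "remunerate": 50, "compensate": 700,
    "the": 100000, "shall": 4000, "contract": 2000,
}

COMPLEXITY_THRESHOLD = 1000  # words rarer than this are treated as complex

# Stub for masked-LM candidate generation.
CANDIDATES = {
    "terminate": ["end", "stop"],
    "remunerate": ["pay", "compensate"],
}

def is_complex(word: str) -> bool:
    """A word is complex if it is rare in the (hypothetical) frequency table."""
    return WORD_FREQ.get(word.lower(), 0) < COMPLEXITY_THRESHOLD

def rank_score(candidate: str) -> float:
    """Toy rank score: prefer frequent (simpler) substitutes. USLT combines
    several signals into its rank score; only frequency is used here."""
    return WORD_FREQ.get(candidate.lower(), 0)

def simplify_sentence(sentence: str) -> str:
    """Replace each complex word with its best-ranked candidate, if any."""
    out = []
    for word in sentence.split():
        if is_complex(word) and word.lower() in CANDIDATES:
            out.append(max(CANDIDATES[word.lower()], key=rank_score))
        else:
            out.append(word)
    return " ".join(out)

print(simplify_sentence("the contract shall terminate"))
# -> the contract shall end
```

The recursive sentence-splitting stage, which decomposes long sentences into a hierarchy of core and context sentences, would sit after this substitution pass; it is omitted here since it depends on a syntactic parse.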
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve the text into multi concepts for multilingual semantic matching to liberate the model from the reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Enhanced Simultaneous Machine Translation with Word-level Policies [2.12121796606941]
This paper demonstrates that simultaneous machine translation (SiMT) policies devised at the subword level are surpassed by those operating at the word level.
We suggest a method to boost SiMT models using language models (LMs), wherein the proposed word-level policy plays a vital role in addressing the subword disparity between LMs and SiMT models.
arXiv Detail & Related papers (2023-10-25T07:10:42Z) - Discourse-Aware Text Simplification: From Complex Sentences to Linked Propositions [11.335080241393191]
Text Simplification (TS) aims to modify sentences in order to make them easier to process.
We present a discourse-aware TS approach that splits and rephrases complex English sentences.
We generate a semantic hierarchy of minimal propositions that puts a semantic layer on top of the simplified sentences.
arXiv Detail & Related papers (2023-08-01T10:10:59Z) - LegalRelectra: Mixed-domain Language Modeling for Long-range Legal Text Comprehension [6.442209435258797]
LegalRelectra is a legal-domain language model trained on mixed-domain legal and medical corpora.
Our training architecture implements the Electra framework, but utilizes Reformer instead of BERT for its generator and discriminator.
arXiv Detail & Related papers (2022-12-16T00:15:14Z) - Hierarchical Context Tagging for Utterance Rewriting [51.251400047377324]
Methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings.
We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules.
Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by 2 BLEU points.
arXiv Detail & Related papers (2022-06-22T17:09:34Z) - Unsupervised Sentence Simplification via Dependency Parsing [4.337513096197002]
We propose a simple yet novel unsupervised sentence simplification system.
It harnesses parsing structures together with sentence embeddings to produce linguistically effective simplifications.
We establish the unsupervised state-of-the-art at 39.13 SARI on TurkCorpus set and perform competitively against supervised baselines on various quality metrics.
arXiv Detail & Related papers (2022-06-10T07:55:25Z) - Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions, from which a neighboring distribution divergence (NDD) is computed.
Experiments on Semantic Textual Similarity show NDD to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - A Survey on Text Simplification [0.0]
Text Simplification (TS) aims to reduce the linguistic complexity of content to make it easier to understand.
This survey seeks to provide a comprehensive overview of TS, including a brief description of earlier approaches used.
arXiv Detail & Related papers (2020-08-19T18:12:33Z) - Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
GSRM is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.