MultiLS: A Multi-task Lexical Simplification Framework
- URL: http://arxiv.org/abs/2402.14972v1
- Date: Thu, 22 Feb 2024 21:16:18 GMT
- Title: MultiLS: A Multi-task Lexical Simplification Framework
- Authors: Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
- Abstract summary: We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset.
We also present MultiLS-PT, the first dataset to be created using the MultiLS framework.
- Score: 21.81108113189197
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Lexical Simplification (LS) automatically replaces difficult-to-read
words with easier alternatives while preserving a sentence's original meaning.
LS is a precursor to Text Simplification, aiming to improve text accessibility
for various target demographics, including children, second-language learners,
and individuals with reading disabilities or low literacy. Several datasets
exist for LS, but each specializes in only one or two sub-tasks within the LS
pipeline; to date, no single LS dataset covers all LS sub-tasks. We present
MultiLS, the first LS framework that allows for the creation of a multi-task
LS dataset. We also present MultiLS-PT, the first dataset created using the
MultiLS framework. We demonstrate the potential of MultiLS-PT by carrying out
all LS sub-tasks for Portuguese: (1) lexical complexity prediction (LCP), (2)
substitute generation, and (3) substitute ranking. Model performances are
reported, ranging from transformer-based models to more recent large language
models (LLMs).
Related papers
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary of the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Linguistic Steganalysis via LLMs: Two Modes for Efficient Detection of Strongly Concealed Stego [6.99735992267331]
We design a novel linguistic steganalysis (LS) method with two modes, called LSGC.
In the generation mode, LSGC creates an LS-task "description".
In the classification mode, LSGC deletes the LS-task "description" and uses "causalLM" LLMs to extract steganographic features.
arXiv Detail & Related papers (2024-06-06T16:18:02Z) - SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models [97.40590590880144]
We develop an extensive Multi-modal Large Language Model (MLLM) series.
We assemble a comprehensive dataset covering publicly available resources in language, vision, and vision-language tasks.
We obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities.
arXiv Detail & Related papers (2024-02-08T18:59:48Z) - Multilingual Controllable Transformer-Based Lexical Simplification [4.718531520078843]
This paper proposes mTLS, a controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model.
The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words.
arXiv Detail & Related papers (2023-07-05T08:48:19Z) - Deep Learning Approaches to Lexical Simplification: A Survey [19.079916794185642]
Lexical Simplification (LS) is the task of replacing complex words with simpler ones in a sentence.
LS is the lexical component of Text Simplification (TS).
Recent advances in deep learning have sparked renewed interest in LS.
arXiv Detail & Related papers (2023-05-19T20:56:22Z) - Zero-Shot Cross-Lingual Summarization via Large Language Models [108.30673793281987]
Cross-lingual summarization (CLS) generates a summary in a different target language.
Recent emergence of Large Language Models (LLMs) has attracted wide attention from the computational linguistics community.
In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms.
arXiv Detail & Related papers (2023-02-28T01:27:37Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification [17.101023503289856]
ALEXSIS-PT is a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words.
We evaluate four models for substitute generation on this dataset, namely mDistilBERT, mBERT, XLM-R, and BERTimbau.
arXiv Detail & Related papers (2022-09-19T14:10:21Z) - A Variational Hierarchical Model for Neural Cross-Lingual Summarization [85.44969140204026]
Cross-lingual summarization (CLS) converts a document in one language into a summary in another.
Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model.
We propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder.
arXiv Detail & Related papers (2022-03-08T02:46:11Z) - Depth-Adaptive Graph Recurrent Network for Text Classification [71.20237659479703]
Sentence-State LSTM (S-LSTM) is a powerful and highly efficient graph recurrent network.
We propose a depth-adaptive mechanism for the S-LSTM, which allows the model to learn how many computational steps to conduct for different words as required.
arXiv Detail & Related papers (2020-02-29T03:09:55Z)