From Lengthy to Lucid: A Systematic Literature Review on NLP Techniques
for Taming Long Sentences
- URL: http://arxiv.org/abs/2312.05172v1
- Date: Fri, 8 Dec 2023 16:51:29 GMT
- Title: From Lengthy to Lucid: A Systematic Literature Review on NLP Techniques
for Taming Long Sentences
- Authors: Tatiana Passali, Efstathios Chatzikyriakidis, Stelios Andreadis,
Thanos G. Stavropoulos, Anastasia Matonaki, Anestis Fachantidis, Grigorios
Tsoumakas
- Abstract summary: Long sentences have been a persistent issue in written communication for many years.
This survey systematically reviews two main strategies for addressing the issue of long sentences.
We categorize and group the most representative methods into a comprehensive taxonomy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long sentences have been a persistent issue in written communication for many
years since they make it challenging for readers to grasp the main points or
follow the initial intention of the writer. This survey, conducted using the
PRISMA guidelines, systematically reviews two main strategies for addressing
the issue of long sentences: a) sentence compression and b) sentence splitting.
Interest in this area has increased since 2005, with significant growth after
2017. Current research is dominated by supervised
approaches for both sentence compression and splitting. Yet, there is a
considerable gap in weakly and self-supervised techniques, suggesting an
opportunity for further research, especially in domains with limited data. In
this survey, we categorize and group the most representative methods into a
comprehensive taxonomy. We also conduct a comparative evaluation analysis of
these methods on common sentence compression and splitting datasets. Finally,
we discuss the challenges and limitations of current methods, providing
valuable insights for future research directions. This survey is meant to serve
as a comprehensive resource for addressing the complexities of long sentences.
We aim to enable researchers to make further advancements in the field until
long sentences are no longer a barrier to effective communication.
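The survey's two strategies can be illustrated with a minimal heuristic sketch in Python. The regex rules and the example sentence below are illustrative assumptions, not methods from the survey:

```python
import re

def split_sentence(sentence):
    """Sentence splitting: break a long sentence at clause boundaries.

    Crude heuristic: a comma followed by a coordinating conjunction
    ("and", "but", "so") often separates independent clauses.
    """
    parts = re.split(r",\s+(?:and|but|so)\s+", sentence)
    return [part.strip().rstrip(".") + "." for part in parts]

def compress_sentence(sentence):
    """Sentence compression: delete parenthetical asides."""
    return re.sub(r"\s*\([^)]*\)", "", sentence)

long_sentence = ("The committee approved the budget, but several members "
                 "(citing rising costs) asked for a follow-up review.")
print(split_sentence(long_sentence))
# ['The committee approved the budget.', 'several members (citing rising costs) asked for a follow-up review.']
print(compress_sentence(long_sentence))
# The committee approved the budget, but several members asked for a follow-up review.
```

Real systems replace such hand-written rules with learned models, which is consistent with the survey's observation that supervised approaches dominate both tasks.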
Related papers
- The What, Why, and How of Context Length Extension Techniques in Large
Language Models -- A Detailed Survey [6.516561905186376]
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural Language Processing (NLP).
We study the inherent challenges associated with extending context length and present an organized overview of the existing strategies employed by researchers.
We explore whether there is a consensus within the research community regarding evaluation standards and identify areas where further agreement is needed.
arXiv Detail & Related papers (2024-01-15T18:07:21Z)
- Towards Better Chain-of-Thought Prompting Strategies: A Survey [60.75420407216108]
Chain-of-Thought (CoT) prompting shows impressive strength when used with large language models (LLMs).
In recent years, the prominent effect of CoT prompting has attracted growing research interest.
This survey could provide an overall reference on related research.
arXiv Detail & Related papers (2023-10-08T01:16:55Z)
- A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond [45.455178613559006]
Sentence representations are a critical component in NLP applications such as retrieval, question answering, and text classification.
They capture the meaning of a sentence, enabling machines to understand and reason over human language.
Until now, there has been no literature review dedicated to sentence representations.
arXiv Detail & Related papers (2023-05-22T02:31:15Z)
- Full-Text Argumentation Mining on Scientific Publications [3.8754200816873787]
We introduce a sequential pipeline model combining argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE) for full-text scientific argumentation mining (SAM).
We provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks.
Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges.
arXiv Detail & Related papers (2022-10-24T10:05:30Z)
- A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization [23.495225374478295]
Sentence summarization aims at compressing a long sentence into a short one that keeps the main gist, and has extensive real-world applications such as headline generation.
In our work, we address a new problem of explicit character-level length control for summarization, and propose a dynamic programming algorithm based on the Connectionist Temporal Classification (CTC) model.
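Explicit character-level length control can be sketched as a 0/1 knapsack dynamic program that keeps the highest-scoring words within a character budget. This is a toy stand-in for the general idea, not the paper's CTC-based algorithm, and the importance scores below are hypothetical:

```python
def length_controlled_summary(words, scores, char_budget):
    """Pick a subset of words (kept in original order) maximizing total
    importance while the joined output stays within char_budget characters."""
    # dp[c] = (best_score, chosen_indices) for a selection occupying c chars
    dp = [(0.0, [])] * (char_budget + 1)
    for i, (word, score) in enumerate(zip(words, scores)):
        new_dp = list(dp)
        for c in range(char_budget + 1):
            base, chosen = dp[c]
            cost = len(word) + (1 if chosen else 0)  # +1 for a joining space
            if c + cost <= char_budget and base + score > new_dp[c + cost][0]:
                new_dp[c + cost] = (base + score, chosen + [i])
        dp = new_dp
    _, best = max(dp, key=lambda state: state[0])
    return " ".join(words[i] for i in best)

words = "the quick brown fox jumps over the lazy dog".split()
scores = [0.1, 0.6, 0.5, 0.9, 0.8, 0.2, 0.1, 0.4, 0.9]  # hypothetical importance
summary = length_controlled_summary(words, scores, char_budget=18)
print(summary)       # fox jumps lazy dog
print(len(summary))  # 18 -- never exceeds the budget
```

Unlike truncation, the budget is enforced exactly during decoding, which is the property that character-level length control is after (the CTC formulation in the paper achieves this for non-autoregressive generation).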
arXiv Detail & Related papers (2022-05-28T21:09:53Z)
- Predicting Above-Sentence Discourse Structure using Distant Supervision from Topic Segmentation [8.688675709130289]
RST-style discourse parsing plays a vital role in many NLP tasks.
Despite its importance, one of the most prevalent limitations in modern-day discourse parsing is the lack of large-scale datasets.
arXiv Detail & Related papers (2021-12-12T10:16:45Z)
- Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences [69.3939291118954]
This paper reports research on a set of comprehensive clustering and network analyses targeting sentence and sub-sentence embedding spaces.
Results show that one method generates the most clusterable embeddings.
In general, the embeddings of span sub-sentences have better clustering properties than the original sentences.
arXiv Detail & Related papers (2021-10-02T00:47:35Z)
- Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
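The reported 4-10x index reduction is plausible even with the simplest form of vector quantization: storing each float32 component as a single int8 code already cuts storage by 4x. A generic scalar-quantization sketch, not the paper's scheme; the 768-dimensional random vector is a stand-in for a real phrase embedding:

```python
import array
import random

def quantize(vec):
    """Map floats in [-1, 1] to int8 codes in [-127, 127]."""
    return array.array("b", [max(-127, min(127, round(x * 127))) for x in vec])

def dequantize(codes):
    return [c / 127 for c in codes]

random.seed(0)
vec = [random.uniform(-1, 1) for _ in range(768)]  # one embedding-sized vector

f32_bytes = len(array.array("f", vec).tobytes())   # 4 bytes per component
int8_bytes = len(quantize(vec).tobytes())          # 1 byte per component
print(f32_bytes, "->", int8_bytes)                 # 3072 -> 768, a 4x reduction

# Reconstruction error is bounded by half a quantization step (0.5 / 127).
error = max(abs(a - b) for a, b in zip(vec, dequantize(quantize(vec))))
print(error < 0.5 / 127 + 1e-12)                   # True
```

Product quantization, common in dense retrieval indexes, codes whole sub-vectors jointly and is how reductions beyond 4x are reached.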
arXiv Detail & Related papers (2021-09-16T17:42:45Z)
- A Survey of Unsupervised Dependency Parsing [62.16714720135358]
Unsupervised dependency parsing aims to learn a dependency parser from sentences that have no annotation of their correct parse trees.
Despite its difficulty, unsupervised parsing is an interesting research direction because of its capability of utilizing almost unlimited unannotated text data.
arXiv Detail & Related papers (2020-10-04T10:51:22Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in text classification due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)