An Inclusive Notion of Text
- URL: http://arxiv.org/abs/2211.05604v2
- Date: Wed, 17 May 2023 09:56:05 GMT
- Title: An Inclusive Notion of Text
- Authors: Ilia Kuznetsov, Iryna Gurevych
- Abstract summary: We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
- Score: 69.36678873492373
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Natural language processing (NLP) researchers develop models of grammar,
meaning and communication based on written text. Due to task and data
differences, what is considered text can vary substantially across studies. A
conceptual framework for systematically capturing these differences is lacking.
We argue that clarity on the notion of text is crucial for reproducible and
generalizable NLP. Towards that goal, we propose common terminology to discuss
the production and transformation of textual data, and introduce a two-tier
taxonomy of linguistic and non-linguistic elements that are available in
textual sources and can be used in NLP modeling. We apply this taxonomy to
survey existing work that extends the notion of text beyond the conservative
language-centered view. We outline key desiderata and challenges of the
emerging inclusive approach to text in NLP, and suggest community-level
reporting as a crucial next step to consolidate the discussion.
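As a purely illustrative aid, the following minimal Python sketch shows one way such a two-tier distinction between linguistic and non-linguistic elements of a textual source could be recorded for reporting purposes. The tier and element names here are assumptions made for illustration only; they are not the categories actually proposed in the paper.
```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Top tier: is an element linguistic or non-linguistic? (illustrative)"""
    LINGUISTIC = "linguistic"
    NON_LINGUISTIC = "non-linguistic"


@dataclass
class Element:
    """One element type observed in a textual source (hypothetical examples)."""
    name: str                       # e.g. "sentence", "emoji", "hyperlink"
    tier: Tier
    used_in_modeling: bool = False  # whether the study feeds it to the model


@dataclass
class TextSourceReport:
    """A hedged sketch of a community-level report on what counts as 'text'."""
    source: str
    elements: list = field(default_factory=list)

    def summary(self):
        """Group the modeled elements by tier."""
        out = {t.value: [] for t in Tier}
        for e in self.elements:
            if e.used_in_modeling:
                out[e.tier.value].append(e.name)
        return out


# Hypothetical usage: declare which elements of a web-forum corpus are modeled.
report = TextSourceReport(
    source="web forum dump",
    elements=[
        Element("sentence", Tier.LINGUISTIC, used_in_modeling=True),
        Element("emoji", Tier.NON_LINGUISTIC, used_in_modeling=True),
        Element("quote markup", Tier.NON_LINGUISTIC, used_in_modeling=False),
    ],
)
print(report.summary())
```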
Related papers
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review revisits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z)
- Automatic and Human-AI Interactive Text Generation [27.05024520190722]
This tutorial aims to provide an overview of the state-of-the-art natural language generation research.
Text-to-text generation tasks are more constrained in terms of semantic consistency and targeted language styles.
arXiv Detail & Related papers (2023-10-05T20:26:15Z)
- Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information [68.89000132126536]
This work proposes to use inter-utterance linguistic information to improve the performance of prosodic structure prediction (PSP).
Our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH), and intonational phrase (IPH).
arXiv Detail & Related papers (2023-08-31T09:19:15Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts, whereas editorial assistance often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches whose impact carries across generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the key techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, along with the outstanding challenges in each.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format (see the illustrative sketch after this list).
arXiv Detail & Related papers (2019-10-23T17:37:36Z)
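The text-to-text framing named in the last entry can be illustrated with a short, hedged sketch using the Hugging Face `transformers` library and the public `t5-small` checkpoint; the task prefixes and checkpoint choice are illustrative assumptions, not details taken from the listed abstracts.
```python
# Minimal sketch of the text-to-text idea: every task is phrased as
# "text in -> text out". Assumes the `transformers` and `sentencepiece`
# packages are installed and the `t5-small` checkpoint is available.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tasks are distinguished only by a textual prefix (illustrative examples).
inputs = [
    "translate English to German: The notion of text varies across studies.",
    "summarize: Natural language processing researchers develop models of "
    "grammar, meaning and communication based on written text.",
]

for text in inputs:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```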
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.