A Survey of Text Representation Methods and Their Genealogy
- URL: http://arxiv.org/abs/2211.14591v1
- Date: Sat, 26 Nov 2022 15:22:01 GMT
- Title: A Survey of Text Representation Methods and Their Genealogy
- Authors: Philipp Siebers, Christian Janiesch, Patrick Zschech
- Abstract summary: In recent years, with the advent of highly scalable artificial-neural-network-based text representation methods, the field of natural language processing has seen unprecedented growth and sophistication.
We provide a survey of current approaches, arrange them in a genealogy, and conceptualize a taxonomy of text representation methods to examine and explain the state of the art.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, with the advent of highly scalable artificial-neural-network-based text representation methods, the field of natural language processing has seen unprecedented growth and sophistication. Using the distributional hypothesis, it has become possible to distill the complex linguistic information of text into multidimensional, dense numeric vectors. As a consequence, text representation methods have been evolving at such a quick pace that the research community is struggling to retain knowledge of the methods and their interrelations. We address this lack of compilation, composition, and systematization with a threefold contribution: we survey current approaches, arrange them in a genealogy, and conceptualize a taxonomy of text representation methods to examine and explain the state of the art. Our research is a valuable guide and reference for artificial intelligence researchers and practitioners interested in natural language processing applications such as recommender systems, chatbots, and sentiment analysis.
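To make the abstract's central idea concrete, below is a minimal, self-contained sketch (not taken from the paper) of how the distributional hypothesis yields dense numeric vectors: words are characterized by the contexts they co-occur with, and a truncated SVD compresses the sparse co-occurrence counts into a low-dimensional, dense representation. The toy corpus, window size, and embedding dimensionality are illustrative assumptions, not choices from the survey.

```python
# Sketch: dense word vectors from co-occurrence counts (distributional hypothesis).
# Assumptions: toy corpus, symmetric window of 2, 4-dimensional embeddings.
import numpy as np

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "stocks fell as the market closed",
    "the market rallied and stocks rose",
]
window = 2  # symmetric context window (illustrative)

# Build vocabulary and word-by-word co-occurrence counts.
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Dense vectors: truncated SVD of the log-scaled co-occurrence matrix.
u, s, _ = np.linalg.svd(np.log1p(counts), full_matrices=False)
dim = 4  # embedding dimensionality (illustrative)
vectors = u[:, :dim] * s[:dim]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Distributionally similar words ("cat"/"dog") should end up closer
# than distributionally dissimilar ones ("cat"/"stocks").
print(cosine(vectors[index["cat"]], vectors[index["dog"]]))
print(cosine(vectors[index["cat"]], vectors[index["stocks"]]))
```

Contextualized methods surveyed in the paper (e.g., Transformer-based models) replace the static co-occurrence factorization above with learned, context-dependent vectors, but the underlying intuition is the same.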
Related papers
- From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models [17.04716417556556]
This review visits foundational concepts such as the distributional hypothesis and contextual similarity.
We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT.
The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models.
Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications.
arXiv Detail & Related papers (2024-11-06T15:40:02Z)
- Comprehensive Study on Sentiment Analysis: From Rule-based to modern LLM based system [0.0]
This study examines the historical development of sentiment analysis, highlighting the transition from lexicon-based and pattern-based approaches to more sophisticated machine learning and deep learning models.
The paper reviews state-of-the-art approaches, identifies emerging trends, and outlines future research directions to advance the field.
arXiv Detail & Related papers (2024-09-16T04:44:52Z)
- A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation [0.0]
This paper explores techniques that focus on understanding and resolving ambiguity in language within the field of natural language processing (NLP).
It outlines diverse approaches ranging from deep learning techniques to leveraging lexical resources and knowledge graphs like WordNet.
The research identifies persistent challenges in the field, such as the scarcity of sense annotated corpora and the complexity of informal clinical texts.
arXiv Detail & Related papers (2024-03-24T12:58:48Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval (IR) has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining [0.5735035463793008]
This paper studies the behaviour of cutting-edge Transformer-based language models on opinion mining.
Our comparative study provides guidance and paves the way for production engineers in choosing which approach to focus on.
arXiv Detail & Related papers (2023-08-07T01:10:50Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Improve Discourse Dependency Parsing with Contextualized Representations [28.916249926065273]
We propose to take advantage of transformers to encode contextualized representations of units at different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
arXiv Detail & Related papers (2022-05-04T14:35:38Z)
- Faithfulness in Natural Language Generation: A Systematic Survey of Analysis, Evaluation and Optimization Methods [48.47413103662829]
Natural Language Generation (NLG) has made great progress in recent years due to the development of deep learning techniques such as pre-trained language models.
However, the faithfulness problem, i.e., that generated text often contains unfaithful or non-factual information, has become the biggest challenge.
arXiv Detail & Related papers (2022-03-10T08:28:32Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.